From patchwork Fri Feb  7 21:49:16 2014
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Peter Maydell <peter.maydell@linaro.org>
X-Patchwork-Id: 24322
Return-Path: <patchwork-forward+bncBC6Z756YVMIBB5VJ2WLQKGQENUCEVGI@linaro.org>
X-Original-To: linaro@patches.linaro.org
Delivered-To: linaro@patches.linaro.org
Received: from mail-yk0-f198.google.com (mail-yk0-f198.google.com
 [209.85.160.198])
 by ip-10-151-82-157.ec2.internal (Postfix) with ESMTPS id 83592202B2
 for <linaro@patches.linaro.org>; Fri,  7 Feb 2014 21:49:43 +0000 (UTC)
Received: by mail-yk0-f198.google.com with SMTP id 131sf1666578ykp.1
 for <linaro@patches.linaro.org>; Fri, 07 Feb 2014 13:49:42 -0800 (PST)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20130820;
 h=x-gm-message-state:mime-version:delivered-to:from:to:cc:subject
 :date:message-id:in-reply-to:references:x-original-sender
 :x-original-authentication-results:precedence:mailing-list:list-id
 :list-post:list-help:list-archive:list-unsubscribe;
 bh=GVMhTw7/1X3/ByrrHybKQ2kQ0nLdALZT9iuLLelNdY0=;
 b=TS31Z62jGl9sX1Yo33ONcXjgYh65TbD2gkIX6DVSp50y1/zoHKu8YAi/i+asBekpFl
 wdfadDvMgqu4593dr7TFzJkkAmWXRZxiuMOqlpEO9TcXNTJUoXsXTXhqmSFWq2steNXC
 b1OfxDjGO3+bjWsmyWivv2A1/850AH48fa8ijfQsutwsoZlLQHG/bWwgNG+Ja4uWWxRq
 gxOVEuH7jYlPN3ks2DLdCbTLoCrorKl21gQ9Ci3vm3tc1njyW2ZdihFTcSrfhhnhnGb+
 CyxqRCMzCHnAZecjIsKyRh/vZezotGyRRpcrV6jtNKoL2Y6R4o0f91mTwoEuprLc0DU3
 VkEg==
X-Gm-Message-State: ALoCoQllS1AJvdHV0/Dh46REIl/fGfDNfRh/0O/piT8PzKgOmBFKcmr1SFAyst9juc8oNDTD5GDo
X-Received: by 10.58.253.33 with SMTP id zx1mr6492276vec.10.1391809782393;
 Fri, 07 Feb 2014 13:49:42 -0800 (PST)
MIME-Version: 1.0
X-BeenThere: patchwork-forward@linaro.org
Received: by 10.140.20.146 with SMTP id 18ls320824qgj.58.gmail; Fri, 07 Feb
 2014 13:49:42 -0800 (PST)
X-Received: by 10.58.161.78 with SMTP id xq14mr30277veb.51.1391809782251;
 Fri, 07 Feb 2014 13:49:42 -0800 (PST)
Received: from mail-vb0-f42.google.com (mail-vb0-f42.google.com
 [209.85.212.42]) by mx.google.com with ESMTPS id
 us10si1786013vcb.134.2014.02.07.13.49.42
 for <patchwork-forward@linaro.org>
 (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128);
 Fri, 07 Feb 2014 13:49:42 -0800 (PST)
Received-SPF: neutral (google.com: 209.85.212.42 is neither permitted nor
 denied by best guess record for domain of
 patch+caf_=patchwork-forward=linaro.org@linaro.org)
 client-ip=209.85.212.42; 
Received: by mail-vb0-f42.google.com with SMTP id o7so892447vbn.1
 for <patchwork-forward@linaro.org>;
 Fri, 07 Feb 2014 13:49:42 -0800 (PST)
X-Received: by 10.220.109.1 with SMTP id h1mr12295388vcp.20.1391809782137;
 Fri, 07 Feb 2014 13:49:42 -0800 (PST)
X-Forwarded-To: patchwork-forward@linaro.org
X-Forwarded-For: patch@linaro.org patchwork-forward@linaro.org
Delivered-To: patches@linaro.org
Received: by 10.220.174.196 with SMTP id u4csp137813vcz;
 Fri, 7 Feb 2014 13:49:41 -0800 (PST)
X-Received: by 10.204.76.202 with SMTP id d10mr27581bkk.122.1391809780940;
 Fri, 07 Feb 2014 13:49:40 -0800 (PST)
Received: from mnementh.archaic.org.uk (mnementh.archaic.org.uk.
 [2001:8b0:1d0::1]) by mx.google.com with ESMTPS id
 dg6si4952388bkc.66.2014.02.07.13.49.39 for <multiple recipients>
 (version=TLSv1.2 cipher=RC4-SHA bits=128/128);
 Fri, 07 Feb 2014 13:49:40 -0800 (PST)
Received-SPF: pass (google.com: best guess record for domain of
 pm215@archaic.org.uk designates 2001:8b0:1d0::1 as permitted
 sender) client-ip=2001:8b0:1d0::1; 
Received: from pm215 by mnementh.archaic.org.uk with local (Exim 4.80)
 (envelope-from <pm215@archaic.org.uk>)
 id 1WBtIZ-0002w0-Rw; Fri, 07 Feb 2014 21:49:23 +0000
From: Peter Maydell <peter.maydell@linaro.org>
To: qemu-devel@nongnu.org
Cc: patches@linaro.org, Alexander Graf <agraf@suse.de>,
 Michael Matz <matz@suse.de>, Claudio Fontana <claudio.fontana@linaro.org>,
 Dirk Mueller <dmueller@suse.de>,
 Laurent Desnogues <laurent.desnogues@gmail.com>,
 kvmarm@lists.cs.columbia.edu, Richard Henderson <rth@twiddle.net>,
 =?UTF-8?q?Alex=20Benn=C3=A9e?= <alex.bennee@linaro.org>,
 Christoffer Dall <christoffer.dall@linaro.org>,
 Will Newton <will.newton@linaro.org>,
 Peter Crosthwaite <peter.crosthwaite@xilinx.com>
Subject: [PATCH 1/8] target-arm: A64: Implement plain vector SIMD indexed
 element insns
Date: Fri,  7 Feb 2014 21:49:16 +0000
Message-Id: <1391809763-11251-2-git-send-email-peter.maydell@linaro.org>
X-Mailer: git-send-email 1.7.10.4
In-Reply-To: <1391809763-11251-1-git-send-email-peter.maydell@linaro.org>
References: <1391809763-11251-1-git-send-email-peter.maydell@linaro.org>
X-Removed-Original-Auth: Dkim didn't pass.
X-Original-Sender: peter.maydell@linaro.org
X-Original-Authentication-Results: mx.google.com;       spf=neutral
 (google.com: 209.85.212.42 is neither permitted nor denied by best
 guess record for domain of
 patch+caf_=patchwork-forward=linaro.org@linaro.org)
 smtp.mail=patch+caf_=patchwork-forward=linaro.org@linaro.org
Precedence: list
Mailing-list: list patchwork-forward@linaro.org;
 contact patchwork-forward+owners@linaro.org
List-ID: <patchwork-forward.linaro.org>
X-Google-Group-Id: 836684582541
List-Post: <http://groups.google.com/a/linaro.org/group/patchwork-forward/post>, 
 <mailto:patchwork-forward@linaro.org>
List-Help: <http://support.google.com/a/linaro.org/bin/topic.py?topic=25838>, 
 <mailto:patchwork-forward+help@linaro.org>
List-Archive: <http://groups.google.com/a/linaro.org/group/patchwork-forward/>
List-Unsubscribe: <http://groups.google.com/a/linaro.org/group/patchwork-forward/subscribe>, 
 <mailto:googlegroups-manage+836684582541+unsubscribe@googlegroups.com>

Implement all the SIMD vector x indexed element instructions
in the subcategory which are not 'long' ops.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target-arm/helper-a64.c    |  26 +++++
 target-arm/helper-a64.h    |   2 +
 target-arm/translate-a64.c | 245 ++++++++++++++++++++++++++++++++++++++++++++-
 3 files changed, 272 insertions(+), 1 deletion(-)
diff --git a/target-arm/helper-a64.c b/target-arm/helper-a64.c
index 6ca958a..fe90a5c 100644
--- a/target-arm/helper-a64.c
+++ b/target-arm/helper-a64.c
@@ -123,6 +123,32 @@ uint64_t HELPER(vfp_cmped_a64)(float64 x, float64 y, void *fp_status)
     return float_rel_to_flags(float64_compare(x, y, fp_status));
 }
 
+float32 HELPER(vfp_mulxs)(float32 a, float32 b, void *fpstp)
+{
+    float_status *fpst = fpstp;
+
+    if ((float32_is_zero(a) && float32_is_infinity(b)) ||
+        (float32_is_infinity(a) && float32_is_zero(b))) {
+        /* 2.0 with the sign bit set to sign(A) XOR sign(B) */
+        return make_float32((1U << 30) |
+                            ((float32_val(a) ^ float32_val(b)) & (1U << 31)));
+    }
+    return float32_mul(a, b, fpst);
+}
+
+float64 HELPER(vfp_mulxd)(float64 a, float64 b, void *fpstp)
+{
+    float_status *fpst = fpstp;
+
+    if ((float64_is_zero(a) && float64_is_infinity(b)) ||
+        (float64_is_infinity(a) && float64_is_zero(b))) {
+        /* 2.0 with the sign bit set to sign(A) XOR sign(B) */
+        return make_float64((1ULL << 62) |
+                            ((float64_val(a) ^ float64_val(b)) & (1ULL << 63)));
+    }
+    return float64_mul(a, b, fpst);
+}
+
 uint64_t HELPER(simd_tbl)(CPUARMState *env, uint64_t result, uint64_t indices,
                           uint32_t rn, uint32_t numregs)
 {
diff --git a/target-arm/helper-a64.h b/target-arm/helper-a64.h
index 99832ee..84310e8 100644
--- a/target-arm/helper-a64.h
+++ b/target-arm/helper-a64.h
@@ -27,3 +27,5 @@ DEF_HELPER_3(vfp_cmpes_a64, i64, f32, f32, ptr)
 DEF_HELPER_3(vfp_cmpd_a64, i64, f64, f64, ptr)
 DEF_HELPER_3(vfp_cmped_a64, i64, f64, f64, ptr)
 DEF_HELPER_FLAGS_5(simd_tbl, TCG_CALL_NO_RWG_SE, i64, env, i64, i64, i32, i32)
+DEF_HELPER_FLAGS_3(vfp_mulxs, TCG_CALL_NO_RWG, f32, f32, f32, ptr)
+DEF_HELPER_FLAGS_3(vfp_mulxd, TCG_CALL_NO_RWG, f64, f64, f64, ptr)
diff --git a/target-arm/translate-a64.c b/target-arm/translate-a64.c
index d60223a..b7f1ecf 100644
--- a/target-arm/translate-a64.c
+++ b/target-arm/translate-a64.c
@@ -7813,7 +7813,250 @@ static void disas_simd_two_reg_misc(DisasContext *s, uint32_t insn)
  */
 static void disas_simd_indexed_vector(DisasContext *s, uint32_t insn)
 {
-    unsupported_encoding(s, insn);
+    /* This encoding has two kinds of instruction:
+     *  normal, where we perform elt x idxelt => elt for each
+     *     element in the vector
+     *  long, where we perform elt x idxelt and generate a result of
+     *     double the width of the input element
+     * The long ops have a 'part' specifier (ie come in INSN, INSN2 pairs).
+     */
+    bool is_q = extract32(insn, 30, 1);
+    bool u = extract32(insn, 29, 1);
+    int size = extract32(insn, 22, 2);
+    int l = extract32(insn, 21, 1);
+    int m = extract32(insn, 20, 1);
+    /* Note that the Rm field here is only 4 bits, not 5 as it usually is */
+    int rm = extract32(insn, 16, 4);
+    int opcode = extract32(insn, 12, 4);
+    int h = extract32(insn, 11, 1);
+    int rn = extract32(insn, 5, 5);
+    int rd = extract32(insn, 0, 5);
+    bool is_long = false;
+    bool is_fp = false;
+    int index;
+    TCGv_ptr fpst;
+
+    switch (opcode) {
+    case 0x0: /* MLA */
+    case 0x4: /* MLS */
+        if (!u) {
+            unallocated_encoding(s);
+            return;
+        }
+        break;
+    case 0x2: /* SMLAL, SMLAL2, UMLAL, UMLAL2 */
+    case 0x6: /* SMLSL, SMLSL2, UMLSL, UMLSL2 */
+    case 0xa: /* SMULL, SMULL2, UMULL, UMULL2 */
+        is_long = true;
+        break;
+    case 0x3: /* SQDMLAL, SQDMLAL2 */
+    case 0x7: /* SQDMLSL, SQDMLSL2 */
+    case 0xb: /* SQDMULL, SQDMULL2 */
+        is_long = true;
+        /* fall through */
+    case 0xc: /* SQDMULH */
+    case 0xd: /* SQRDMULH */
+    case 0x8: /* MUL */
+        if (u) {
+            unallocated_encoding(s);
+            return;
+        }
+        break;
+    case 0x1: /* FMLA */
+    case 0x5: /* FMLS */
+        if (u) {
+            unallocated_encoding(s);
+            return;
+        }
+        /* fall through */
+    case 0x9: /* FMUL, FMULX */
+        if (!extract32(size, 1, 1)) {
+            unallocated_encoding(s);
+            return;
+        }
+        is_fp = true;
+        break;
+    }
+
+    if (is_fp) {
+        /* low bit of size indicates single/double */
+        size = extract32(size, 0, 1) ? 3 : 2;
+        if (size == 2) {
+            index = h << 1 | l;
+        } else {
+            if (l || !is_q) {
+                unallocated_encoding(s);
+                return;
+            }
+            index = h;
+        }
+        rm |= (m << 4);
+    } else {
+        switch (size) {
+        case 1:
+            index = h << 2 | l << 1 | m;
+            break;
+        case 2:
+            index = h << 1 | l;
+            rm |= (m << 4);
+            break;
+        default:
+            unallocated_encoding(s);
+            return;
+        }
+    }
+
+    if (is_long) {
+        unsupported_encoding(s, insn);
+        return;
+    }
+
+    if (is_fp) {
+        fpst = get_fpstatus_ptr();
+    } else {
+        TCGV_UNUSED_PTR(fpst);
+    }
+
+    if (size == 3) {
+        TCGv_i64 tcg_idx = tcg_temp_new_i64();
+        int pass;
+
+        assert(is_fp && is_q && !is_long);
+
+        read_vec_element(s, tcg_idx, rm, index, MO_64);
+
+        for (pass = 0; pass < 2; pass++) {
+            TCGv_i64 tcg_op = tcg_temp_new_i64();
+            TCGv_i64 tcg_res = tcg_temp_new_i64();
+
+            read_vec_element(s, tcg_op, rn, pass, MO_64);
+
+            switch (opcode) {
+            case 0x5: /* FMLS */
+                /* As usual for ARM, separate negation for fused multiply-add */
+                gen_helper_vfp_negd(tcg_op, tcg_op);
+                /* fall through */
+            case 0x1: /* FMLA */
+                read_vec_element(s, tcg_res, rd, pass, MO_64);
+                gen_helper_vfp_muladdd(tcg_res, tcg_op, tcg_idx, tcg_res, fpst);
+                break;
+            case 0x9: /* FMUL, FMULX */
+                if (u) {
+                    gen_helper_vfp_mulxd(tcg_res, tcg_op, tcg_idx, fpst);
+                } else {
+                    gen_helper_vfp_muld(tcg_res, tcg_op, tcg_idx, fpst);
+                }
+                break;
+            default:
+                g_assert_not_reached();
+            }
+
+            write_vec_element(s, tcg_res, rd, pass, MO_64);
+            tcg_temp_free_i64(tcg_op);
+            tcg_temp_free_i64(tcg_res);
+        }
+
+        tcg_temp_free_i64(tcg_idx);
+    } else if (!is_long) {
+        /* 32 bit floating point, or 16 or 32 bit integer */
+        TCGv_i32 tcg_idx = tcg_temp_new_i32();
+        int pass;
+
+        read_vec_element_i32(s, tcg_idx, rm, index, size);
+
+        if (size == 1) {
+            /* The simplest way to handle the 16x16 indexed ops is to duplicate
+             * the index into both halves of the 32 bit tcg_idx and then use
+             * the usual Neon helpers.
+             */
+            tcg_gen_deposit_i32(tcg_idx, tcg_idx, tcg_idx, 16, 16);
+        }
+
+        for (pass = 0; pass < (is_q ? 4 : 2); pass++) {
+            TCGv_i32 tcg_op = tcg_temp_new_i32();
+            TCGv_i32 tcg_res = tcg_temp_new_i32();
+
+            read_vec_element_i32(s, tcg_op, rn, pass, MO_32);
+
+            switch (opcode) {
+            case 0x0: /* MLA */
+            case 0x4: /* MLS */
+            case 0x8: /* MUL */
+            {
+                static NeonGenTwoOpFn * const fns[2][2] = {
+                    { gen_helper_neon_add_u16, gen_helper_neon_sub_u16 },
+                    { tcg_gen_add_i32, tcg_gen_sub_i32 },
+                };
+                NeonGenTwoOpFn *genfn;
+                bool is_sub = opcode == 0x4;
+
+                if (size == 1) {
+                    gen_helper_neon_mul_u16(tcg_res, tcg_op, tcg_idx);
+                } else {
+                    tcg_gen_mul_i32(tcg_res, tcg_op, tcg_idx);
+                }
+                if (opcode == 0x8) {
+                    break;
+                }
+                read_vec_element_i32(s, tcg_op, rd, pass, MO_32);
+                genfn = fns[size - 1][is_sub];
+                genfn(tcg_res, tcg_op, tcg_res);
+                break;
+            }
+            case 0x5: /* FMLS */
+                /* As usual for ARM, separate negation for fused multiply-add */
+                gen_helper_vfp_negs(tcg_op, tcg_op);
+                /* fall through */
+            case 0x1: /* FMLA */
+                read_vec_element_i32(s, tcg_res, rd, pass, MO_32);
+                gen_helper_vfp_muladds(tcg_res, tcg_op, tcg_idx, tcg_res, fpst);
+                break;
+            case 0x9: /* FMUL, FMULX */
+                if (u) {
+                    gen_helper_vfp_mulxs(tcg_res, tcg_op, tcg_idx, fpst);
+                } else {
+                    gen_helper_vfp_muls(tcg_res, tcg_op, tcg_idx, fpst);
+                }
+                break;
+            case 0xc: /* SQDMULH */
+                if (size == 1) {
+                    gen_helper_neon_qdmulh_s16(tcg_res, cpu_env,
+                                               tcg_op, tcg_idx);
+                } else {
+                    gen_helper_neon_qdmulh_s32(tcg_res, cpu_env,
+                                               tcg_op, tcg_idx);
+                }
+                break;
+            case 0xd: /* SQRDMULH */
+                if (size == 1) {
+                    gen_helper_neon_qrdmulh_s16(tcg_res, cpu_env,
+                                                tcg_op, tcg_idx);
+                } else {
+                    gen_helper_neon_qrdmulh_s32(tcg_res, cpu_env,
+                                                tcg_op, tcg_idx);
+                }
+                break;
+            default:
+                g_assert_not_reached();
+            }
+
+            write_vec_element_i32(s, tcg_res, rd, pass, MO_32);
+            tcg_temp_free_i32(tcg_op);
+            tcg_temp_free_i32(tcg_res);
+        }
+
+        tcg_temp_free_i32(tcg_idx);
+
+        if (!is_q) {
+            clear_vec_high(s, rd);
+        }
+    } else {
+        /* long ops: 16x16->32 or 32x32->64 */
+    }
+
+    if (!TCGV_IS_UNUSED_PTR(fpst)) {
+        tcg_temp_free_ptr(fpst);
+    }
 }
 
 /* C3.6.19 Crypto AES