From patchwork Wed Jan 29 13:39:28 2014
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Peter Maydell <peter.maydell@linaro.org>
X-Patchwork-Id: 23873
From: Peter Maydell <peter.maydell@linaro.org>
To: Anthony Liguori <aliguori@amazon.com>
Date: Wed, 29 Jan 2014 13:39:28 +0000
Message-Id: <1391002805-26596-2-git-send-email-peter.maydell@linaro.org>
X-Mailer: git-send-email 1.7.10.4
In-Reply-To: <1391002805-26596-1-git-send-email-peter.maydell@linaro.org>
References: <1391002805-26596-1-git-send-email-peter.maydell@linaro.org>
MIME-Version: 1.0
X-detected-operating-system: by eggs.gnu.org: Error: Malformed IPv6 address
 (bad octet value).
X-Received-From: 2001:8b0:1d0::1
Cc: Blue Swirl <blauwirbel@gmail.com>, qemu-devel@nongnu.org,
 Aurelien Jarno <aurelien@aurel32.net>
Subject: [Qemu-devel] [PULL 01/38] target-arm: A64: Add SIMD ld/st multiple
X-BeenThere: qemu-devel@nongnu.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: <patchwork-forward.linaro.org>
List-Unsubscribe: <http://groups.google.com/a/linaro.org/group/patchwork-forward/subscribe>, 
 <mailto:googlegroups-manage+836684582541+unsubscribe@googlegroups.com>
List-Archive: <http://groups.google.com/a/linaro.org/group/patchwork-forward/>
List-Post: <http://groups.google.com/a/linaro.org/group/patchwork-forward/post>, 
 <mailto:patchwork-forward@linaro.org>
List-Help: <http://support.google.com/a/linaro.org/bin/topic.py?topic=25838>, 
 <mailto:patchwork-forward+help@linaro.org>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
 <mailto:qemu-devel-request@nongnu.org?subject=subscribe>
Errors-To: qemu-devel-bounces+patch=linaro.org@nongnu.org
Sender: qemu-devel-bounces+patch=linaro.org@nongnu.org
X-Removed-Original-Auth: Dkim didn't pass.
X-Original-Sender: peter.maydell@linaro.org
X-Original-Authentication-Results: mx.google.com;       spf=neutral
 (google.com: 209.85.212.42 is neither permitted nor denied by best
 guess record for domain of
 patch+caf_=patchwork-forward=linaro.org@linaro.org)
 smtp.mail=patch+caf_=patchwork-forward=linaro.org@linaro.org
Mailing-list: list patchwork-forward@linaro.org;
 contact patchwork-forward+owners@linaro.org
X-Google-Group-Id: 836684582541

From: Alex Bennée <alex.bennee@linaro.org>

This adds support for the SIMD load/store
multiple category of instructions.

This also brings in a couple of helper functions for manipulating
sections of the SIMD registers:

  * read_vec_element - fetch value from a slice of a vector register
  * write_vec_element - set a slice of a vector register

which use vec_reg_offset for consistent processing of offsets in an
endian aware manner. There are also additional helpers:

  * do_vec_ld - load value into SIMD
  * do_vec_st - store value from SIMD

which load or store a slice of a vector register to memory.
Unlike the fp variants, these don't zero extend.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
---
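Note (not part of the commit): the endian-aware offset arithmetic in
vec_reg_offset can be tried outside QEMU with a small standalone model.
This is only a sketch. The name elem_offset and the host_bigendian
parameter are hypothetical stand-ins for the HOST_WORDS_BIGENDIAN #ifdef,
and the offset is taken relative to the start of one 16-byte register
rather than to CPUARMState.

```c
/* Hypothetical standalone model of the arithmetic in vec_reg_offset().
 *
 * Returns the byte offset of 'element' (counted from the least
 * significant end) inside one 16-byte register that is stored as two
 * host-endian 64-bit halves, low half first. 'size' is log2 of the
 * element size in bytes (0 = byte ... 3 = 64 bit).
 */
static int elem_offset(int element, int size, int host_bigendian)
{
    int offs;

    if (host_bigendian) {
        /* Offset as if all 128 bits were big endian ... */
        offs = 16 - ((element + 1) * (1 << size));
        /* ... then swap the two 64-bit halves. */
        offs ^= 8;
    } else {
        offs = element * (1 << size);
    }
    return offs;
}
```

For 32-bit elements 0..3 the little-endian branch gives offsets 0, 4, 8
and 12, while the big-endian branch gives 4, 0, 12 and 8: each element
stays in the same 64-bit half, but the order within the half is reversed.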
 target-arm/translate-a64.c | 250 ++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 248 insertions(+), 2 deletions(-)
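
Note (not part of the commit): the loop nest in
disas_ldst_multiple_struct determines which register element is paired
with which memory location. The sketch below models only that ordering,
without any TCG code generation. The struct xfer type and the
list_transfers function are hypothetical names introduced for
illustration.

```c
#include <stddef.h>

struct xfer {
    int reg;      /* vector register number */
    int element;  /* element index within that register */
    int mem_off;  /* byte offset from the base address */
};

/* Hypothetical model of the rpt/elements/selem loop nest: records the
 * order in which register elements are transferred. Returns the number
 * of entries written to 'out' (never more than 'max').
 */
static size_t list_transfers(int rt, int rpt, int selem, int size,
                             int is_q, struct xfer *out, size_t max)
{
    int ebytes = 1 << size;
    int elements = (is_q ? 128 : 64) / (8 << size);
    int mem_off = 0;
    size_t n = 0;
    int r, e, xs;

    for (r = 0; r < rpt; r++) {
        for (e = 0; e < elements; e++) {
            int tt = (rt + r) % 32;
            for (xs = 0; xs < selem; xs++) {
                if (n < max) {
                    out[n].reg = tt;
                    out[n].element = e;
                    out[n].mem_off = mem_off;
                    n++;
                }
                mem_off += ebytes;
                tt = (tt + 1) % 32;   /* register numbers wrap at 32 */
            }
        }
    }
    return n;
}
```

With rt = 31, rpt = 1, selem = 2, size = 2 and is_q = 0 (a two-register
interleaved form on 32-bit elements) this lists four transfers: register
31 element 0 at offset 0, register 0 element 0 at offset 4, register 31
element 1 at offset 8, and register 0 element 1 at offset 12.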

diff --git a/target-arm/translate-a64.c b/target-arm/translate-a64.c
index cf80c46..e4fdf00 100644
--- a/target-arm/translate-a64.c
+++ b/target-arm/translate-a64.c
@@ -308,6 +308,28 @@ static TCGv_i64 read_cpu_reg_sp(DisasContext *s, int reg, int sf)
     return v;
 }
 
+/* Return the offset into CPUARMState of an element of specified
+ * size, 'element' places in from the least significant end of
+ * the FP/vector register Qn.
+ */
+static inline int vec_reg_offset(int regno, int element, TCGMemOp size)
+{
+    int offs = offsetof(CPUARMState, vfp.regs[regno * 2]);
+#ifdef HOST_WORDS_BIGENDIAN
+    /* This is complicated slightly because vfp.regs[2n] is
+     * still the low half and vfp.regs[2n+1] the high half
+     * of the 128 bit vector, even on big endian systems.
+     * Calculate the offset assuming a fully bigendian 128 bits,
+     * then XOR to account for the order of the two 64 bit halves.
+     */
+    offs += (16 - ((element + 1) * (1 << size)));
+    offs ^= 8;
+#else
+    offs += element * (1 << size);
+#endif
+    return offs;
+}
+
 /* Return the offset into CPUARMState of a slice (from
  * the least significant end) of FP register Qn (ie
  * Dn, Sn, Hn or Bn).
@@ -661,6 +683,108 @@ static void do_fp_ld(DisasContext *s, int destidx, TCGv_i64 tcg_addr, int size)
 }
 
 /*
+ * Vector load/store helpers.
+ *
+ * The principal difference between this and a FP load is that we don't
+ * zero extend as we are filling a partial chunk of the vector register.
+ * These functions don't support 128 bit loads/stores, which would be
+ * normal load/store operations.
+ */
+
+/* Get value of an element within a vector register */
+static void read_vec_element(DisasContext *s, TCGv_i64 tcg_dest, int srcidx,
+                             int element, TCGMemOp memop)
+{
+    int vect_off = vec_reg_offset(srcidx, element, memop & MO_SIZE);
+    switch (memop) {
+    case MO_8:
+        tcg_gen_ld8u_i64(tcg_dest, cpu_env, vect_off);
+        break;
+    case MO_16:
+        tcg_gen_ld16u_i64(tcg_dest, cpu_env, vect_off);
+        break;
+    case MO_32:
+        tcg_gen_ld32u_i64(tcg_dest, cpu_env, vect_off);
+        break;
+    case MO_8|MO_SIGN:
+        tcg_gen_ld8s_i64(tcg_dest, cpu_env, vect_off);
+        break;
+    case MO_16|MO_SIGN:
+        tcg_gen_ld16s_i64(tcg_dest, cpu_env, vect_off);
+        break;
+    case MO_32|MO_SIGN:
+        tcg_gen_ld32s_i64(tcg_dest, cpu_env, vect_off);
+        break;
+    case MO_64:
+    case MO_64|MO_SIGN:
+        tcg_gen_ld_i64(tcg_dest, cpu_env, vect_off);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
+/* Set value of an element within a vector register */
+static void write_vec_element(DisasContext *s, TCGv_i64 tcg_src, int destidx,
+                              int element, TCGMemOp memop)
+{
+    int vect_off = vec_reg_offset(destidx, element, memop & MO_SIZE);
+    switch (memop) {
+    case MO_8:
+        tcg_gen_st8_i64(tcg_src, cpu_env, vect_off);
+        break;
+    case MO_16:
+        tcg_gen_st16_i64(tcg_src, cpu_env, vect_off);
+        break;
+    case MO_32:
+        tcg_gen_st32_i64(tcg_src, cpu_env, vect_off);
+        break;
+    case MO_64:
+        tcg_gen_st_i64(tcg_src, cpu_env, vect_off);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
+/* Clear the high 64 bits of a 128 bit vector (in general non-quad
+ * vector ops all need to do this).
+ */
+static void clear_vec_high(DisasContext *s, int rd)
+{
+    TCGv_i64 tcg_zero = tcg_const_i64(0);
+
+    write_vec_element(s, tcg_zero, rd, 1, MO_64);
+    tcg_temp_free_i64(tcg_zero);
+}
+
+/* Store from vector register to memory */
+static void do_vec_st(DisasContext *s, int srcidx, int element,
+                      TCGv_i64 tcg_addr, int size)
+{
+    TCGMemOp memop = MO_TE + size;
+    TCGv_i64 tcg_tmp = tcg_temp_new_i64();
+
+    read_vec_element(s, tcg_tmp, srcidx, element, size);
+    tcg_gen_qemu_st_i64(tcg_tmp, tcg_addr, get_mem_index(s), memop);
+
+    tcg_temp_free_i64(tcg_tmp);
+}
+
+/* Load from memory to vector register */
+static void do_vec_ld(DisasContext *s, int destidx, int element,
+                      TCGv_i64 tcg_addr, int size)
+{
+    TCGMemOp memop = MO_TE + size;
+    TCGv_i64 tcg_tmp = tcg_temp_new_i64();
+
+    tcg_gen_qemu_ld_i64(tcg_tmp, tcg_addr, get_mem_index(s), memop);
+    write_vec_element(s, tcg_tmp, destidx, element, size);
+
+    tcg_temp_free_i64(tcg_tmp);
+}
+
+/*
  * This utility function is for doing register extension with an
  * optional shift. You will likely want to pass a temporary for the
  * destination register. See DecodeRegExtend() in the ARM ARM.
@@ -1835,10 +1959,132 @@ static void disas_ldst_reg(DisasContext *s, uint32_t insn)
     }
 }
 
-/* AdvSIMD load/store multiple structures */
+/* C3.3.1 AdvSIMD load/store multiple structures
+ *
+ *  31  30  29           23 22  21         16 15    12 11  10 9    5 4    0
+ * +---+---+---------------+---+-------------+--------+------+------+------+
+ * | 0 | Q | 0 0 1 1 0 0 0 | L | 0 0 0 0 0 0 | opcode | size |  Rn  |  Rt  |
+ * +---+---+---------------+---+-------------+--------+------+------+------+
+ *
+ * C3.3.2 AdvSIMD load/store multiple structures (post-indexed)
+ *
+ *  31  30  29           23 22  21  20     16 15    12 11  10 9    5 4    0
+ * +---+---+---------------+---+---+---------+--------+------+------+------+
+ * | 0 | Q | 0 0 1 1 0 0 1 | L | 0 |   Rm    | opcode | size |  Rn  |  Rt  |
+ * +---+---+---------------+---+---+---------+--------+------+------+------+
+ *
+ * Rt: first (or only) SIMD&FP register to be transferred
+ * Rn: base address or SP
+ * Rm (post-index only): post-index register (when !31) or size dependent #imm
+ */
 static void disas_ldst_multiple_struct(DisasContext *s, uint32_t insn)
 {
-    unsupported_encoding(s, insn);
+    int rt = extract32(insn, 0, 5);
+    int rn = extract32(insn, 5, 5);
+    int size = extract32(insn, 10, 2);
+    int opcode = extract32(insn, 12, 4);
+    bool is_store = !extract32(insn, 22, 1);
+    bool is_postidx = extract32(insn, 23, 1);
+    bool is_q = extract32(insn, 30, 1);
+    TCGv_i64 tcg_addr, tcg_rn;
+
+    int ebytes = 1 << size;
+    int elements = (is_q ? 128 : 64) / (8 << size);
+    int rpt;    /* num iterations */
+    int selem;  /* structure elements */
+    int r;
+
+    if (extract32(insn, 31, 1) || extract32(insn, 21, 1)) {
+        unallocated_encoding(s);
+        return;
+    }
+
+    /* From the shared decode logic */
+    switch (opcode) {
+    case 0x0:
+        rpt = 1;
+        selem = 4;
+        break;
+    case 0x2:
+        rpt = 4;
+        selem = 1;
+        break;
+    case 0x4:
+        rpt = 1;
+        selem = 3;
+        break;
+    case 0x6:
+        rpt = 3;
+        selem = 1;
+        break;
+    case 0x7:
+        rpt = 1;
+        selem = 1;
+        break;
+    case 0x8:
+        rpt = 1;
+        selem = 2;
+        break;
+    case 0xa:
+        rpt = 2;
+        selem = 1;
+        break;
+    default:
+        unallocated_encoding(s);
+        return;
+    }
+
+    if (size == 3 && !is_q && selem != 1) {
+        /* reserved */
+        unallocated_encoding(s);
+        return;
+    }
+
+    if (rn == 31) {
+        gen_check_sp_alignment(s);
+    }
+
+    tcg_rn = cpu_reg_sp(s, rn);
+    tcg_addr = tcg_temp_new_i64();
+    tcg_gen_mov_i64(tcg_addr, tcg_rn);
+
+    for (r = 0; r < rpt; r++) {
+        int e;
+        for (e = 0; e < elements; e++) {
+            int tt = (rt + r) % 32;
+            int xs;
+            for (xs = 0; xs < selem; xs++) {
+                if (is_store) {
+                    do_vec_st(s, tt, e, tcg_addr, size);
+                } else {
+                    do_vec_ld(s, tt, e, tcg_addr, size);
+
+                    /* For non-quad operations, setting a slice of the low
+                     * 64 bits of the register clears the high 64 bits (in
+                     * the ARM ARM pseudocode this is implicit in the fact
+                     * that 'rval' is a 64 bit wide variable). We optimize
+                     * by noticing that we only need to do this the first
+                     * time we touch a register.
+                     */
+                    if (!is_q && e == 0 && (r == 0 || xs == selem - 1)) {
+                        clear_vec_high(s, tt);
+                    }
+                }
+                tcg_gen_addi_i64(tcg_addr, tcg_addr, ebytes);
+                tt = (tt + 1) % 32;
+            }
+        }
+    }
+
+    if (is_postidx) {
+        int rm = extract32(insn, 16, 5);
+        if (rm == 31) {
+            tcg_gen_mov_i64(tcg_rn, tcg_addr);
+        } else {
+            tcg_gen_add_i64(tcg_rn, tcg_rn, cpu_reg(s, rm));
+        }
+    }
+    tcg_temp_free_i64(tcg_addr);
 }
 
 /* AdvSIMD load/store single structure */