From patchwork Tue Mar 19 17:21:20 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Richard Henderson <richard.henderson@linaro.org>
X-Patchwork-Id: 160594
From: Richard Henderson <richard.henderson@linaro.org>
To: qemu-devel@nongnu.org
Date: Tue, 19 Mar 2019 10:21:20 -0700
Message-Id: <20190319172126.7502-12-richard.henderson@linaro.org>
X-Mailer: git-send-email 2.17.2
In-Reply-To: <20190319172126.7502-1-richard.henderson@linaro.org>
References: <20190319172126.7502-1-richard.henderson@linaro.org>
Subject: [Qemu-devel] [PATCH for-4.1 v3 11/17] tcg: Add INDEX_op_dup_mem_vec
X-BeenThere: qemu-devel@nongnu.org
X-Mailman-Version: 2.1.21
Precedence: list
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
 <mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel/>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
 <mailto:qemu-devel-request@nongnu.org?subject=subscribe>
Cc: mark.cave-ayland@ilande.co.uk, david@gibson.dropbear.id.au
Errors-To: qemu-devel-bounces+patch=linaro.org@nongnu.org
Sender: "Qemu-devel" <qemu-devel-bounces+patch=linaro.org@nongnu.org>

Allow the backend to expand dup from memory directly, instead of
forcing the value into a temp first.  This is especially important
if the host has no direct moves between integer and vector registers.

Note that officially tcg_out_dupm_vec is allowed to fail.
If it did, we could fix this up relatively easily:

  VECE == 32/64:
    Load the value into a vector register, then dup.
    Both of these must work.

  VECE == 8/16:
    If the value happens to be at an offset such that an aligned
    load would place the desired value in the least significant
    end of the register, go ahead and load w/garbage in high bits.

    Load the value w/INDEX_op_ld{8,16}_i32.
    Attempt a move directly to vector reg, which may fail.
    Store the value into the backing store for OTS.
    Load the value into the vector reg w/TCG_TYPE_I32, which must work.
    Duplicate from the vector reg into itself, which must work.

That is all well and good, except that every supported host
can implement dupm for every vece, so all of the failure
paths would be dead code and untestable.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
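Reviewer note (illustration only, not part of the patch): the operation
described above reads one element of (1 << vece) bytes from base + ofs and
replicates it across the whole vector.  A minimal host-independent sketch
of that behaviour follows.  The name ref_dup_mem and its signature are
hypothetical and exist only for this example; the real op is emitted by
each backend's tcg_out_dupm_vec.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical reference model, not QEMU code: replicate one element of
 * (1 << vece) bytes, read from base + ofs, across a vec_bytes-wide buffer.
 */
static void ref_dup_mem(unsigned vece, uint8_t *dst, size_t vec_bytes,
                        const uint8_t *base, ptrdiff_t ofs)
{
    size_t esz = (size_t)1 << vece;   /* MO_8..MO_64 -> 1, 2, 4, 8 bytes */
    size_t i;

    assert(vece <= 3);                /* same bound as vece <= MO_64 */
    assert(vec_bytes % esz == 0);     /* whole number of elements */

    /* Copy the same source element into every element slot of dst. */
    for (i = 0; i < vec_bytes; i += esz) {
        memcpy(dst + i, base + ofs, esz);
    }
}
```

With vece == 1 (MO_16) and a 16-byte destination, the two bytes at
base + ofs end up repeated eight times, which is the value the vector
temp holds before it is stored out by do_dup_store.
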
 tcg/tcg-op.h                 |  1 +
 tcg/tcg-opc.h                |  1 +
 tcg/aarch64/tcg-target.inc.c |  4 ++
 tcg/i386/tcg-target.inc.c    |  4 ++
 tcg/tcg-op-gvec.c            | 88 +++++++++++++++++++-----------------
 tcg/tcg-op-vec.c             | 11 +++++
 tcg/tcg.c                    |  1 +
 7 files changed, 69 insertions(+), 41 deletions(-)
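
Reviewer note (illustration only, not part of the patch): the comment in
do_dup_store says that a size of 80 should expand to 2x32 + 1x16 stores.
The sketch below mirrors the loop structure of do_dup_store but only
counts the stores instead of emitting them.  The names plan_type,
store_plan and plan_dup_store are hypothetical, introduced just for this
example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for TCG_TYPE_V64 / V128 / V256. */
enum plan_type { PLAN_V64, PLAN_V128, PLAN_V256 };

/* Number of 32-, 16- and 8-byte stores that would be emitted. */
struct store_plan {
    unsigned n32, n16, n8;
};

static struct store_plan plan_dup_store(enum plan_type type, uint32_t oprsz)
{
    struct store_plan p = { 0, 0, 0 };
    uint32_t i = 0;

    switch (type) {
    case PLAN_V256:
        /* As many full 32-byte stores as fit. */
        for (; i + 32 <= oprsz; i += 32) {
            p.n32++;
        }
        /* fallthru */
    case PLAN_V128:
        /* Cover the remainder (or everything) with 16-byte stores. */
        for (; i + 16 <= oprsz; i += 16) {
            p.n16++;
        }
        break;
    case PLAN_V64:
        for (; i < oprsz; i += 8) {
            p.n8++;
        }
        break;
    }
    return p;
}
```

For oprsz == 80 with the 256-bit type this yields two 32-byte stores
followed by one 16-byte store, matching the comment in the patch; with
the 128-bit type the same size takes five 16-byte stores.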

-- 
2.17.2

diff --git a/tcg/tcg-op.h b/tcg/tcg-op.h
index d3e51b15af..64cd3f58ef 100644
--- a/tcg/tcg-op.h
+++ b/tcg/tcg-op.h
@@ -950,6 +950,7 @@ void tcg_gen_atomic_umax_fetch_i64(TCGv_i64, TCGv, TCGv_i64, TCGArg, TCGMemOp);
 void tcg_gen_mov_vec(TCGv_vec, TCGv_vec);
 void tcg_gen_dup_i32_vec(unsigned vece, TCGv_vec, TCGv_i32);
 void tcg_gen_dup_i64_vec(unsigned vece, TCGv_vec, TCGv_i64);
+void tcg_gen_dup_mem_vec(unsigned vece, TCGv_vec, TCGv_ptr, tcg_target_long);
 void tcg_gen_dup8i_vec(TCGv_vec, uint32_t);
 void tcg_gen_dup16i_vec(TCGv_vec, uint32_t);
 void tcg_gen_dup32i_vec(TCGv_vec, uint32_t);
diff --git a/tcg/tcg-opc.h b/tcg/tcg-opc.h
index 4e0238ad1a..cc02e12b7e 100644
--- a/tcg/tcg-opc.h
+++ b/tcg/tcg-opc.h
@@ -217,6 +217,7 @@ DEF(dup2_vec, 1, 2, 0, IMPLVEC | IMPL(TCG_TARGET_REG_BITS == 32))
 
 DEF(ld_vec, 1, 1, 1, IMPLVEC)
 DEF(st_vec, 0, 2, 1, IMPLVEC)
+DEF(dupm_vec, 1, 1, 1, IMPLVEC)
 
 DEF(add_vec, 1, 2, 0, IMPLVEC)
 DEF(sub_vec, 1, 2, 0, IMPLVEC)
diff --git a/tcg/aarch64/tcg-target.inc.c b/tcg/aarch64/tcg-target.inc.c
index d32e83ddf2..bee7296713 100644
--- a/tcg/aarch64/tcg-target.inc.c
+++ b/tcg/aarch64/tcg-target.inc.c
@@ -2183,6 +2183,9 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
     case INDEX_op_st_vec:
         tcg_out_st(s, type, a0, a1, a2);
         break;
+    case INDEX_op_dupm_vec:
+        tcg_out_dupm_vec(s, type, vece, a0, a1, a2);
+        break;
     case INDEX_op_add_vec:
         tcg_out_insn(s, 3616, ADD, is_q, vece, a0, a1, a2);
         break;
@@ -2509,6 +2512,7 @@ static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op)
         return &w_w;
     case INDEX_op_ld_vec:
     case INDEX_op_st_vec:
+    case INDEX_op_dupm_vec:
         return &w_r;
     case INDEX_op_dup_vec:
         return &w_wr;
diff --git a/tcg/i386/tcg-target.inc.c b/tcg/i386/tcg-target.inc.c
index 2b88f2054e..54627e8d13 100644
--- a/tcg/i386/tcg-target.inc.c
+++ b/tcg/i386/tcg-target.inc.c
@@ -2820,6 +2820,9 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
     case INDEX_op_st_vec:
         tcg_out_st(s, type, a0, a1, a2);
         break;
+    case INDEX_op_dupm_vec:
+        tcg_out_dupm_vec(s, type, vece, a0, a1, a2);
+        break;
 
     case INDEX_op_x86_shufps_vec:
         insn = OPC_SHUFPS;
@@ -3102,6 +3105,7 @@ static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op)
 
     case INDEX_op_ld_vec:
     case INDEX_op_st_vec:
+    case INDEX_op_dupm_vec:
         return &x_r;
 
     case INDEX_op_add_vec:
diff --git a/tcg/tcg-op-gvec.c b/tcg/tcg-op-gvec.c
index 0996ef0812..f056018713 100644
--- a/tcg/tcg-op-gvec.c
+++ b/tcg/tcg-op-gvec.c
@@ -390,6 +390,40 @@ static TCGType choose_vector_type(TCGOpcode op, unsigned vece, uint32_t size,
     return 0;
 }
 
+static void do_dup_store(TCGType type, uint32_t dofs, uint32_t oprsz,
+                         uint32_t maxsz, TCGv_vec t_vec)
+{
+    uint32_t i = 0;
+
+    switch (type) {
+    case TCG_TYPE_V256:
+        /* Recall that ARM SVE allows vector sizes that are not a
+         * power of 2, but always a multiple of 16.  The intent is
+         * that e.g. size == 80 would be expanded with 2x32 + 1x16.
+         */
+        for (; i + 32 <= oprsz; i += 32) {
+            tcg_gen_stl_vec(t_vec, cpu_env, dofs + i, TCG_TYPE_V256);
+        }
+        /* fallthru */
+    case TCG_TYPE_V128:
+        for (; i + 16 <= oprsz; i += 16) {
+            tcg_gen_stl_vec(t_vec, cpu_env, dofs + i, TCG_TYPE_V128);
+        }
+        break;
+    case TCG_TYPE_V64:
+        for (; i < oprsz; i += 8) {
+            tcg_gen_stl_vec(t_vec, cpu_env, dofs + i, TCG_TYPE_V64);
+        }
+        break;
+    default:
+        g_assert_not_reached();
+    }
+
+    if (oprsz < maxsz) {
+        expand_clr(dofs + oprsz, maxsz - oprsz);
+    }
+}
+
 /* Set OPRSZ bytes at DOFS to replications of IN_32, IN_64 or IN_C.
  * Only one of IN_32 or IN_64 may be set;
  * IN_C is used if IN_32 and IN_64 are unset.
@@ -429,49 +463,11 @@ static void do_dup(unsigned vece, uint32_t dofs, uint32_t oprsz,
         } else if (in_64) {
             tcg_gen_dup_i64_vec(vece, t_vec, in_64);
         } else {
-            switch (vece) {
-            case MO_8:
-                tcg_gen_dup8i_vec(t_vec, in_c);
-                break;
-            case MO_16:
-                tcg_gen_dup16i_vec(t_vec, in_c);
-                break;
-            case MO_32:
-                tcg_gen_dup32i_vec(t_vec, in_c);
-                break;
-            default:
-                tcg_gen_dup64i_vec(t_vec, in_c);
-                break;
-            }
+            tcg_gen_dupi_vec(vece, t_vec, in_c);
         }
-
-        i = 0;
-        switch (type) {
-        case TCG_TYPE_V256:
-            /* Recall that ARM SVE allows vector sizes that are not a
-             * power of 2, but always a multiple of 16.  The intent is
-             * that e.g. size == 80 would be expanded with 2x32 + 1x16.
-             */
-            for (; i + 32 <= oprsz; i += 32) {
-                tcg_gen_stl_vec(t_vec, cpu_env, dofs + i, TCG_TYPE_V256);
-            }
-            /* fallthru */
-        case TCG_TYPE_V128:
-            for (; i + 16 <= oprsz; i += 16) {
-                tcg_gen_stl_vec(t_vec, cpu_env, dofs + i, TCG_TYPE_V128);
-            }
-            break;
-        case TCG_TYPE_V64:
-            for (; i < oprsz; i += 8) {
-                tcg_gen_stl_vec(t_vec, cpu_env, dofs + i, TCG_TYPE_V64);
-            }
-            break;
-        default:
-            g_assert_not_reached();
-        }
-
+        do_dup_store(type, dofs, oprsz, maxsz, t_vec);
         tcg_temp_free_vec(t_vec);
-        goto done;
+        return;
     }
 
     /* Otherwise, inline with an integer type, unless "large".  */
@@ -1287,6 +1283,16 @@ void tcg_gen_gvec_dup_i64(unsigned vece, uint32_t dofs, uint32_t oprsz,
 void tcg_gen_gvec_dup_mem(unsigned vece, uint32_t dofs, uint32_t aofs,
                           uint32_t oprsz, uint32_t maxsz)
 {
+    if (vece <= MO_64) {
+        TCGType type = choose_vector_type(0, vece, oprsz, 0);
+        if (type != 0) {
+            TCGv_vec t_vec = tcg_temp_new_vec(type);
+            tcg_gen_dup_mem_vec(vece, t_vec, cpu_env, aofs);
+            do_dup_store(type, dofs, oprsz, maxsz, t_vec);
+            tcg_temp_free_vec(t_vec);
+            return;
+        }
+    }
     if (vece <= MO_32) {
         TCGv_i32 in = tcg_temp_new_i32();
         switch (vece) {
diff --git a/tcg/tcg-op-vec.c b/tcg/tcg-op-vec.c
index cfb18682b1..ce7987b858 100644
--- a/tcg/tcg-op-vec.c
+++ b/tcg/tcg-op-vec.c
@@ -194,6 +194,17 @@ void tcg_gen_dup_i32_vec(unsigned vece, TCGv_vec r, TCGv_i32 a)
     vec_gen_2(INDEX_op_dup_vec, type, vece, ri, ai);
 }
 
+void tcg_gen_dup_mem_vec(unsigned vece, TCGv_vec r, TCGv_ptr b,
+                         tcg_target_long ofs)
+{
+    TCGArg ri = tcgv_vec_arg(r);
+    TCGArg bi = tcgv_ptr_arg(b);
+    TCGTemp *rt = arg_temp(ri);
+    TCGType type = rt->base_type;
+
+    vec_gen_3(INDEX_op_dupm_vec, type, vece, ri, bi, ofs);
+}
+
 static void vec_gen_ldst(TCGOpcode opc, TCGv_vec r, TCGv_ptr b, TCGArg o)
 {
     TCGArg ri = tcgv_vec_arg(r);
diff --git a/tcg/tcg.c b/tcg/tcg.c
index b157e52d5c..47f36a358d 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -1601,6 +1601,7 @@ bool tcg_op_supported(TCGOpcode op)
     case INDEX_op_mov_vec:
     case INDEX_op_dup_vec:
     case INDEX_op_dupi_vec:
+    case INDEX_op_dupm_vec:
     case INDEX_op_ld_vec:
     case INDEX_op_st_vec:
     case INDEX_op_add_vec: