From patchwork Mon Dec 18 17:17:56 2017
X-Patchwork-Submitter: Richard Henderson
X-Patchwork-Id: 122269
From: Richard Henderson
To: qemu-devel@nongnu.org
Date: Mon, 18 Dec 2017 09:17:56 -0800
Message-Id: <20171218171758.16964-25-richard.henderson@linaro.org>
X-Mailer: git-send-email 2.14.3
In-Reply-To: <20171218171758.16964-1-richard.henderson@linaro.org>
References: <20171218171758.16964-1-richard.henderson@linaro.org>
Subject: [Qemu-devel] [PATCH v7 24/26] tcg: Add support for 4 operand vector ops
Cc: peter.maydell@linaro.org

Signed-off-by: Richard Henderson
---
 tcg/tcg-op-gvec.h |  45 +++++++++--
 tcg/tcg-op-gvec.c | 226 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 266 insertions(+), 5 deletions(-)

--
2.14.3

diff --git a/tcg/tcg-op-gvec.h b/tcg/tcg-op-gvec.h
index 188c3368bd..91459b9c38 100644
--- a/tcg/tcg-op-gvec.h
+++ b/tcg/tcg-op-gvec.h
@@ -30,29 +30,43 @@
 /* Expand a call to a gvec-style helper, with pointers to two vector
    operands, and a descriptor (see tcg-gvec-desc.h).  */
-typedef void (gen_helper_gvec_2)(TCGv_ptr, TCGv_ptr, TCGv_i32);
+typedef void gen_helper_gvec_2(TCGv_ptr, TCGv_ptr, TCGv_i32);
 void tcg_gen_gvec_2_ool(uint32_t dofs, uint32_t aofs,
                         uint32_t oprsz, uint32_t maxsz, int32_t data,
                         gen_helper_gvec_2 *fn);

 /* Similarly, passing an extra pointer (e.g. env or float_status).  */
-typedef void (gen_helper_gvec_2_ptr)(TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_i32);
+typedef void gen_helper_gvec_2_ptr(TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_i32);
 void tcg_gen_gvec_2_ptr(uint32_t dofs, uint32_t aofs,
                         TCGv_ptr ptr, uint32_t oprsz, uint32_t maxsz,
                         int32_t data, gen_helper_gvec_2_ptr *fn);

 /* Similarly, with three vector operands.  */
-typedef void (gen_helper_gvec_3)(TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_i32);
+typedef void gen_helper_gvec_3(TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_i32);
 void tcg_gen_gvec_3_ool(uint32_t dofs, uint32_t aofs, uint32_t bofs,
                         uint32_t oprsz, uint32_t maxsz, int32_t data,
                         gen_helper_gvec_3 *fn);

-typedef void (gen_helper_gvec_3_ptr)(TCGv_ptr, TCGv_ptr, TCGv_ptr,
-                                     TCGv_ptr, TCGv_i32);
+/* Similarly, with four vector operands.  */
+typedef void gen_helper_gvec_4(TCGv_ptr, TCGv_ptr, TCGv_ptr,
+                               TCGv_ptr, TCGv_i32);
+void tcg_gen_gvec_4_ool(uint32_t dofs, uint32_t aofs, uint32_t bofs,
+                        uint32_t cofs, uint32_t oprsz, uint32_t maxsz,
+                        int32_t data, gen_helper_gvec_4 *fn);
+
+typedef void gen_helper_gvec_3_ptr(TCGv_ptr, TCGv_ptr, TCGv_ptr,
+                                   TCGv_ptr, TCGv_i32);
 void tcg_gen_gvec_3_ptr(uint32_t dofs, uint32_t aofs, uint32_t bofs,
                         TCGv_ptr ptr, uint32_t oprsz, uint32_t maxsz,
                         int32_t data, gen_helper_gvec_3_ptr *fn);

+typedef void gen_helper_gvec_4_ptr(TCGv_ptr, TCGv_ptr, TCGv_ptr,
+                                   TCGv_ptr, TCGv_ptr, TCGv_i32);
+void tcg_gen_gvec_4_ptr(uint32_t dofs, uint32_t aofs, uint32_t bofs,
+                        uint32_t cofs, TCGv_ptr ptr, uint32_t oprsz,
+                        uint32_t maxsz, int32_t data,
+                        gen_helper_gvec_4_ptr *fn);
+
 /* Expand a gvec operation.  Either inline or out-of-line depending on
    the actual vector size and the operations supported by the host.  */
 typedef struct {
@@ -114,12 +128,33 @@ typedef struct {
     bool load_dest;
 } GVecGen3;

+typedef struct {
+    /* Expand inline as a 64-bit or 32-bit integer.
+       Only one of these will be non-NULL.  */
+    void (*fni8)(TCGv_i64, TCGv_i64, TCGv_i64, TCGv_i64);
+    void (*fni4)(TCGv_i32, TCGv_i32, TCGv_i32, TCGv_i32);
+    /* Expand inline with a host vector type.  */
+    void (*fniv)(unsigned, TCGv_vec, TCGv_vec, TCGv_vec, TCGv_vec);
+    /* Expand out-of-line helper w/descriptor.  */
+    gen_helper_gvec_4 *fno;
+    /* The opcode, if any, to which this corresponds.  */
+    TCGOpcode opc;
+    /* The data argument to the out-of-line helper.  */
+    uint32_t data;
+    /* The vector element size, if applicable.  */
+    uint8_t vece;
+    /* Prefer i64 to v64.  */
+    bool prefer_i64;
+} GVecGen4;
+
 void tcg_gen_gvec_2(uint32_t dofs, uint32_t aofs,
                     uint32_t opsz, uint32_t clsz, const GVecGen2 *);
 void tcg_gen_gvec_2i(uint32_t dofs, uint32_t aofs, uint32_t opsz,
                      uint32_t clsz, unsigned c, const GVecGen2i *);
 void tcg_gen_gvec_3(uint32_t dofs, uint32_t aofs, uint32_t bofs,
                     uint32_t opsz, uint32_t clsz, const GVecGen3 *);
+void tcg_gen_gvec_4(uint32_t dofs, uint32_t aofs, uint32_t bofs, uint32_t cofs,
+                    uint32_t opsz, uint32_t clsz, const GVecGen4 *);

 /* Expand a specific vector operation.  */

diff --git a/tcg/tcg-op-gvec.c b/tcg/tcg-op-gvec.c
index 8ec1817727..5f58859a1e 100644
--- a/tcg/tcg-op-gvec.c
+++ b/tcg/tcg-op-gvec.c
@@ -55,6 +55,18 @@ static void check_overlap_3(uint32_t d, uint32_t a, uint32_t b, uint32_t s)
     check_overlap_2(a, b, s);
 }

+/* Verify vector overlap rules for four operands.  */
+static void check_overlap_4(uint32_t d, uint32_t a, uint32_t b,
+                            uint32_t c, uint32_t s)
+{
+    check_overlap_2(d, a, s);
+    check_overlap_2(d, b, s);
+    check_overlap_2(d, c, s);
+    check_overlap_2(a, b, s);
+    check_overlap_2(a, c, s);
+    check_overlap_2(b, c, s);
+}
+
 /* Create a descriptor from components.  */
 uint32_t simd_desc(uint32_t oprsz, uint32_t maxsz, int32_t data)
 {
@@ -118,6 +130,33 @@ void tcg_gen_gvec_3_ool(uint32_t dofs, uint32_t aofs, uint32_t bofs,
     tcg_temp_free_i32(desc);
 }

+/* Generate a call to a gvec-style helper with four vector operands.  */
+void tcg_gen_gvec_4_ool(uint32_t dofs, uint32_t aofs, uint32_t bofs,
+                        uint32_t cofs, uint32_t oprsz, uint32_t maxsz,
+                        int32_t data, gen_helper_gvec_4 *fn)
+{
+    TCGv_ptr a0, a1, a2, a3;
+    TCGv_i32 desc = tcg_const_i32(simd_desc(oprsz, maxsz, data));
+
+    a0 = tcg_temp_new_ptr();
+    a1 = tcg_temp_new_ptr();
+    a2 = tcg_temp_new_ptr();
+    a3 = tcg_temp_new_ptr();
+
+    tcg_gen_addi_ptr(a0, cpu_env, dofs);
+    tcg_gen_addi_ptr(a1, cpu_env, aofs);
+    tcg_gen_addi_ptr(a2, cpu_env, bofs);
+    tcg_gen_addi_ptr(a3, cpu_env, cofs);
+
+    fn(a0, a1, a2, a3, desc);
+
+    tcg_temp_free_ptr(a0);
+    tcg_temp_free_ptr(a1);
+    tcg_temp_free_ptr(a2);
+    tcg_temp_free_ptr(a3);
+    tcg_temp_free_i32(desc);
+}
+
 /* Generate a call to a gvec-style helper with three vector operands
    and an extra pointer operand.  */
 void tcg_gen_gvec_2_ptr(uint32_t dofs, uint32_t aofs,
@@ -165,6 +204,35 @@ void tcg_gen_gvec_3_ptr(uint32_t dofs, uint32_t aofs, uint32_t bofs,
     tcg_temp_free_i32(desc);
 }

+/* Generate a call to a gvec-style helper with four vector operands
+   and an extra pointer operand.  */
+void tcg_gen_gvec_4_ptr(uint32_t dofs, uint32_t aofs, uint32_t bofs,
+                        uint32_t cofs, TCGv_ptr ptr, uint32_t oprsz,
+                        uint32_t maxsz, int32_t data,
+                        gen_helper_gvec_4_ptr *fn)
+{
+    TCGv_ptr a0, a1, a2, a3;
+    TCGv_i32 desc = tcg_const_i32(simd_desc(oprsz, maxsz, data));
+
+    a0 = tcg_temp_new_ptr();
+    a1 = tcg_temp_new_ptr();
+    a2 = tcg_temp_new_ptr();
+    a3 = tcg_temp_new_ptr();
+
+    tcg_gen_addi_ptr(a0, cpu_env, dofs);
+    tcg_gen_addi_ptr(a1, cpu_env, aofs);
+    tcg_gen_addi_ptr(a2, cpu_env, bofs);
+    tcg_gen_addi_ptr(a3, cpu_env, cofs);
+
+    fn(a0, a1, a2, a3, ptr, desc);
+
+    tcg_temp_free_ptr(a0);
+    tcg_temp_free_ptr(a1);
+    tcg_temp_free_ptr(a2);
+    tcg_temp_free_ptr(a3);
+    tcg_temp_free_i32(desc);
+}
+
 /* Return true if we want to implement something of OPRSZ bytes
    in units of LNSZ.  This limits the expansion of inline code.  */
 static inline bool check_size_impl(uint32_t oprsz, uint32_t lnsz)
@@ -468,6 +536,30 @@ static void expand_3_i32(uint32_t dofs, uint32_t aofs,
     tcg_temp_free_i32(t0);
 }

+/* Expand OPSZ bytes worth of four-operand operations using i32 elements.  */
+static void expand_4_i32(uint32_t dofs, uint32_t aofs, uint32_t bofs,
+                         uint32_t cofs, uint32_t opsz,
+                         void (*fni)(TCGv_i32, TCGv_i32, TCGv_i32, TCGv_i32))
+{
+    TCGv_i32 t0 = tcg_temp_new_i32();
+    TCGv_i32 t1 = tcg_temp_new_i32();
+    TCGv_i32 t2 = tcg_temp_new_i32();
+    TCGv_i32 t3 = tcg_temp_new_i32();
+    uint32_t i;
+
+    for (i = 0; i < opsz; i += 4) {
+        tcg_gen_ld_i32(t1, cpu_env, aofs + i);
+        tcg_gen_ld_i32(t2, cpu_env, bofs + i);
+        tcg_gen_ld_i32(t3, cpu_env, cofs + i);
+        fni(t0, t1, t2, t3);
+        tcg_gen_st_i32(t0, cpu_env, dofs + i);
+    }
+    tcg_temp_free_i32(t3);
+    tcg_temp_free_i32(t2);
+    tcg_temp_free_i32(t1);
+    tcg_temp_free_i32(t0);
+}
+
 /* Expand OPSZ bytes worth of two-operand operations using i64 elements.  */
 static void expand_2_i64(uint32_t dofs, uint32_t aofs, uint32_t opsz,
                          void (*fni)(TCGv_i64, TCGv_i64))
@@ -527,6 +619,30 @@ static void expand_3_i64(uint32_t dofs, uint32_t aofs,
     tcg_temp_free_i64(t0);
 }

+/* Expand OPSZ bytes worth of four-operand operations using i64 elements.  */
+static void expand_4_i64(uint32_t dofs, uint32_t aofs, uint32_t bofs,
+                         uint32_t cofs, uint32_t opsz,
+                         void (*fni)(TCGv_i64, TCGv_i64, TCGv_i64, TCGv_i64))
+{
+    TCGv_i64 t0 = tcg_temp_new_i64();
+    TCGv_i64 t1 = tcg_temp_new_i64();
+    TCGv_i64 t2 = tcg_temp_new_i64();
+    TCGv_i64 t3 = tcg_temp_new_i64();
+    uint32_t i;
+
+    for (i = 0; i < opsz; i += 8) {
+        tcg_gen_ld_i64(t1, cpu_env, aofs + i);
+        tcg_gen_ld_i64(t2, cpu_env, bofs + i);
+        tcg_gen_ld_i64(t3, cpu_env, cofs + i);
+        fni(t0, t1, t2, t3);
+        tcg_gen_st_i64(t0, cpu_env, dofs + i);
+    }
+    tcg_temp_free_i64(t3);
+    tcg_temp_free_i64(t2);
+    tcg_temp_free_i64(t1);
+    tcg_temp_free_i64(t0);
+}
+
 /* Expand OPSZ bytes worth of two-operand operations using host vectors.  */
 static void expand_2_vec(unsigned vece, uint32_t dofs, uint32_t aofs,
                          uint32_t opsz, uint32_t tysz, TCGType type,
@@ -591,6 +707,32 @@ static void expand_3_vec(unsigned vece, uint32_t dofs, uint32_t aofs,
     tcg_temp_free_vec(t0);
 }

+/* Expand OPSZ bytes worth of four-operand operations using host vectors.  */
+static void expand_4_vec(unsigned vece, uint32_t dofs, uint32_t aofs,
+                         uint32_t bofs, uint32_t cofs, uint32_t opsz,
+                         uint32_t tysz, TCGType type,
+                         void (*fni)(unsigned, TCGv_vec, TCGv_vec,
+                                     TCGv_vec, TCGv_vec))
+{
+    TCGv_vec t0 = tcg_temp_new_vec(type);
+    TCGv_vec t1 = tcg_temp_new_vec(type);
+    TCGv_vec t2 = tcg_temp_new_vec(type);
+    TCGv_vec t3 = tcg_temp_new_vec(type);
+    uint32_t i;
+
+    for (i = 0; i < opsz; i += tysz) {
+        tcg_gen_ld_vec(t1, cpu_env, aofs + i);
+        tcg_gen_ld_vec(t2, cpu_env, bofs + i);
+        tcg_gen_ld_vec(t3, cpu_env, cofs + i);
+        fni(vece, t0, t1, t2, t3);
+        tcg_gen_st_vec(t0, cpu_env, dofs + i);
+    }
+    tcg_temp_free_vec(t3);
+    tcg_temp_free_vec(t2);
+    tcg_temp_free_vec(t1);
+    tcg_temp_free_vec(t0);
+}
+
 /* Expand a vector two-operand operation.  */
 void tcg_gen_gvec_2(uint32_t dofs, uint32_t aofs,
                     uint32_t oprsz, uint32_t maxsz, const GVecGen2 *g)
@@ -831,6 +973,94 @@ void tcg_gen_gvec_3(uint32_t dofs, uint32_t aofs, uint32_t bofs,
     tcg_gen_gvec_3_ool(dofs, aofs, bofs, oprsz, maxsz, g->data, g->fno);
 }

+/* Expand a vector four-operand operation.  */
+void tcg_gen_gvec_4(uint32_t dofs, uint32_t aofs, uint32_t bofs, uint32_t cofs,
+                    uint32_t oprsz, uint32_t maxsz, const GVecGen4 *g)
+{
+    check_size_align(oprsz, maxsz, dofs | aofs | bofs | cofs);
+    check_overlap_4(dofs, aofs, bofs, cofs, maxsz);
+
+    /* Quick check for sizes we won't support inline.  */
+    if (oprsz > MAX_UNROLL * 32 || maxsz > MAX_UNROLL * 32) {
+        goto do_ool;
+    }
+
+    /* Recall that ARM SVE allows vector sizes that are not a power of 2.
+       Expand with successively smaller host vector sizes.  The intent is
+       that e.g. oprsz == 80 would be expanded with 2x32 + 1x16.  */
+    /* ??? For maxsz > oprsz, the host may be able to use an op-sized
+       operation, zeroing the balance of the register.  We can then
+       use a cl-sized store to implement the clearing without an extra
+       store operation.  This is true for aarch64 and x86_64 hosts.  */
+
+    if (TCG_TARGET_HAS_v256 && check_size_impl(oprsz, 32)
+        && (!g->opc || tcg_can_emit_vec_op(g->opc, TCG_TYPE_V256, g->vece))) {
+        uint32_t done = QEMU_ALIGN_DOWN(oprsz, 32);
+        expand_4_vec(g->vece, dofs, aofs, bofs, cofs, done,
+                     32, TCG_TYPE_V256, g->fniv);
+        dofs += done;
+        aofs += done;
+        bofs += done;
+        cofs += done;
+        oprsz -= done;
+        maxsz -= done;
+    }
+
+    if (TCG_TARGET_HAS_v128 && check_size_impl(oprsz, 16)
+        && (!g->opc || tcg_can_emit_vec_op(g->opc, TCG_TYPE_V128, g->vece))) {
+        uint32_t done = QEMU_ALIGN_DOWN(oprsz, 16);
+        expand_4_vec(g->vece, dofs, aofs, bofs, cofs, done,
+                     16, TCG_TYPE_V128, g->fniv);
+        dofs += done;
+        aofs += done;
+        bofs += done;
+        cofs += done;
+        oprsz -= done;
+        maxsz -= done;
+    }
+
+    if (check_size_impl(oprsz, 8)) {
+        uint32_t done = QEMU_ALIGN_DOWN(oprsz, 8);
+        if (TCG_TARGET_HAS_v64 && !g->prefer_i64
+            && (!g->opc
+                || tcg_can_emit_vec_op(g->opc, TCG_TYPE_V64, g->vece))) {
+            expand_4_vec(g->vece, dofs, aofs, bofs, cofs, done,
+                         8, TCG_TYPE_V64, g->fniv);
+        } else if (g->fni8) {
+            expand_4_i64(dofs, aofs, bofs, cofs, done, g->fni8);
+        } else {
+            done = 0;
+        }
+        dofs += done;
+        aofs += done;
+        bofs += done;
+        cofs += done;
+        oprsz -= done;
+        maxsz -= done;
+    }
+
+    if (g->fni4 && check_size_impl(oprsz, 4)) {
+        uint32_t done = QEMU_ALIGN_DOWN(oprsz, 4);
+        expand_4_i32(dofs, aofs, bofs, cofs, done, g->fni4);
+        dofs += done;
+        aofs += done;
+        bofs += done;
+        cofs += done;
+        oprsz -= done;
+        maxsz -= done;
+    }
+
+    if (oprsz == 0) {
+        if (maxsz != 0) {
+            expand_clr(dofs, maxsz);
+        }
+        return;
+    }
+
+ do_ool:
+    tcg_gen_gvec_4_ool(dofs, aofs, bofs, cofs, oprsz, maxsz, g->data, g->fno);
+}
+
 /*
  * Expand specific vector operations.
  */
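
For illustration only (not part of the patch): a target front end could wire a per-lane 32-bit multiply-add (d = a * b + c) into the new four-operand expander roughly as below.  Only tcg_gen_gvec_4, GVecGen4, MO_32 and the existing i32 ops are taken from this series; the gen_gvec_mla32 and gen_helper_gvec_mla32 names are hypothetical.

/* Sketch, assuming the usual tcg-op.h / tcg-op-gvec.h includes.  */

static void gen_mla32_i32(TCGv_i32 d, TCGv_i32 a, TCGv_i32 b, TCGv_i32 c)
{
    TCGv_i32 t = tcg_temp_new_i32();

    /* Per-element expansion used by the i32 fallback path (fni4).  */
    tcg_gen_mul_i32(t, a, b);
    tcg_gen_add_i32(d, t, c);
    tcg_temp_free_i32(t);
}

static void gen_gvec_mla32(uint32_t dofs, uint32_t aofs, uint32_t bofs,
                           uint32_t cofs, uint32_t oprsz, uint32_t maxsz)
{
    static const GVecGen4 g = {
        .fni4 = gen_mla32_i32,
        /* Hypothetical out-of-line helper; some fallback is required
           for sizes the expander will not handle inline.  */
        .fno = gen_helper_gvec_mla32,
        .vece = MO_32,
    };

    tcg_gen_gvec_4(dofs, aofs, bofs, cofs, oprsz, maxsz, &g);
}

Since no .fniv/.opc is supplied in this sketch, anything the integer path cannot cover falls through to tcg_gen_gvec_4_ool and the helper.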
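
The matching out-of-line helper would decode the descriptor built by simd_desc().  Again a sketch, not part of this patch: simd_oprsz() and simd_maxsz() are assumed from tcg-gvec-desc.h (earlier in this series), and a DEF_HELPER_FLAGS_5 declaration of ptr, ptr, ptr, ptr, i32 in the target's helper.h is implied.

void HELPER(gvec_mla32)(void *d, void *a, void *b, void *c, uint32_t desc)
{
    intptr_t oprsz = simd_oprsz(desc);
    intptr_t maxsz = simd_maxsz(desc);
    intptr_t i;

    for (i = 0; i < oprsz; i += sizeof(uint32_t)) {
        *(uint32_t *)(d + i) = *(uint32_t *)(a + i) * *(uint32_t *)(b + i)
                               + *(uint32_t *)(c + i);
    }
    /* Zero any excess between oprsz and maxsz, as the generated code
       expects of out-of-line helpers.  */
    for (; i < maxsz; i += sizeof(uint64_t)) {
        *(uint64_t *)(d + i) = 0;
    }
}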