From patchwork Wed May 30 18:01:08 2018
X-Patchwork-Submitter: Richard Henderson <richard.henderson@linaro.org>
X-Patchwork-Id: 137274
From: Richard Henderson <richard.henderson@linaro.org>
To: qemu-devel@nongnu.org
Date: Wed, 30 May 2018 11:01:08 -0700
Message-Id: <20180530180120.13355-7-richard.henderson@linaro.org>
X-Mailer: git-send-email 2.17.0
In-Reply-To: <20180530180120.13355-1-richard.henderson@linaro.org>
References: <20180530180120.13355-1-richard.henderson@linaro.org>
Subject: [Qemu-devel] [PATCH v3b 06/18] target/arm: Implement SVE conditionally broadcast/extract element
Cc: peter.maydell@linaro.org, qemu-arm@nongnu.org

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/helper-sve.h    |   2 +
 target/arm/sve_helper.c    |  11 ++
 target/arm/translate-sve.c | 318 +++++++++++++++++++++++++++++++++++++
 target/arm/sve.decode      |  20 +++
 4 files changed, 351 insertions(+)

-- 
2.17.0

diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h
index d977aea00d..a58fb4ba01 100644
--- a/target/arm/helper-sve.h
+++ b/target/arm/helper-sve.h
@@ -463,6 +463,8 @@ DEF_HELPER_FLAGS_4(sve_trn_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(sve_compact_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(sve_compact_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 
+DEF_HELPER_FLAGS_2(sve_last_active_element, TCG_CALL_NO_RWG, s32, ptr, i32)
+
 DEF_HELPER_FLAGS_5(sve_and_pppp, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_5(sve_bic_pppp, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_5(sve_eor_pppp, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
index ed3c6d4ca9..941a098234 100644
--- a/target/arm/sve_helper.c
+++ b/target/arm/sve_helper.c
@@ -2070,3 +2070,14 @@ void HELPER(sve_compact_d)(void *vd, void *vn, void *vg, uint32_t desc)
         d[j] = 0;
     }
 }
+
+/* Similar to the ARM LastActiveElement pseudocode function, except the
+   result is multiplied by the element size.  This includes the not found
+   indication; e.g. not found for esz=3 is -8.  */
+int32_t HELPER(sve_last_active_element)(void *vg, uint32_t pred_desc)
+{
+    intptr_t oprsz = extract32(pred_desc, 0, SIMD_OPRSZ_BITS) + 2;
+    intptr_t esz = extract32(pred_desc, SIMD_DATA_SHIFT, 2);
+
+    return last_active_element(vg, DIV_ROUND_UP(oprsz, 8), esz);
+}
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index ed0f48a927..edcef277f8 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -2296,6 +2296,324 @@ static bool trans_COMPACT(DisasContext *s, arg_rpr_esz *a, uint32_t insn)
     return do_zpz_ool(s, a, fns[a->esz]);
 }
 
+/* Call the helper that computes the ARM LastActiveElement pseudocode
+   function, scaled by the element size.  This includes the not found
+   indication; e.g. not found for esz=3 is -8.  */
+static void find_last_active(DisasContext *s, TCGv_i32 ret, int esz, int pg)
+{
+    /* Predicate sizes may be smaller and cannot use simd_desc.  We cannot
+       round up, as we do elsewhere, because we need the exact size.  */
+    TCGv_ptr t_p = tcg_temp_new_ptr();
+    TCGv_i32 t_desc;
+    unsigned vsz = pred_full_reg_size(s);
+    unsigned desc;
+
+    desc = vsz - 2;
+    desc = deposit32(desc, SIMD_DATA_SHIFT, 2, esz);
+
+    tcg_gen_addi_ptr(t_p, cpu_env, pred_full_reg_offset(s, pg));
+    t_desc = tcg_const_i32(desc);
+
+    gen_helper_sve_last_active_element(ret, t_p, t_desc);
+
+    tcg_temp_free_i32(t_desc);
+    tcg_temp_free_ptr(t_p);
+}
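
[For reference, not part of the patch: the scaled LastActiveElement result
the helper returns can be modeled by this minimal C sketch.  It assumes
QEMU's predicate layout of one predicate bit per vector byte; the function
name is invented for illustration.]

    /* Byte offset of the last active element, or -esize (e.g. -8 for
       esz=3) when no element is active.  */
    static int last_active_scaled(const uint8_t *vg, int vl_bytes, int esz)
    {
        int esize = 1 << esz;
        int i;

        for (i = vl_bytes - esize; i >= 0; i -= esize) {
            if (vg[i >> 3] & (1u << (i & 7))) { /* predicate bit for byte i */
                return i;
            }
        }
        return -esize;
    }
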
+
+/* Increment LAST to the offset of the next element in the vector,
+   wrapping around to 0.  */
+static void incr_last_active(DisasContext *s, TCGv_i32 last, int esz)
+{
+    unsigned vsz = vec_full_reg_size(s);
+
+    tcg_gen_addi_i32(last, last, 1 << esz);
+    if (is_power_of_2(vsz)) {
+        tcg_gen_andi_i32(last, last, vsz - 1);
+    } else {
+        TCGv_i32 max = tcg_const_i32(vsz);
+        TCGv_i32 zero = tcg_const_i32(0);
+        tcg_gen_movcond_i32(TCG_COND_GEU, last, last, max, zero, last);
+        tcg_temp_free_i32(max);
+        tcg_temp_free_i32(zero);
+    }
+}
+
+/* If LAST < 0, set LAST to the offset of the last element in the vector.  */
+static void wrap_last_active(DisasContext *s, TCGv_i32 last, int esz)
+{
+    unsigned vsz = vec_full_reg_size(s);
+
+    if (is_power_of_2(vsz)) {
+        tcg_gen_andi_i32(last, last, vsz - 1);
+    } else {
+        TCGv_i32 max = tcg_const_i32(vsz - (1 << esz));
+        TCGv_i32 zero = tcg_const_i32(0);
+        tcg_gen_movcond_i32(TCG_COND_LT, last, last, zero, max, last);
+        tcg_temp_free_i32(max);
+        tcg_temp_free_i32(zero);
+    }
+}
+
+/* Load an unsigned element of ESZ from BASE+OFS.  */
+static TCGv_i64 load_esz(TCGv_ptr base, int ofs, int esz)
+{
+    TCGv_i64 r = tcg_temp_new_i64();
+
+    switch (esz) {
+    case 0:
+        tcg_gen_ld8u_i64(r, base, ofs);
+        break;
+    case 1:
+        tcg_gen_ld16u_i64(r, base, ofs);
+        break;
+    case 2:
+        tcg_gen_ld32u_i64(r, base, ofs);
+        break;
+    case 3:
+        tcg_gen_ld_i64(r, base, ofs);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+    return r;
+}
+
+/* Load an unsigned element of ESZ from RM[LAST].  */
+static TCGv_i64 load_last_active(DisasContext *s, TCGv_i32 last,
+                                 int rm, int esz)
+{
+    TCGv_ptr p = tcg_temp_new_ptr();
+    TCGv_i64 r;
+
+    /* Convert offset into vector into offset into ENV.
+       The final adjustment for the vector register base
+       is added via constant offset to the load.  */
+#ifdef HOST_WORDS_BIGENDIAN
+    /* Adjust for element ordering.  See vec_reg_offset.  */
+    if (esz < 3) {
+        tcg_gen_xori_i32(last, last, 8 - (1 << esz));
+    }
+#endif
+    tcg_gen_ext_i32_ptr(p, last);
+    tcg_gen_add_ptr(p, p, cpu_env);
+
+    r = load_esz(p, vec_full_reg_offset(s, rm), esz);
+    tcg_temp_free_ptr(p);
+
+    return r;
+}
+
+/* Compute CLAST for a Zreg.  */
+static bool do_clast_vector(DisasContext *s, arg_rprr_esz *a, bool before)
+{
+    if (!sve_access_check(s)) {
+        return true;
+    }
+
+    TCGv_i32 last = tcg_temp_local_new_i32();
+    TCGLabel *over = gen_new_label();
+    TCGv_i64 ele;
+    unsigned vsz, esz = a->esz;
+
+    find_last_active(s, last, esz, a->pg);
+
+    /* There is of course no movcond for a 2048-bit vector,
+       so we must branch over the actual store.  */
+    tcg_gen_brcondi_i32(TCG_COND_LT, last, 0, over);
+
+    if (!before) {
+        incr_last_active(s, last, esz);
+    }
+
+    ele = load_last_active(s, last, a->rm, esz);
+    tcg_temp_free_i32(last);
+
+    vsz = vec_full_reg_size(s);
+    tcg_gen_gvec_dup_i64(esz, vec_full_reg_offset(s, a->rd), vsz, vsz, ele);
+    tcg_temp_free_i64(ele);
+
+    /* If this insn used MOVPRFX, we may need a second move.  */
+    if (a->rd != a->rn) {
+        TCGLabel *done = gen_new_label();
+        tcg_gen_br(done);
+
+        gen_set_label(over);
+        do_mov_z(s, a->rd, a->rn);
+
+        gen_set_label(done);
+    } else {
+        gen_set_label(over);
+    }
+    return true;
+}
+
+static bool trans_CLASTA_z(DisasContext *s, arg_rprr_esz *a, uint32_t insn)
+{
+    return do_clast_vector(s, a, false);
+}
+
+static bool trans_CLASTB_z(DisasContext *s, arg_rprr_esz *a, uint32_t insn)
+{
+    return do_clast_vector(s, a, true);
+}
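
[Aside, not part of the patch: a scalar model of the CLASTA/CLASTB vector
semantics implemented above, following the ARM pseudocode; the function
name is invented.  Note how the not-found case preserves Zdn, which is
exactly what the MOVPRFX branch above implements.]

    #include <stdbool.h>
    #include <stdint.h>
    #include <string.h>

    /* zd and zn may alias (the instruction is destructive); pg holds
       one flag per element.  */
    static void clast_vector_model(int64_t *zd, const int64_t *zn,
                                   const int64_t *zm, const bool *pg,
                                   int nelem, bool before)
    {
        int last = -1, i;

        for (i = 0; i < nelem; i++) {
            if (pg[i]) {
                last = i;                  /* index of last active element */
            }
        }
        if (last < 0) {
            memmove(zd, zn, nelem * sizeof(*zd));  /* not found: keep Zdn */
            return;
        }
        if (!before) {
            last = (last + 1) % nelem;     /* CLASTA: step past, wrap to 0 */
        }
        for (i = 0; i < nelem; i++) {
            zd[i] = zm[last];              /* broadcast selected element */
        }
    }
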
+
+/* Compute CLAST for a scalar.  */
+static void do_clast_scalar(DisasContext *s, int esz, int pg, int rm,
+                            bool before, TCGv_i64 reg_val)
+{
+    TCGv_i32 last = tcg_temp_new_i32();
+    TCGv_i64 ele, cmp, zero;
+
+    find_last_active(s, last, esz, pg);
+
+    /* Extend the original value of last prior to incrementing.  */
+    cmp = tcg_temp_new_i64();
+    tcg_gen_ext_i32_i64(cmp, last);
+
+    if (!before) {
+        incr_last_active(s, last, esz);
+    }
+
+    /* The conceit here is that while last < 0 indicates not found, after
+       adjusting for cpu_env->vfp.zregs[rm], it is still a valid address
+       from which we can load garbage.  We then discard the garbage with
+       a conditional move.  */
+    ele = load_last_active(s, last, rm, esz);
+    tcg_temp_free_i32(last);
+
+    zero = tcg_const_i64(0);
+    tcg_gen_movcond_i64(TCG_COND_GE, reg_val, cmp, zero, ele, reg_val);
+
+    tcg_temp_free_i64(zero);
+    tcg_temp_free_i64(cmp);
+    tcg_temp_free_i64(ele);
+}
+
+/* Compute CLAST for a Vreg.  */
+static bool do_clast_fp(DisasContext *s, arg_rpr_esz *a, bool before)
+{
+    if (sve_access_check(s)) {
+        int esz = a->esz;
+        int ofs = vec_reg_offset(s, a->rd, 0, esz);
+        TCGv_i64 reg = load_esz(cpu_env, ofs, esz);
+
+        do_clast_scalar(s, esz, a->pg, a->rn, before, reg);
+        write_fp_dreg(s, a->rd, reg);
+        tcg_temp_free_i64(reg);
+    }
+    return true;
+}
+
+static bool trans_CLASTA_v(DisasContext *s, arg_rpr_esz *a, uint32_t insn)
+{
+    return do_clast_fp(s, a, false);
+}
+
+static bool trans_CLASTB_v(DisasContext *s, arg_rpr_esz *a, uint32_t insn)
+{
+    return do_clast_fp(s, a, true);
+}
+
+/* Compute CLAST for a Xreg.  */
+static bool do_clast_general(DisasContext *s, arg_rpr_esz *a, bool before)
+{
+    if (!sve_access_check(s)) {
+        return true;
+    }
+
+    TCGv_i64 reg = cpu_reg(s, a->rd);
+
+    switch (a->esz) {
+    case 0:
+        tcg_gen_ext8u_i64(reg, reg);
+        break;
+    case 1:
+        tcg_gen_ext16u_i64(reg, reg);
+        break;
+    case 2:
+        tcg_gen_ext32u_i64(reg, reg);
+        break;
+    case 3:
+        break;
+    default:
+        g_assert_not_reached();
+    }
+
+    do_clast_scalar(s, a->esz, a->pg, a->rn, before, cpu_reg(s, a->rd));
+    return true;
+}
+
+static bool trans_CLASTA_r(DisasContext *s, arg_rpr_esz *a, uint32_t insn)
+{
+    return do_clast_general(s, a, false);
+}
+
+static bool trans_CLASTB_r(DisasContext *s, arg_rpr_esz *a, uint32_t insn)
+{
+    return do_clast_general(s, a, true);
+}
+
+/* Compute LAST for a scalar.  */
+static TCGv_i64 do_last_scalar(DisasContext *s, int esz,
+                               int pg, int rm, bool before)
+{
+    TCGv_i32 last = tcg_temp_new_i32();
+    TCGv_i64 ret;
+
+    find_last_active(s, last, esz, pg);
+    if (before) {
+        wrap_last_active(s, last, esz);
+    } else {
+        incr_last_active(s, last, esz);
+    }
+
+    ret = load_last_active(s, last, rm, esz);
+    tcg_temp_free_i32(last);
+    return ret;
+}
+
+/* Compute LAST for a Vreg.  */
+static bool do_last_fp(DisasContext *s, arg_rpr_esz *a, bool before)
+{
+    if (sve_access_check(s)) {
+        TCGv_i64 val = do_last_scalar(s, a->esz, a->pg, a->rn, before);
+        write_fp_dreg(s, a->rd, val);
+        tcg_temp_free_i64(val);
+    }
+    return true;
+}
+
+static bool trans_LASTA_v(DisasContext *s, arg_rpr_esz *a, uint32_t insn)
+{
+    return do_last_fp(s, a, false);
+}
+
+static bool trans_LASTB_v(DisasContext *s, arg_rpr_esz *a, uint32_t insn)
+{
+    return do_last_fp(s, a, true);
+}
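
[Aside, not part of the patch: unlike CLAST, the LASTA/LASTB extraction
always produces an element of the source vector, even with an all-false
predicate.  A minimal model, with an invented name:]

    #include <stdbool.h>
    #include <stdint.h>

    static int64_t last_scalar_model(const int64_t *zm, const bool *pg,
                                     int nelem, bool before)
    {
        int last = -1, i;

        for (i = 0; i < nelem; i++) {
            if (pg[i]) {
                last = i;
            }
        }
        if (before) {
            /* LASTB: the last active element, or the vector's final
               element when none are active (cf. wrap_last_active).  */
            return zm[last < 0 ? nelem - 1 : last];
        }
        /* LASTA: the element after the last active one, wrapping to 0;
           none active also yields element 0 (cf. incr_last_active).  */
        return zm[last < 0 ? 0 : (last + 1) % nelem];
    }
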
+
+/* Compute LAST for a Xreg.  */
+static bool do_last_general(DisasContext *s, arg_rpr_esz *a, bool before)
+{
+    if (sve_access_check(s)) {
+        TCGv_i64 val = do_last_scalar(s, a->esz, a->pg, a->rn, before);
+        tcg_gen_mov_i64(cpu_reg(s, a->rd), val);
+        tcg_temp_free_i64(val);
+    }
+    return true;
+}
+
+static bool trans_LASTA_r(DisasContext *s, arg_rpr_esz *a, uint32_t insn)
+{
+    return do_last_general(s, a, false);
+}
+
+static bool trans_LASTB_r(DisasContext *s, arg_rpr_esz *a, uint32_t insn)
+{
+    return do_last_general(s, a, true);
+}
+
 /*
  *** SVE Memory - 32-bit Gather and Unsized Contiguous Group
  */
diff --git a/target/arm/sve.decode b/target/arm/sve.decode
index 9da6566d32..1226867f69 100644
--- a/target/arm/sve.decode
+++ b/target/arm/sve.decode
@@ -430,6 +430,26 @@ TRN2_z          00000101 .. 1 ..... 011 101 ..... .....         @rd_rn_rm
 # Note esz >= 2
 COMPACT         00000101 .. 100001 100 ... ..... .....          @rd_pg_rn
 
+# SVE conditionally broadcast element to vector
+CLASTA_z        00000101 .. 10100 0 100 ... ..... .....         @rdn_pg_rm
+CLASTB_z        00000101 .. 10100 1 100 ... ..... .....         @rdn_pg_rm
+
+# SVE conditionally copy element to SIMD&FP scalar
+CLASTA_v        00000101 .. 10101 0 100 ... ..... .....         @rd_pg_rn
+CLASTB_v        00000101 .. 10101 1 100 ... ..... .....         @rd_pg_rn
+
+# SVE conditionally copy element to general register
+CLASTA_r        00000101 .. 11000 0 101 ... ..... .....         @rd_pg_rn
+CLASTB_r        00000101 .. 11000 1 101 ... ..... .....         @rd_pg_rn
+
+# SVE copy element to SIMD&FP scalar register
+LASTA_v         00000101 .. 10001 0 100 ... ..... .....         @rd_pg_rn
+LASTB_v         00000101 .. 10001 1 100 ... ..... .....         @rd_pg_rn
+
+# SVE copy element to general register
+LASTA_r         00000101 .. 10000 0 101 ... ..... .....         @rd_pg_rn
+LASTB_r         00000101 .. 10000 1 101 ... ..... .....         @rd_pg_rn
+
 ### SVE Predicate Logical Operations Group
 
 # SVE predicate logical operations
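
[For orientation, not part of the patch: reading one of the decode lines
above, e.g. CLASTA_z, the fixed bits occupy [31:24] and [21:13], the ".."
field is the element size, bit 16 selects the A vs B variant, "..." is the
3-bit governing predicate, and the two 5-bit fields are the registers.
A sketch of that field extraction, with invented names:]

    #include <stdint.h>

    /* Fields of the CLAST/LAST encodings per the patterns above.  */
    struct clast_fields {
        int esz;    /* bits [23:22]: element size */
        int before; /* bit  [16]:    0 = A variant, 1 = B variant */
        int pg;     /* bits [12:10]: governing predicate */
        int rn_rm;  /* bits [9:5]:   source vector Zm (or Zn) */
        int rd;     /* bits [4:0]:   destination register */
    };

    static struct clast_fields decode_clast(uint32_t insn)
    {
        struct clast_fields f = {
            .esz    = (insn >> 22) & 3,
            .before = (insn >> 16) & 1,
            .pg     = (insn >> 10) & 7,
            .rn_rm  = (insn >> 5) & 0x1f,
            .rd     = insn & 0x1f,
        };
        return f;
    }
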