From patchwork Thu Mar 26 23:08:30 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Richard Henderson X-Patchwork-Id: 184959 Delivered-To: patch@linaro.org Received: by 2002:a92:de47:0:0:0:0:0 with SMTP id e7csp648736ilr; Thu, 26 Mar 2020 16:22:39 -0700 (PDT) X-Google-Smtp-Source: ADFU+vtg5664Rog9vBvy4gCFxKKMAN8I86UCck5FsJz/xYisOakHmW8J+EPT4i6t1OcMmeqPh3Ed X-Received: by 2002:a37:2fc3:: with SMTP id v186mr11306741qkh.311.1585264959720; Thu, 26 Mar 2020 16:22:39 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1585264959; cv=none; d=google.com; s=arc-20160816; b=g0tt4YyiQtqG6EYsrrGnPqAP3DnI97TTWh4+PTtqGmEEhZHdAvpHbQPsc1tPEEs1Mg yPU6++C1Amt5eqKLsdDR9iZtdGLqoBMq2Nv5cYoiAOVG2yU79ZtENJOJbPdEuCsLma/+ Pg8oNqo6+PygcBuxnBt6/hcGl7iVzCOa5yiBhFiq4NGDSbsyUL4yTK2ek3YAq8ndCVq9 XgvBCHGMmGQmKhF/hbJ9GmuYomqqFDJt/mUsX97NZwLzHH/gvC1HQV7f6/r/PAp12IjZ /8fUWPHjI9sSEdidJAhpEZ86CG1x78+SM+WIhwsvCgvBP+N8Jv1NL4KQTMsyxlz1z5Qp uTZQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:cc:list-subscribe:list-help:list-post:list-archive :list-unsubscribe:list-id:precedence:content-transfer-encoding :mime-version:references:in-reply-to:message-id:date:subject:to:from :dkim-signature; bh=iY5JtAHPmUUP1F/+dH6oTIOXqKx5ZoOANRCDeiNVP7k=; b=xz0GzzKL4NyYZN7s4lk2vXU6feWU7OZlOQQbIL/QQKk2jQvHTtRw43JGvV5rK31dZ8 NrMs7ZpZYmhrSmup5/aVhcIq8T3Flelb4Muwp5dIhUYiJw6g6V+BB4PtumCjuENZLbb+ TamuaFEix/CFrahOOR0NIIa++P4MI43GbUPOeQFtJFaWt4OM6RBor8BvQBQn5J0dyUo2 YK9YSX4L+ZqU8RXWTnESU+JseMVhfK/Kj6wZSKU/nFdX3p2asTZbqodSn816JFMpE8gU GKZMeQb1kpzh++jOJn8Ttuu38R+TnaX8+qJs92rc9TXlWl+kXHx/F3vFUY6dYSEXjZgm j/Rw== ARC-Authentication-Results: i=1; mx.google.com; dkim=fail header.i=@linaro.org header.s=google header.b="VXIvD9E/"; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom="qemu-devel-bounces+patch=linaro.org@nongnu.org"; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=linaro.org Return-Path: Received: from lists.gnu.org (lists.gnu.org. [209.51.188.17]) by mx.google.com with ESMTPS id f1si2468562qtk.23.2020.03.26.16.22.39 for (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Thu, 26 Mar 2020 16:22:39 -0700 (PDT) Received-SPF: pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 209.51.188.17 as permitted sender) client-ip=209.51.188.17; Authentication-Results: mx.google.com; dkim=fail header.i=@linaro.org header.s=google header.b="VXIvD9E/"; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom="qemu-devel-bounces+patch=linaro.org@nongnu.org"; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=linaro.org Received: from localhost ([::1]:34862 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1jHbpf-0006yO-54 for patch@linaro.org; Thu, 26 Mar 2020 19:22:39 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:59207) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1jHbcf-0002w3-AH for qemu-devel@nongnu.org; Thu, 26 Mar 2020 19:09:15 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1jHbcd-0001qr-5u for qemu-devel@nongnu.org; Thu, 26 Mar 2020 19:09:13 -0400 Received: from mail-pg1-x544.google.com ([2607:f8b0:4864:20::544]:44776) by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16) (Exim 4.71) (envelope-from ) id 1jHbcc-0001pE-U5 for qemu-devel@nongnu.org; Thu, 26 Mar 2020 19:09:11 -0400 Received: by mail-pg1-x544.google.com with SMTP id 142so3639909pgf.11 for ; Thu, 26 Mar 2020 16:09:10 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=iY5JtAHPmUUP1F/+dH6oTIOXqKx5ZoOANRCDeiNVP7k=; b=VXIvD9E/nYGp0Qf6bb0kKd04aMAuAnzrY8CC6BqEIqDfq6wAMytN6pBNBeuVQezlxq 44pH3y18nnFRAjwmkQEV3/mFBZrf6noH1Q9VcKOiCNa3jFyGVJVZtqcr0DaOGZtO83Zv pkecPkO7mJoLTsQMqgLp5YoP9bCVyY0FFg4EVa52nCf8jEzUmDoj8T2Yc+TPShkMAXP3 m42Yil/NdH1kTIUs7VWe1aY85iISzpjYLV0BuUJfOz1VU1ml3YevOLoUxUXrEpMlmGIP cZ5WYYCjlp5QQriO/WKpS/xOOAmuiIiR1cpfp3AMhoSDBWAu4c/bQUEb+YiHyWdNhNZf +KSg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=iY5JtAHPmUUP1F/+dH6oTIOXqKx5ZoOANRCDeiNVP7k=; b=TsEZaWluVrkyV8Au7ySd2o6paox4iJbXcCtDj/t87X+rhI0PVoKfgz6ZGlntE2itu2 k32TB3+cGkPSDLfb999IHmMVx6lbiaOFbnHq/SrentMKSPcWqvbCveTFstg0f5hlwrel Hxo7zSbw4+xq637U3ef1ATL8AycahgXiDY9V0+7VIwQ6qnpkhLOjY2swgz6XjsGyp79d PcwifV/LrZDgL5PGjaMiifPdOWywrZ4aBIPvjnBGf9aG6eAJhjZzr2b9oyfTKfDVEiGo h1r4NJEIwpvlGrvodBNHE1bGiWUFEL16HF07ZUgEzK97BdLR7Fwg8vxqZBRfqt5omzf0 BIIA== X-Gm-Message-State: ANhLgQ2wKABaJNe6yL5eIzWjWNELd1pZ7rPXWsaEUYYn9RmjGnOTueUo MkcqPPFFWRDg26uUiYs+mP2i/vOYhPA= X-Received: by 2002:a62:1894:: with SMTP id 142mr11802128pfy.27.1585264149138; Thu, 26 Mar 2020 16:09:09 -0700 (PDT) Received: from localhost.localdomain (174-21-138-234.tukw.qwest.net. [174.21.138.234]) by smtp.gmail.com with ESMTPSA id i187sm2530037pfg.33.2020.03.26.16.09.08 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 26 Mar 2020 16:09:08 -0700 (PDT) From: Richard Henderson To: qemu-devel@nongnu.org Subject: [PATCH 23/31] target/arm: Create arm_gen_gvec_[us]sra Date: Thu, 26 Mar 2020 16:08:30 -0700 Message-Id: <20200326230838.31112-24-richard.henderson@linaro.org> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20200326230838.31112-1-richard.henderson@linaro.org> References: <20200326230838.31112-1-richard.henderson@linaro.org> MIME-Version: 1.0 X-detected-operating-system: by eggs.gnu.org: Genre and OS details not recognized. X-Received-From: 2607:f8b0:4864:20::544 X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: rajav@quicinc.com, qemu-arm@nongnu.org, apazos@quicinc.com Errors-To: qemu-devel-bounces+patch=linaro.org@nongnu.org Sender: "Qemu-devel" The functions eliminate duplication of the special cases for this operation. They match up with the GVecGen2iFn typedef. Add out-of-line helpers. We got away with only having inline expanders because the neon vector size is only 16 bytes, and we know that the inline expansion will always succeed. When we reuse this for SVE, tcg-gvec-op may decide to use an out-of-line helper due to longer vector lengths. Signed-off-by: Richard Henderson --- target/arm/helper.h | 10 +++ target/arm/translate.h | 7 +- target/arm/translate-a64.c | 15 +--- target/arm/translate.c | 161 ++++++++++++++++++++++--------------- target/arm/vec_helper.c | 25 ++++++ 5 files changed, 139 insertions(+), 79 deletions(-) -- 2.20.1 diff --git a/target/arm/helper.h b/target/arm/helper.h index 938fdbc362..dc6a43dbd8 100644 --- a/target/arm/helper.h +++ b/target/arm/helper.h @@ -708,6 +708,16 @@ DEF_HELPER_FLAGS_4(gvec_pmull_q, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(neon_pmull_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(gvec_ssra_b, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(gvec_ssra_h, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(gvec_ssra_s, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(gvec_ssra_d, TCG_CALL_NO_RWG, void, ptr, ptr, i32) + +DEF_HELPER_FLAGS_3(gvec_usra_b, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(gvec_usra_h, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(gvec_usra_s, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(gvec_usra_d, TCG_CALL_NO_RWG, void, ptr, ptr, i32) + #ifdef TARGET_AARCH64 #include "helper-a64.h" #include "helper-sve.h" diff --git a/target/arm/translate.h b/target/arm/translate.h index 5552ee5a94..1c5cdf13e3 100644 --- a/target/arm/translate.h +++ b/target/arm/translate.h @@ -291,8 +291,6 @@ extern const GVecGen3 mls_op[4]; extern const GVecGen3 cmtst_op[4]; extern const GVecGen3 sshl_op[4]; extern const GVecGen3 ushl_op[4]; -extern const GVecGen2i ssra_op[4]; -extern const GVecGen2i usra_op[4]; extern const GVecGen2i sri_op[4]; extern const GVecGen2i sli_op[4]; extern const GVecGen4 uqadd_op[4]; @@ -305,6 +303,11 @@ void gen_sshl_i32(TCGv_i32 d, TCGv_i32 a, TCGv_i32 b); void gen_ushl_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b); void gen_sshl_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b); +void arm_gen_gvec_ssra(unsigned vece, uint32_t rd_ofs, uint32_t rm_ofs, + int64_t shift, uint32_t opr_sz, uint32_t max_sz); +void arm_gen_gvec_usra(unsigned vece, uint32_t rd_ofs, uint32_t rm_ofs, + int64_t shift, uint32_t opr_sz, uint32_t max_sz); + /* * Forward to the isar_feature_* tests given a DisasContext pointer. */ diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c index 2bcf643069..d50207fcfb 100644 --- a/target/arm/translate-a64.c +++ b/target/arm/translate-a64.c @@ -10682,19 +10682,8 @@ static void handle_vec_simd_shri(DisasContext *s, bool is_q, bool is_u, switch (opcode) { case 0x02: /* SSRA / USRA (accumulate) */ - if (is_u) { - /* Shift count same as element size produces zero to add. */ - if (shift == 8 << size) { - goto done; - } - gen_gvec_op2i(s, is_q, rd, rn, shift, &usra_op[size]); - } else { - /* Shift count same as element size produces all sign to add. */ - if (shift == 8 << size) { - shift -= 1; - } - gen_gvec_op2i(s, is_q, rd, rn, shift, &ssra_op[size]); - } + gen_gvec_fn2i(s, is_q, rd, rn, shift, + is_u ? arm_gen_gvec_usra : arm_gen_gvec_ssra, size); return; case 0x08: /* SRI */ /* Shift count same as element size is valid but does nothing. */ diff --git a/target/arm/translate.c b/target/arm/translate.c index cba84987db..f5768014d1 100644 --- a/target/arm/translate.c +++ b/target/arm/translate.c @@ -3947,33 +3947,51 @@ static void gen_ssra_vec(unsigned vece, TCGv_vec d, TCGv_vec a, int64_t sh) tcg_gen_add_vec(vece, d, d, a); } -static const TCGOpcode vecop_list_ssra[] = { - INDEX_op_sari_vec, INDEX_op_add_vec, 0 -}; +void arm_gen_gvec_ssra(unsigned vece, uint32_t rd_ofs, uint32_t rm_ofs, + int64_t shift, uint32_t opr_sz, uint32_t max_sz) +{ + static const TCGOpcode vecop_list[] = { + INDEX_op_sari_vec, INDEX_op_add_vec, 0 + }; + static const GVecGen2i ops[4] = { + { .fni8 = gen_ssra8_i64, + .fniv = gen_ssra_vec, + .fno = gen_helper_gvec_ssra_b, + .load_dest = true, + .opt_opc = vecop_list, + .vece = MO_8 }, + { .fni8 = gen_ssra16_i64, + .fniv = gen_ssra_vec, + .fno = gen_helper_gvec_ssra_h, + .load_dest = true, + .opt_opc = vecop_list, + .vece = MO_16 }, + { .fni4 = gen_ssra32_i32, + .fniv = gen_ssra_vec, + .fno = gen_helper_gvec_ssra_s, + .load_dest = true, + .opt_opc = vecop_list, + .vece = MO_32 }, + { .fni8 = gen_ssra64_i64, + .fniv = gen_ssra_vec, + .fno = gen_helper_gvec_ssra_b, + .prefer_i64 = TCG_TARGET_REG_BITS == 64, + .opt_opc = vecop_list, + .load_dest = true, + .vece = MO_64 }, + }; -const GVecGen2i ssra_op[4] = { - { .fni8 = gen_ssra8_i64, - .fniv = gen_ssra_vec, - .load_dest = true, - .opt_opc = vecop_list_ssra, - .vece = MO_8 }, - { .fni8 = gen_ssra16_i64, - .fniv = gen_ssra_vec, - .load_dest = true, - .opt_opc = vecop_list_ssra, - .vece = MO_16 }, - { .fni4 = gen_ssra32_i32, - .fniv = gen_ssra_vec, - .load_dest = true, - .opt_opc = vecop_list_ssra, - .vece = MO_32 }, - { .fni8 = gen_ssra64_i64, - .fniv = gen_ssra_vec, - .prefer_i64 = TCG_TARGET_REG_BITS == 64, - .opt_opc = vecop_list_ssra, - .load_dest = true, - .vece = MO_64 }, -}; + /* tszimm encoding produces immediates in the range [1..esize]. */ + tcg_debug_assert(shift > 0); + tcg_debug_assert(shift <= (8 << vece)); + + /* + * Shifts larger than the element size are architecturally valid. + * Signed results in all sign bits. + */ + shift = MIN(shift, (8 << vece) - 1); + tcg_gen_gvec_2i(rd_ofs, rm_ofs, opr_sz, max_sz, shift, &ops[vece]); +} static void gen_usra8_i64(TCGv_i64 d, TCGv_i64 a, int64_t shift) { @@ -4005,33 +4023,55 @@ static void gen_usra_vec(unsigned vece, TCGv_vec d, TCGv_vec a, int64_t sh) tcg_gen_add_vec(vece, d, d, a); } -static const TCGOpcode vecop_list_usra[] = { - INDEX_op_shri_vec, INDEX_op_add_vec, 0 -}; +void arm_gen_gvec_usra(unsigned vece, uint32_t rd_ofs, uint32_t rm_ofs, + int64_t shift, uint32_t opr_sz, uint32_t max_sz) +{ + static const TCGOpcode vecop_list[] = { + INDEX_op_shri_vec, INDEX_op_add_vec, 0 + }; + static const GVecGen2i ops[4] = { + { .fni8 = gen_usra8_i64, + .fniv = gen_usra_vec, + .fno = gen_helper_gvec_usra_b, + .load_dest = true, + .opt_opc = vecop_list, + .vece = MO_8, }, + { .fni8 = gen_usra16_i64, + .fniv = gen_usra_vec, + .fno = gen_helper_gvec_usra_h, + .load_dest = true, + .opt_opc = vecop_list, + .vece = MO_16, }, + { .fni4 = gen_usra32_i32, + .fniv = gen_usra_vec, + .fno = gen_helper_gvec_usra_s, + .load_dest = true, + .opt_opc = vecop_list, + .vece = MO_32, }, + { .fni8 = gen_usra64_i64, + .fniv = gen_usra_vec, + .fno = gen_helper_gvec_usra_d, + .prefer_i64 = TCG_TARGET_REG_BITS == 64, + .load_dest = true, + .opt_opc = vecop_list, + .vece = MO_64, }, + }; -const GVecGen2i usra_op[4] = { - { .fni8 = gen_usra8_i64, - .fniv = gen_usra_vec, - .load_dest = true, - .opt_opc = vecop_list_usra, - .vece = MO_8, }, - { .fni8 = gen_usra16_i64, - .fniv = gen_usra_vec, - .load_dest = true, - .opt_opc = vecop_list_usra, - .vece = MO_16, }, - { .fni4 = gen_usra32_i32, - .fniv = gen_usra_vec, - .load_dest = true, - .opt_opc = vecop_list_usra, - .vece = MO_32, }, - { .fni8 = gen_usra64_i64, - .fniv = gen_usra_vec, - .prefer_i64 = TCG_TARGET_REG_BITS == 64, - .load_dest = true, - .opt_opc = vecop_list_usra, - .vece = MO_64, }, -}; + /* tszimm encoding produces immediates in the range [1..esize]. */ + tcg_debug_assert(shift > 0); + tcg_debug_assert(shift <= (8 << vece)); + + /* + * Shifts larger than the element size are architecturally valid. + * Unsigned results in all zeros as input to accumulate: nop. + */ + if (shift < (8 << vece)) { + tcg_gen_gvec_2i(rd_ofs, rm_ofs, opr_sz, max_sz, shift, &ops[vece]); + } else { + /* Nop, but we do need to clear the tail. */ + tcg_gen_gvec_mov(vece, rd_ofs, rd_ofs, opr_sz, max_sz); + } +} static void gen_shr8_ins_i64(TCGv_i64 d, TCGv_i64 a, int64_t shift) { @@ -5396,19 +5436,12 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn) case 1: /* VSRA */ /* Right shift comes here negative. */ shift = -shift; - /* Shifts larger than the element size are architecturally - * valid. Unsigned results in all zeros; signed results - * in all sign bits. - */ - if (!u) { - tcg_gen_gvec_2i(rd_ofs, rm_ofs, vec_size, vec_size, - MIN(shift, (8 << size) - 1), - &ssra_op[size]); - } else if (shift >= 8 << size) { - /* rd += 0 */ + if (u) { + arm_gen_gvec_usra(size, rd_ofs, rm_ofs, shift, + vec_size, vec_size); } else { - tcg_gen_gvec_2i(rd_ofs, rm_ofs, vec_size, vec_size, - shift, &usra_op[size]); + arm_gen_gvec_ssra(size, rd_ofs, rm_ofs, shift, + vec_size, vec_size); } return 0; diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c index 154d32518a..aaaccc0a2d 100644 --- a/target/arm/vec_helper.c +++ b/target/arm/vec_helper.c @@ -899,6 +899,31 @@ void HELPER(gvec_sqsub_d)(void *vd, void *vq, void *vn, clear_tail(d, oprsz, simd_maxsz(desc)); } + +#define DO_SRA(NAME, TYPE) \ +void HELPER(NAME)(void *vd, void *vn, uint32_t desc) \ +{ \ + intptr_t i, oprsz = simd_oprsz(desc); \ + int shift = simd_data(desc); \ + TYPE *d = vd, *n = vn; \ + for (i = 0; i < oprsz / sizeof(TYPE); i++) { \ + d[i] += n[i] >> shift; \ + } \ + clear_tail(d, oprsz, simd_maxsz(desc)); \ +} + +DO_SRA(gvec_ssra_b, int8_t) +DO_SRA(gvec_ssra_h, int16_t) +DO_SRA(gvec_ssra_s, int32_t) +DO_SRA(gvec_ssra_d, int64_t) + +DO_SRA(gvec_usra_b, uint8_t) +DO_SRA(gvec_usra_h, uint16_t) +DO_SRA(gvec_usra_s, uint32_t) +DO_SRA(gvec_usra_d, uint64_t) + +#undef DO_SRA + /* * Convert float16 to float32, raising no exceptions and * preserving exceptional values, including SNaN.