From patchwork Sun Feb 16 21:42:29 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Richard Henderson X-Patchwork-Id: 183568 Delivered-To: patch@linaro.org Received: by 2002:a92:1f12:0:0:0:0:0 with SMTP id i18csp4250141ile; Sun, 16 Feb 2020 13:44:23 -0800 (PST) X-Google-Smtp-Source: APXvYqzLOU4kkgTUe1Xz28/AAZy1sOvsfyWfvi/PgeAmodfPXhQZbOGUFKnnE4GUSbrbmswKkZrB X-Received: by 2002:a05:620a:b10:: with SMTP id t16mr11677846qkg.399.1581889463494; Sun, 16 Feb 2020 13:44:23 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1581889463; cv=none; d=google.com; s=arc-20160816; b=QQg/9xk9eqmkXGOD0JQ75aBaY4N4+lE0Ydqf0VqKlMG7TuhDICdMTpeQCtAOY9oXqg 50mcTaDOWVD1SrrZik+rl9CEkgJL/ef7GZF63WOmg3e6XpGe9I3qjm/M1JvtD9quf5Ia zTh5SaY+KS0OyJ7rX7N7QBH0qtGY1297BneHB2l0aBZD+dZzM9ETQuqmXNf69ZUFUrgr qS/Btu9gfPvo58niKJxVAO/Y5XOgNOU9olsPOtxJqQcm1Mzu9S9gMR3SulFuw8J/MFKo Vh6SiH2UzIltq+3wVg60iEMSs76WW2KfKL78daY7RFBFv9gzZ3Uq66/ZysRrte1CT2su 1r0w== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:cc:list-subscribe:list-help:list-post:list-archive :list-unsubscribe:list-id:precedence:content-transfer-encoding :mime-version:references:in-reply-to:message-id:date:subject:to:from :dkim-signature; bh=RMvDT5IF7+SG6qKveS2+MLydkPKMSrNYy/9YPepYNQA=; b=d9MWCL7V4iQUgzGS+1YF9/SjwGTdZj48UFo4z64WRnlHyZj9IEO2+kcKxPI8/3LIiA T/ttHNBVv0UnfwClUqXiDVZe6ocIFNYP6yuX+GfZDeeTQ/FDtOcIozzgUJ++YU4zA6jl x3DiKOsLLIpn+eA8TDufabgxkTzlvpgGtK3oZjChC5p+dzQUF9UGYkHE9paR3XRk1IAp 9TnuaD+F7GbUgNzJYH/ZsKvyQTZWLhyJKn4z4E5nKhhhLa+QWjx8E2vy1J9oRe+DgFd4 PWAwAqn/mT8LExBGFPRXuIGECJ33W0zwBQb/0QxvLB30reOyGUGYQIoPJi2J5+03sSk2 XUnw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=XgvxuhaX; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom="qemu-devel-bounces+patch=linaro.org@nongnu.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Return-Path: Received: from lists.gnu.org (lists.gnu.org. [209.51.188.17]) by mx.google.com with ESMTPS id c136si6414005qkb.250.2020.02.16.13.44.23 for (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Sun, 16 Feb 2020 13:44:23 -0800 (PST) Received-SPF: pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 209.51.188.17 as permitted sender) client-ip=209.51.188.17; Authentication-Results: mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=XgvxuhaX; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom="qemu-devel-bounces+patch=linaro.org@nongnu.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Received: from localhost ([::1]:36922 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1j3RiA-0001ii-VV for patch@linaro.org; Sun, 16 Feb 2020 16:44:22 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]:47731) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1j3RgW-0007sd-K6 for qemu-devel@nongnu.org; Sun, 16 Feb 2020 16:42:43 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1j3RgT-0004tP-Lh for qemu-devel@nongnu.org; Sun, 16 Feb 2020 16:42:40 -0500 Received: from mail-pj1-x1029.google.com ([2607:f8b0:4864:20::1029]:51600) by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16) (Exim 4.71) (envelope-from ) id 1j3RgT-0004rY-6Z for qemu-devel@nongnu.org; Sun, 16 Feb 2020 16:42:37 -0500 Received: by mail-pj1-x1029.google.com with SMTP id fa20so6258208pjb.1 for ; Sun, 16 Feb 2020 13:42:36 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=RMvDT5IF7+SG6qKveS2+MLydkPKMSrNYy/9YPepYNQA=; b=XgvxuhaXaZXNRJ+P+2UfUs2QiFsX6Yaq+1tij/2djHA3fFqto8CZZ/GI+ZEPMPfXeP FEbb3YepM4agDwteqk3osppFvPbIj8nA69gPUWD4IIa/NgAqjsZ4ryf2iVD+C4ZOwBby CY4jtfX2qsXBzxclB2lLl0Gk/yT0vL+fGWiYwmuJKTfrSiFCGT6rbYQrId6pbbce+wrI UTL91qtKQi/9gxq4DhHFndd9uiRnElPvDCQvRmZKaVHDBIAMBciPHxhaCiI7uxONk5PT K8aonmc1uoFxfUsY35l8h6aA5VvXCimYZRTc7NA2y7DZuaOojuo9wccMzYXXZd5IF1/T 7/WQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=RMvDT5IF7+SG6qKveS2+MLydkPKMSrNYy/9YPepYNQA=; b=ty/vnfsioQbo4p6vkHV4V4vMfNE2jdfxyUBha4xefcqLjSchD3ntyWY57H42WYkeqJ Rs6OusThrTFuNAkI8UrWHmKjJK+d9fSR1vPAQIH8JhPvzdK3F8bUl46DiiIhkTpbYFWr +Af4cuLIKbvdvkHiBA8MHHkS/jPhoW7Fg6g4nzuQbtzGbYZ6lEWgDZ7cuC+KpU7YXFd8 qh37vW8uZT2hz4mwV9Z+d4VPONjzaIaRK7/liN/r3bczKHhTkd20YRuybzMVVBvWA6cz aXW6NkNUqcGlV200B7YxGo7rrJj2FgRwUemxx4eVp15KeHsK9XDeCR/OpFSjhEdbMP55 CUOA== X-Gm-Message-State: APjAAAVWUVM2+eQyXsCAKHNH5W6u/flEwS379W0Z++u7di4HeBe0Pk/j ycHRsp6CCE45LBfyTU45ddVtMqq7X6A= X-Received: by 2002:a17:90a:fe8:: with SMTP id 95mr16111785pjz.98.1581889354818; Sun, 16 Feb 2020 13:42:34 -0800 (PST) Received: from localhost.localdomain (97-126-123-70.tukw.qwest.net. [97.126.123.70]) by smtp.gmail.com with ESMTPSA id z29sm14821848pgc.21.2020.02.16.13.42.33 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 16 Feb 2020 13:42:34 -0800 (PST) From: Richard Henderson To: qemu-devel@nongnu.org Subject: [PATCH v2 1/4] target/arm: Vectorize USHL and SSHL Date: Sun, 16 Feb 2020 13:42:29 -0800 Message-Id: <20200216214232.4230-2-richard.henderson@linaro.org> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20200216214232.4230-1-richard.henderson@linaro.org> References: <20200216214232.4230-1-richard.henderson@linaro.org> MIME-Version: 1.0 X-detected-operating-system: by eggs.gnu.org: Genre and OS details not recognized. X-Received-From: 2607:f8b0:4864:20::1029 X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: peter.maydell@linaro.org, qemu-arm@nongnu.org, =?utf-8?q?Alex_Benn?= =?utf-8?b?w6ll?= Errors-To: qemu-devel-bounces+patch=linaro.org@nongnu.org Sender: "Qemu-devel" These instructions shift left or right depending on the sign of the input, and 7 bits are significant to the shift. This requires several masks and selects in addition to the actual shifts to form the complete answer. That said, the operation is still a small improvement even for two 64-bit elements -- 13 vector operations instead of 2 * 7 integer operations. Reviewed-by: Alex Bennée Signed-off-by: Richard Henderson --- v2: Fix operand ordering for aa32 VSHL. v3: Rename operand for inline tcg expanders (ajb). --- target/arm/helper.h | 11 +- target/arm/translate.h | 6 + target/arm/neon_helper.c | 33 ---- target/arm/translate-a64.c | 18 +-- target/arm/translate.c | 299 +++++++++++++++++++++++++++++++++++-- target/arm/vec_helper.c | 88 +++++++++++ 6 files changed, 389 insertions(+), 66 deletions(-) -- 2.20.1 diff --git a/target/arm/helper.h b/target/arm/helper.h index aa3d8cd08f..459a278b5c 100644 --- a/target/arm/helper.h +++ b/target/arm/helper.h @@ -303,14 +303,8 @@ DEF_HELPER_2(neon_abd_s16, i32, i32, i32) DEF_HELPER_2(neon_abd_u32, i32, i32, i32) DEF_HELPER_2(neon_abd_s32, i32, i32, i32) -DEF_HELPER_2(neon_shl_u8, i32, i32, i32) -DEF_HELPER_2(neon_shl_s8, i32, i32, i32) DEF_HELPER_2(neon_shl_u16, i32, i32, i32) DEF_HELPER_2(neon_shl_s16, i32, i32, i32) -DEF_HELPER_2(neon_shl_u32, i32, i32, i32) -DEF_HELPER_2(neon_shl_s32, i32, i32, i32) -DEF_HELPER_2(neon_shl_u64, i64, i64, i64) -DEF_HELPER_2(neon_shl_s64, i64, i64, i64) DEF_HELPER_2(neon_rshl_u8, i32, i32, i32) DEF_HELPER_2(neon_rshl_s8, i32, i32, i32) DEF_HELPER_2(neon_rshl_u16, i32, i32, i32) @@ -697,6 +691,11 @@ DEF_HELPER_FLAGS_2(frint64_s, TCG_CALL_NO_RWG, f32, f32, ptr) DEF_HELPER_FLAGS_2(frint32_d, TCG_CALL_NO_RWG, f64, f64, ptr) DEF_HELPER_FLAGS_2(frint64_d, TCG_CALL_NO_RWG, f64, f64, ptr) +DEF_HELPER_FLAGS_4(gvec_sshl_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(gvec_sshl_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(gvec_ushl_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(gvec_ushl_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) + #ifdef TARGET_AARCH64 #include "helper-a64.h" #include "helper-sve.h" diff --git a/target/arm/translate.h b/target/arm/translate.h index 5b167c416a..d9ea0c99cc 100644 --- a/target/arm/translate.h +++ b/target/arm/translate.h @@ -278,6 +278,8 @@ uint64_t vfp_expand_imm(int size, uint8_t imm8); extern const GVecGen3 mla_op[4]; extern const GVecGen3 mls_op[4]; extern const GVecGen3 cmtst_op[4]; +extern const GVecGen3 sshl_op[4]; +extern const GVecGen3 ushl_op[4]; extern const GVecGen2i ssra_op[4]; extern const GVecGen2i usra_op[4]; extern const GVecGen2i sri_op[4]; @@ -287,6 +289,10 @@ extern const GVecGen4 sqadd_op[4]; extern const GVecGen4 uqsub_op[4]; extern const GVecGen4 sqsub_op[4]; void gen_cmtst_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b); +void gen_ushl_i32(TCGv_i32 d, TCGv_i32 a, TCGv_i32 b); +void gen_sshl_i32(TCGv_i32 d, TCGv_i32 a, TCGv_i32 b); +void gen_ushl_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b); +void gen_sshl_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b); /* * Forward to the isar_feature_* tests given a DisasContext pointer. diff --git a/target/arm/neon_helper.c b/target/arm/neon_helper.c index 4259056723..c581ffb7d3 100644 --- a/target/arm/neon_helper.c +++ b/target/arm/neon_helper.c @@ -615,24 +615,9 @@ NEON_VOP(abd_u32, neon_u32, 1) } else { \ dest = src1 << tmp; \ }} while (0) -NEON_VOP(shl_u8, neon_u8, 4) NEON_VOP(shl_u16, neon_u16, 2) -NEON_VOP(shl_u32, neon_u32, 1) #undef NEON_FN -uint64_t HELPER(neon_shl_u64)(uint64_t val, uint64_t shiftop) -{ - int8_t shift = (int8_t)shiftop; - if (shift >= 64 || shift <= -64) { - val = 0; - } else if (shift < 0) { - val >>= -shift; - } else { - val <<= shift; - } - return val; -} - #define NEON_FN(dest, src1, src2) do { \ int8_t tmp; \ tmp = (int8_t)src2; \ @@ -645,27 +630,9 @@ uint64_t HELPER(neon_shl_u64)(uint64_t val, uint64_t shiftop) } else { \ dest = src1 << tmp; \ }} while (0) -NEON_VOP(shl_s8, neon_s8, 4) NEON_VOP(shl_s16, neon_s16, 2) -NEON_VOP(shl_s32, neon_s32, 1) #undef NEON_FN -uint64_t HELPER(neon_shl_s64)(uint64_t valop, uint64_t shiftop) -{ - int8_t shift = (int8_t)shiftop; - int64_t val = valop; - if (shift >= 64) { - val = 0; - } else if (shift <= -64) { - val >>= 63; - } else if (shift < 0) { - val >>= -shift; - } else { - val <<= shift; - } - return val; -} - #define NEON_FN(dest, src1, src2) do { \ int8_t tmp; \ tmp = (int8_t)src2; \ diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c index 7c26c3bfeb..e42dcfebdd 100644 --- a/target/arm/translate-a64.c +++ b/target/arm/translate-a64.c @@ -8735,9 +8735,9 @@ static void handle_3same_64(DisasContext *s, int opcode, bool u, break; case 0x8: /* SSHL, USHL */ if (u) { - gen_helper_neon_shl_u64(tcg_rd, tcg_rn, tcg_rm); + gen_ushl_i64(tcg_rd, tcg_rn, tcg_rm); } else { - gen_helper_neon_shl_s64(tcg_rd, tcg_rn, tcg_rm); + gen_sshl_i64(tcg_rd, tcg_rn, tcg_rm); } break; case 0x9: /* SQSHL, UQSHL */ @@ -11132,6 +11132,10 @@ static void disas_simd_3same_int(DisasContext *s, uint32_t insn) is_q ? 16 : 8, vec_full_reg_size(s), (u ? uqsub_op : sqsub_op) + size); return; + case 0x08: /* SSHL, USHL */ + gen_gvec_op3(s, is_q, rd, rn, rm, + u ? &ushl_op[size] : &sshl_op[size]); + return; case 0x0c: /* SMAX, UMAX */ if (u) { gen_gvec_fn3(s, is_q, rd, rn, rm, tcg_gen_gvec_umax, size); @@ -11247,16 +11251,6 @@ static void disas_simd_3same_int(DisasContext *s, uint32_t insn) genfn = fns[size][u]; break; } - case 0x8: /* SSHL, USHL */ - { - static NeonGenTwoOpFn * const fns[3][2] = { - { gen_helper_neon_shl_s8, gen_helper_neon_shl_u8 }, - { gen_helper_neon_shl_s16, gen_helper_neon_shl_u16 }, - { gen_helper_neon_shl_s32, gen_helper_neon_shl_u32 }, - }; - genfn = fns[size][u]; - break; - } case 0x9: /* SQSHL, UQSHL */ { static NeonGenTwoOpEnvFn * const fns[3][2] = { diff --git a/target/arm/translate.c b/target/arm/translate.c index 20f89ace2f..c4dd14e053 100644 --- a/target/arm/translate.c +++ b/target/arm/translate.c @@ -3575,13 +3575,13 @@ static inline void gen_neon_shift_narrow(int size, TCGv_i32 var, TCGv_i32 shift, if (u) { switch (size) { case 1: gen_helper_neon_shl_u16(var, var, shift); break; - case 2: gen_helper_neon_shl_u32(var, var, shift); break; + case 2: gen_ushl_i32(var, var, shift); break; default: abort(); } } else { switch (size) { case 1: gen_helper_neon_shl_s16(var, var, shift); break; - case 2: gen_helper_neon_shl_s32(var, var, shift); break; + case 2: gen_sshl_i32(var, var, shift); break; default: abort(); } } @@ -4384,6 +4384,280 @@ const GVecGen3 cmtst_op[4] = { .vece = MO_64 }, }; +void gen_ushl_i32(TCGv_i32 dst, TCGv_i32 src, TCGv_i32 shift) +{ + TCGv_i32 lval = tcg_temp_new_i32(); + TCGv_i32 rval = tcg_temp_new_i32(); + TCGv_i32 lsh = tcg_temp_new_i32(); + TCGv_i32 rsh = tcg_temp_new_i32(); + TCGv_i32 zero = tcg_const_i32(0); + TCGv_i32 max = tcg_const_i32(32); + + /* + * Rely on the TCG guarantee that out of range shifts produce + * unspecified results, not undefined behaviour (i.e. no trap). + * Discard out-of-range results after the fact. + */ + tcg_gen_ext8s_i32(lsh, shift); + tcg_gen_neg_i32(rsh, lsh); + tcg_gen_shl_i32(lval, src, lsh); + tcg_gen_shr_i32(rval, src, rsh); + tcg_gen_movcond_i32(TCG_COND_LTU, dst, lsh, max, lval, zero); + tcg_gen_movcond_i32(TCG_COND_LTU, dst, rsh, max, rval, dst); + + tcg_temp_free_i32(lval); + tcg_temp_free_i32(rval); + tcg_temp_free_i32(lsh); + tcg_temp_free_i32(rsh); + tcg_temp_free_i32(zero); + tcg_temp_free_i32(max); +} + +void gen_ushl_i64(TCGv_i64 dst, TCGv_i64 src, TCGv_i64 shift) +{ + TCGv_i64 lval = tcg_temp_new_i64(); + TCGv_i64 rval = tcg_temp_new_i64(); + TCGv_i64 lsh = tcg_temp_new_i64(); + TCGv_i64 rsh = tcg_temp_new_i64(); + TCGv_i64 zero = tcg_const_i64(0); + TCGv_i64 max = tcg_const_i64(64); + + /* + * Rely on the TCG guarantee that out of range shifts produce + * unspecified results, not undefined behaviour (i.e. no trap). + * Discard out-of-range results after the fact. + */ + tcg_gen_ext8s_i64(lsh, shift); + tcg_gen_neg_i64(rsh, lsh); + tcg_gen_shl_i64(lval, src, lsh); + tcg_gen_shr_i64(rval, src, rsh); + tcg_gen_movcond_i64(TCG_COND_LTU, dst, lsh, max, lval, zero); + tcg_gen_movcond_i64(TCG_COND_LTU, dst, rsh, max, rval, dst); + + tcg_temp_free_i64(lval); + tcg_temp_free_i64(rval); + tcg_temp_free_i64(lsh); + tcg_temp_free_i64(rsh); + tcg_temp_free_i64(zero); + tcg_temp_free_i64(max); +} + +static void gen_ushl_vec(unsigned vece, TCGv_vec dst, + TCGv_vec src, TCGv_vec shift) +{ + TCGv_vec lval = tcg_temp_new_vec_matching(dst); + TCGv_vec rval = tcg_temp_new_vec_matching(dst); + TCGv_vec lsh = tcg_temp_new_vec_matching(dst); + TCGv_vec rsh = tcg_temp_new_vec_matching(dst); + TCGv_vec msk, max; + + tcg_gen_neg_vec(vece, rsh, shift); + if (vece == MO_8) { + tcg_gen_mov_vec(lsh, shift); + } else { + msk = tcg_temp_new_vec_matching(dst); + tcg_gen_dupi_vec(vece, msk, 0xff); + tcg_gen_and_vec(vece, lsh, shift, msk); + tcg_gen_and_vec(vece, rsh, rsh, msk); + tcg_temp_free_vec(msk); + } + + /* + * Rely on the TCG guarantee that out of range shifts produce + * unspecified results, not undefined behaviour (i.e. no trap). + * Discard out-of-range results after the fact. + */ + tcg_gen_shlv_vec(vece, lval, src, lsh); + tcg_gen_shrv_vec(vece, rval, src, rsh); + + max = tcg_temp_new_vec_matching(dst); + tcg_gen_dupi_vec(vece, max, 8 << vece); + + /* + * The choice of LT (signed) and GEU (unsigned) are biased toward + * the instructions of the x86_64 host. For MO_8, the whole byte + * is significant so we must use an unsigned compare; otherwise we + * have already masked to a byte and so a signed compare works. + * Other tcg hosts have a full set of comparisons and do not care. + */ + if (vece == MO_8) { + tcg_gen_cmp_vec(TCG_COND_GEU, vece, lsh, lsh, max); + tcg_gen_cmp_vec(TCG_COND_GEU, vece, rsh, rsh, max); + tcg_gen_andc_vec(vece, lval, lval, lsh); + tcg_gen_andc_vec(vece, rval, rval, rsh); + } else { + tcg_gen_cmp_vec(TCG_COND_LT, vece, lsh, lsh, max); + tcg_gen_cmp_vec(TCG_COND_LT, vece, rsh, rsh, max); + tcg_gen_and_vec(vece, lval, lval, lsh); + tcg_gen_and_vec(vece, rval, rval, rsh); + } + tcg_gen_or_vec(vece, dst, lval, rval); + + tcg_temp_free_vec(max); + tcg_temp_free_vec(lval); + tcg_temp_free_vec(rval); + tcg_temp_free_vec(lsh); + tcg_temp_free_vec(rsh); +} + +static const TCGOpcode ushl_list[] = { + INDEX_op_neg_vec, INDEX_op_shlv_vec, + INDEX_op_shrv_vec, INDEX_op_cmp_vec, 0 +}; + +const GVecGen3 ushl_op[4] = { + { .fniv = gen_ushl_vec, + .fno = gen_helper_gvec_ushl_b, + .opt_opc = ushl_list, + .vece = MO_8 }, + { .fniv = gen_ushl_vec, + .fno = gen_helper_gvec_ushl_h, + .opt_opc = ushl_list, + .vece = MO_16 }, + { .fni4 = gen_ushl_i32, + .fniv = gen_ushl_vec, + .opt_opc = ushl_list, + .vece = MO_32 }, + { .fni8 = gen_ushl_i64, + .fniv = gen_ushl_vec, + .opt_opc = ushl_list, + .vece = MO_64 }, +}; + +void gen_sshl_i32(TCGv_i32 dst, TCGv_i32 src, TCGv_i32 shift) +{ + TCGv_i32 lval = tcg_temp_new_i32(); + TCGv_i32 rval = tcg_temp_new_i32(); + TCGv_i32 lsh = tcg_temp_new_i32(); + TCGv_i32 rsh = tcg_temp_new_i32(); + TCGv_i32 zero = tcg_const_i32(0); + TCGv_i32 max = tcg_const_i32(31); + + /* + * Rely on the TCG guarantee that out of range shifts produce + * unspecified results, not undefined behaviour (i.e. no trap). + * Discard out-of-range results after the fact. + */ + tcg_gen_ext8s_i32(lsh, shift); + tcg_gen_neg_i32(rsh, lsh); + tcg_gen_shl_i32(lval, src, lsh); + tcg_gen_umin_i32(rsh, rsh, max); + tcg_gen_sar_i32(rval, src, rsh); + tcg_gen_movcond_i32(TCG_COND_LEU, lval, lsh, max, lval, zero); + tcg_gen_movcond_i32(TCG_COND_LT, dst, lsh, zero, rval, lval); + + tcg_temp_free_i32(lval); + tcg_temp_free_i32(rval); + tcg_temp_free_i32(lsh); + tcg_temp_free_i32(rsh); + tcg_temp_free_i32(zero); + tcg_temp_free_i32(max); +} + +void gen_sshl_i64(TCGv_i64 dst, TCGv_i64 src, TCGv_i64 shift) +{ + TCGv_i64 lval = tcg_temp_new_i64(); + TCGv_i64 rval = tcg_temp_new_i64(); + TCGv_i64 lsh = tcg_temp_new_i64(); + TCGv_i64 rsh = tcg_temp_new_i64(); + TCGv_i64 zero = tcg_const_i64(0); + TCGv_i64 max = tcg_const_i64(63); + + /* + * Rely on the TCG guarantee that out of range shifts produce + * unspecified results, not undefined behaviour (i.e. no trap). + * Discard out-of-range results after the fact. + */ + tcg_gen_ext8s_i64(lsh, shift); + tcg_gen_neg_i64(rsh, lsh); + tcg_gen_shl_i64(lval, src, lsh); + tcg_gen_umin_i64(rsh, rsh, max); + tcg_gen_sar_i64(rval, src, rsh); + tcg_gen_movcond_i64(TCG_COND_LEU, lval, lsh, max, lval, zero); + tcg_gen_movcond_i64(TCG_COND_LT, dst, lsh, zero, rval, lval); + + tcg_temp_free_i64(lval); + tcg_temp_free_i64(rval); + tcg_temp_free_i64(lsh); + tcg_temp_free_i64(rsh); + tcg_temp_free_i64(zero); + tcg_temp_free_i64(max); +} + +static void gen_sshl_vec(unsigned vece, TCGv_vec dst, + TCGv_vec src, TCGv_vec shift) +{ + TCGv_vec lval = tcg_temp_new_vec_matching(dst); + TCGv_vec rval = tcg_temp_new_vec_matching(dst); + TCGv_vec lsh = tcg_temp_new_vec_matching(dst); + TCGv_vec rsh = tcg_temp_new_vec_matching(dst); + TCGv_vec tmp = tcg_temp_new_vec_matching(dst); + + /* + * Rely on the TCG guarantee that out of range shifts produce + * unspecified results, not undefined behaviour (i.e. no trap). + * Discard out-of-range results after the fact. + */ + tcg_gen_neg_vec(vece, rsh, shift); + if (vece == MO_8) { + tcg_gen_mov_vec(lsh, shift); + } else { + tcg_gen_dupi_vec(vece, tmp, 0xff); + tcg_gen_and_vec(vece, lsh, shift, tmp); + tcg_gen_and_vec(vece, rsh, rsh, tmp); + } + + /* Bound rsh so out of bound right shift gets -1. */ + tcg_gen_dupi_vec(vece, tmp, (8 << vece) - 1); + tcg_gen_umin_vec(vece, rsh, rsh, tmp); + tcg_gen_cmp_vec(TCG_COND_GT, vece, tmp, lsh, tmp); + + tcg_gen_shlv_vec(vece, lval, src, lsh); + tcg_gen_sarv_vec(vece, rval, src, rsh); + + /* Select in-bound left shift. */ + tcg_gen_andc_vec(vece, lval, lval, tmp); + + /* Select between left and right shift. */ + if (vece == MO_8) { + tcg_gen_dupi_vec(vece, tmp, 0); + tcg_gen_cmpsel_vec(TCG_COND_LT, vece, dst, lsh, tmp, rval, lval); + } else { + tcg_gen_dupi_vec(vece, tmp, 0x80); + tcg_gen_cmpsel_vec(TCG_COND_LT, vece, dst, lsh, tmp, lval, rval); + } + + tcg_temp_free_vec(lval); + tcg_temp_free_vec(rval); + tcg_temp_free_vec(lsh); + tcg_temp_free_vec(rsh); + tcg_temp_free_vec(tmp); +} + +static const TCGOpcode sshl_list[] = { + INDEX_op_neg_vec, INDEX_op_umin_vec, INDEX_op_shlv_vec, + INDEX_op_sarv_vec, INDEX_op_cmp_vec, INDEX_op_cmpsel_vec, 0 +}; + +const GVecGen3 sshl_op[4] = { + { .fniv = gen_sshl_vec, + .fno = gen_helper_gvec_sshl_b, + .opt_opc = sshl_list, + .vece = MO_8 }, + { .fniv = gen_sshl_vec, + .fno = gen_helper_gvec_sshl_h, + .opt_opc = sshl_list, + .vece = MO_16 }, + { .fni4 = gen_sshl_i32, + .fniv = gen_sshl_vec, + .opt_opc = sshl_list, + .vece = MO_32 }, + { .fni8 = gen_sshl_i64, + .fniv = gen_sshl_vec, + .opt_opc = sshl_list, + .vece = MO_64 }, +}; + static void gen_uqadd_vec(unsigned vece, TCGv_vec t, TCGv_vec sat, TCGv_vec a, TCGv_vec b) { @@ -4787,6 +5061,12 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn) vec_size, vec_size); } return 0; + + case NEON_3R_VSHL: + /* Note the operation is vshl vd,vm,vn */ + tcg_gen_gvec_3(rd_ofs, rm_ofs, rn_ofs, vec_size, vec_size, + u ? &ushl_op[size] : &sshl_op[size]); + return 0; } if (size == 3) { @@ -4795,13 +5075,6 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn) neon_load_reg64(cpu_V0, rn + pass); neon_load_reg64(cpu_V1, rm + pass); switch (op) { - case NEON_3R_VSHL: - if (u) { - gen_helper_neon_shl_u64(cpu_V0, cpu_V1, cpu_V0); - } else { - gen_helper_neon_shl_s64(cpu_V0, cpu_V1, cpu_V0); - } - break; case NEON_3R_VQSHL: if (u) { gen_helper_neon_qshl_u64(cpu_V0, cpu_env, @@ -4836,7 +5109,6 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn) } pairwise = 0; switch (op) { - case NEON_3R_VSHL: case NEON_3R_VQSHL: case NEON_3R_VRSHL: case NEON_3R_VQRSHL: @@ -4916,9 +5188,6 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn) case NEON_3R_VHSUB: GEN_NEON_INTEGER_OP(hsub); break; - case NEON_3R_VSHL: - GEN_NEON_INTEGER_OP(shl); - break; case NEON_3R_VQSHL: GEN_NEON_INTEGER_OP_ENV(qshl); break; @@ -5327,9 +5596,9 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn) } } else { if (input_unsigned) { - gen_helper_neon_shl_u64(cpu_V0, in, tmp64); + gen_ushl_i64(cpu_V0, in, tmp64); } else { - gen_helper_neon_shl_s64(cpu_V0, in, tmp64); + gen_sshl_i64(cpu_V0, in, tmp64); } } tmp = tcg_temp_new_i32(); diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c index dedef62403..fcb3663903 100644 --- a/target/arm/vec_helper.c +++ b/target/arm/vec_helper.c @@ -1046,3 +1046,91 @@ void HELPER(gvec_fmlal_idx_a64)(void *vd, void *vn, void *vm, do_fmlal_idx(vd, vn, vm, &env->vfp.fp_status, desc, get_flush_inputs_to_zero(&env->vfp.fp_status_f16)); } + +void HELPER(gvec_sshl_b)(void *vd, void *vn, void *vm, uint32_t desc) +{ + intptr_t i, opr_sz = simd_oprsz(desc); + int8_t *d = vd, *n = vn, *m = vm; + + for (i = 0; i < opr_sz; ++i) { + int8_t mm = m[i]; + int8_t nn = n[i]; + int8_t res = 0; + if (mm >= 0) { + if (mm < 8) { + res = nn << mm; + } + } else { + res = nn >> (mm > -8 ? -mm : 7); + } + d[i] = res; + } + clear_tail(d, opr_sz, simd_maxsz(desc)); +} + +void HELPER(gvec_sshl_h)(void *vd, void *vn, void *vm, uint32_t desc) +{ + intptr_t i, opr_sz = simd_oprsz(desc); + int16_t *d = vd, *n = vn, *m = vm; + + for (i = 0; i < opr_sz / 2; ++i) { + int8_t mm = m[i]; /* only 8 bits of shift are significant */ + int16_t nn = n[i]; + int16_t res = 0; + if (mm >= 0) { + if (mm < 16) { + res = nn << mm; + } + } else { + res = nn >> (mm > -16 ? -mm : 15); + } + d[i] = res; + } + clear_tail(d, opr_sz, simd_maxsz(desc)); +} + +void HELPER(gvec_ushl_b)(void *vd, void *vn, void *vm, uint32_t desc) +{ + intptr_t i, opr_sz = simd_oprsz(desc); + uint8_t *d = vd, *n = vn, *m = vm; + + for (i = 0; i < opr_sz; ++i) { + int8_t mm = m[i]; + uint8_t nn = n[i]; + uint8_t res = 0; + if (mm >= 0) { + if (mm < 8) { + res = nn << mm; + } + } else { + if (mm > -8) { + res = nn >> -mm; + } + } + d[i] = res; + } + clear_tail(d, opr_sz, simd_maxsz(desc)); +} + +void HELPER(gvec_ushl_h)(void *vd, void *vn, void *vm, uint32_t desc) +{ + intptr_t i, opr_sz = simd_oprsz(desc); + uint16_t *d = vd, *n = vn, *m = vm; + + for (i = 0; i < opr_sz / 2; ++i) { + int8_t mm = m[i]; /* only 8 bits of shift are significant */ + uint16_t nn = n[i]; + uint16_t res = 0; + if (mm >= 0) { + if (mm < 16) { + res = nn << mm; + } + } else { + if (mm > -16) { + res = nn >> -mm; + } + } + d[i] = res; + } + clear_tail(d, opr_sz, simd_maxsz(desc)); +} From patchwork Sun Feb 16 21:42:30 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Richard Henderson X-Patchwork-Id: 183566 Delivered-To: patch@linaro.org Received: by 2002:a92:1f12:0:0:0:0:0 with SMTP id i18csp4249132ile; Sun, 16 Feb 2020 13:42:53 -0800 (PST) X-Google-Smtp-Source: APXvYqyty2O6EWXL2gC6dmBMtZDlSI+bmnrXc27hI6FcHYi5nDcm2HpZj78tCgndnHJaF1udCwnO X-Received: by 2002:ac8:349d:: with SMTP id w29mr11090403qtb.386.1581889372901; Sun, 16 Feb 2020 13:42:52 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1581889372; cv=none; d=google.com; s=arc-20160816; b=D7Hbg6V5vI1D2LqT72wcMHQNK2O9GPeQoCnr0xqKN1kPcne+o/wepzubvjiFNIJm7j YwF/v2VLoRAmnTXFW+PZn6lCvvUw6KRNRMTBcOntXFZ9s8TfINiR1SXb50ZpbXIjPsVy xlBzpqCZQvayMPOP7C+d+tU5wacldc7F2XB5L+7AaBCtyAQZmEceIQQmIft3yH1mY4jN OHa6vw2GYMVLjTh98dPce81aXgEq/vfOxGeGExaopT+eMW3nUIgYtI55DVAbUuyXxG+K a+XWUUiCMdMG68NV3JAc2qp0kDPRNHOs2wX89lRvd0BUt0OIWVyFsJxnivduhaz8NRq7 i7lA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:cc:list-subscribe:list-help:list-post:list-archive :list-unsubscribe:list-id:precedence:content-transfer-encoding :mime-version:references:in-reply-to:message-id:date:subject:to:from :dkim-signature; bh=O2GXPDLFXihWuaiW0fDwrNaCDa99hsJ9UfIJrAM1orM=; b=TZPbIGQdvRcZibQ8wpSlN+Be1bQk7tmRIOFZMwOC3VDSsfmhhCHzoeewfRmiy2K3Af 2vMjs2JmHTXTdf4ZaxI33QaRg9BphMvDAFllWQxnAksgor9zbgQDm5A+5ILicJNtsBZf zS+aRAEuwDf2eXM6bd/ckKN4DdGiI8mjezJ9md6NMm0HiQG0UnWoTuvYS6XHkPUWtnwj FAv9JIsdpIBLw87/czUBqJBSGhZDZvIAcJ6M+LZ9QsL/sFvu6QUR2ZxIydyk9q0lfT5N loGYuUXcgUz6Q6/MjHewHX+R/HGSkaSTF5n0c1asCppaO+su4ZI3wu/8D9R/jkyMHVRd hvoA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=TW3t5Ayc; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom="qemu-devel-bounces+patch=linaro.org@nongnu.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Return-Path: Received: from lists.gnu.org (lists.gnu.org. [209.51.188.17]) by mx.google.com with ESMTPS id q51si6212673qtk.65.2020.02.16.13.42.52 for (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Sun, 16 Feb 2020 13:42:52 -0800 (PST) Received-SPF: pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 209.51.188.17 as permitted sender) client-ip=209.51.188.17; Authentication-Results: mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=TW3t5Ayc; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom="qemu-devel-bounces+patch=linaro.org@nongnu.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Received: from localhost ([::1]:36860 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1j3Rgi-0007ti-D0 for patch@linaro.org; Sun, 16 Feb 2020 16:42:52 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]:47698) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1j3RgU-0007q4-VG for qemu-devel@nongnu.org; Sun, 16 Feb 2020 16:42:40 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1j3RgT-0004tB-Hc for qemu-devel@nongnu.org; Sun, 16 Feb 2020 16:42:38 -0500 Received: from mail-pg1-x542.google.com ([2607:f8b0:4864:20::542]:37201) by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16) (Exim 4.71) (envelope-from ) id 1j3RgT-0004sM-C8 for qemu-devel@nongnu.org; Sun, 16 Feb 2020 16:42:37 -0500 Received: by mail-pg1-x542.google.com with SMTP id z12so7997761pgl.4 for ; Sun, 16 Feb 2020 13:42:37 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=O2GXPDLFXihWuaiW0fDwrNaCDa99hsJ9UfIJrAM1orM=; b=TW3t5Ayc5JouTnjfgnOAe8PSFoU5aml6f3YdB6tJvjBpXz/XkQ8WALVeO8veWPXxmj w7QlzmW7xF/hGsi2paL1BGdTRSuii27ZHEWanO8eMz56WsN2hc/vax8OF7gXlR+L6dwq 7KJ2p/RMCpyMhdQkKfrMfJ0mVByAXNnPQu5oUYjK5sAefpo8V/ulhgR2AF9obxwBysxE dGGc4//aaocKAIxswi5nG+vjyVuLv48QaxUStjYRmwhxhL8MqE8SmyuGBn02umo6AgBS m46seyWnj3KrwNyLNaQdnauNzj9Zzs70NzzJwxuGLWxgNBbK6gcLPUlem9O/oPAJgbQP TV7w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=O2GXPDLFXihWuaiW0fDwrNaCDa99hsJ9UfIJrAM1orM=; b=dQsnnFnZ/WDUfdvfj5RNmF/NInSnSs0G65Xm/Bg6Xrcrn0OhRQEP5BXMQmT2M7srEz uvEy2s1Q7AptSYMhvkoq1GtDce3zefgzTmuspjCYYN6a+cJtN9yjLbU8YcIpoAkoemyj QbGbPLc44AsqzSbSXmdQMwMOod5hdpRGVya/FcRdcmg90LwpABO0VnMMdnDNePfYkP8V OfhY2EoZsSTbg4cCvKpUfeKhF9WSmlb8eWN3zfXctT3bZhLulmYa9cXOwQLTV7a6Lk2O P6NP5nC/oVlhEYBLxYaQMKBWwxd+RKSv1xcgWfH023ZjSl1RwKdpdbqTlkMjc5Qe3PxN +gaA== X-Gm-Message-State: APjAAAU4uFdT1c5nKCw0TmGoZj0B5dgWx2c1m6r5H+WD6NTB/ucv25WO DLVAxfGhUHapPKvetJgW9ZIVDTrImJY= X-Received: by 2002:a63:4a47:: with SMTP id j7mr14571862pgl.196.1581889355873; Sun, 16 Feb 2020 13:42:35 -0800 (PST) Received: from localhost.localdomain (97-126-123-70.tukw.qwest.net. [97.126.123.70]) by smtp.gmail.com with ESMTPSA id z29sm14821848pgc.21.2020.02.16.13.42.34 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 16 Feb 2020 13:42:35 -0800 (PST) From: Richard Henderson To: qemu-devel@nongnu.org Subject: [PATCH v2 2/4] target/arm: Convert PMUL.8 to gvec Date: Sun, 16 Feb 2020 13:42:30 -0800 Message-Id: <20200216214232.4230-3-richard.henderson@linaro.org> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20200216214232.4230-1-richard.henderson@linaro.org> References: <20200216214232.4230-1-richard.henderson@linaro.org> MIME-Version: 1.0 X-detected-operating-system: by eggs.gnu.org: Genre and OS details not recognized. X-Received-From: 2607:f8b0:4864:20::542 X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: peter.maydell@linaro.org, qemu-arm@nongnu.org, =?utf-8?q?Alex_Benn?= =?utf-8?b?w6ll?= Errors-To: qemu-devel-bounces+patch=linaro.org@nongnu.org Sender: "Qemu-devel" The gvec form will be needed for implementing SVE2. Extend the implementation to operate on uint64_t instead of uint32_t. Use a counted inner loop instead of terminating when op1 goes to zero, looking toward the required implementation for ARMv8.4-DIT. Tested-by: Alex Bennée Reviewed-by: Alex Bennée Signed-off-by: Richard Henderson --- v3: Elide mask after right-shift on N, as we already do N & 1 at the top of the loop. --- target/arm/helper.h | 3 ++- target/arm/neon_helper.c | 22 ---------------------- target/arm/translate-a64.c | 10 +++------- target/arm/translate.c | 11 ++++------- target/arm/vec_helper.c | 30 ++++++++++++++++++++++++++++++ 5 files changed, 39 insertions(+), 37 deletions(-) -- 2.20.1 diff --git a/target/arm/helper.h b/target/arm/helper.h index 459a278b5c..82450a3f96 100644 --- a/target/arm/helper.h +++ b/target/arm/helper.h @@ -342,7 +342,6 @@ DEF_HELPER_2(neon_sub_u8, i32, i32, i32) DEF_HELPER_2(neon_sub_u16, i32, i32, i32) DEF_HELPER_2(neon_mul_u8, i32, i32, i32) DEF_HELPER_2(neon_mul_u16, i32, i32, i32) -DEF_HELPER_2(neon_mul_p8, i32, i32, i32) DEF_HELPER_2(neon_mull_p8, i64, i32, i32) DEF_HELPER_2(neon_tst_u8, i32, i32, i32) @@ -696,6 +695,8 @@ DEF_HELPER_FLAGS_4(gvec_sshl_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(gvec_ushl_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(gvec_ushl_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(gvec_pmul_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) + #ifdef TARGET_AARCH64 #include "helper-a64.h" #include "helper-sve.h" diff --git a/target/arm/neon_helper.c b/target/arm/neon_helper.c index c581ffb7d3..9e7a9a1ac5 100644 --- a/target/arm/neon_helper.c +++ b/target/arm/neon_helper.c @@ -1131,28 +1131,6 @@ NEON_VOP(mul_u16, neon_u16, 2) /* Polynomial multiplication is like integer multiplication except the partial products are XORed, not added. */ -uint32_t HELPER(neon_mul_p8)(uint32_t op1, uint32_t op2) -{ - uint32_t mask; - uint32_t result; - result = 0; - while (op1) { - mask = 0; - if (op1 & 1) - mask |= 0xff; - if (op1 & (1 << 8)) - mask |= (0xff << 8); - if (op1 & (1 << 16)) - mask |= (0xff << 16); - if (op1 & (1 << 24)) - mask |= (0xff << 24); - result ^= op2 & mask; - op1 = (op1 >> 1) & 0x7f7f7f7f; - op2 = (op2 << 1) & 0xfefefefe; - } - return result; -} - uint64_t HELPER(neon_mull_p8)(uint32_t op1, uint32_t op2) { uint64_t result = 0; diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c index e42dcfebdd..c96ed28f9d 100644 --- a/target/arm/translate-a64.c +++ b/target/arm/translate-a64.c @@ -11160,9 +11160,10 @@ static void disas_simd_3same_int(DisasContext *s, uint32_t insn) case 0x13: /* MUL, PMUL */ if (!u) { /* MUL */ gen_gvec_fn3(s, is_q, rd, rn, rm, tcg_gen_gvec_mul, size); - return; + } else { /* PMUL */ + gen_gvec_op3_ool(s, is_q, rd, rn, rm, 0, gen_helper_gvec_pmul_b); } - break; + return; case 0x12: /* MLA, MLS */ if (u) { gen_gvec_op3(s, is_q, rd, rn, rm, &mls_op[size]); @@ -11292,11 +11293,6 @@ static void disas_simd_3same_int(DisasContext *s, uint32_t insn) genfn = fns[size][u]; break; } - case 0x13: /* MUL, PMUL */ - assert(u); /* PMUL */ - assert(size == 0); - genfn = gen_helper_neon_mul_p8; - break; case 0x16: /* SQDMULH, SQRDMULH */ { static NeonGenTwoOpEnvFn * const fns[2][2] = { diff --git a/target/arm/translate.c b/target/arm/translate.c index c4dd14e053..4581373e31 100644 --- a/target/arm/translate.c +++ b/target/arm/translate.c @@ -5007,16 +5007,17 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn) case NEON_3R_VMUL: /* VMUL */ if (u) { - /* Polynomial case allows only P8 and is handled below. */ + /* Polynomial case allows only P8. */ if (size != 0) { return 1; } + tcg_gen_gvec_3_ool(rd_ofs, rn_ofs, rm_ofs, vec_size, vec_size, + 0, gen_helper_gvec_pmul_b); } else { tcg_gen_gvec_mul(size, rd_ofs, rn_ofs, rm_ofs, vec_size, vec_size); - return 0; } - break; + return 0; case NEON_3R_VML: /* VMLA, VMLS */ tcg_gen_gvec_3(rd_ofs, rn_ofs, rm_ofs, vec_size, vec_size, @@ -5206,10 +5207,6 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn) tmp2 = neon_load_reg(rd, pass); gen_neon_add(size, tmp, tmp2); break; - case NEON_3R_VMUL: - /* VMUL.P8; other cases already eliminated. */ - gen_helper_neon_mul_p8(tmp, tmp, tmp2); - break; case NEON_3R_VPMAX: GEN_NEON_INTEGER_OP(pmax); break; diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c index fcb3663903..854de0e279 100644 --- a/target/arm/vec_helper.c +++ b/target/arm/vec_helper.c @@ -1134,3 +1134,33 @@ void HELPER(gvec_ushl_h)(void *vd, void *vn, void *vm, uint32_t desc) } clear_tail(d, opr_sz, simd_maxsz(desc)); } + +/* + * 8x8->8 polynomial multiply. + * + * Polynomial multiplication is like integer multiplication except the + * partial products are XORed, not added. + * + * TODO: expose this as a generic vector operation, as it is a common + * crypto building block. + */ +void HELPER(gvec_pmul_b)(void *vd, void *vn, void *vm, uint32_t desc) +{ + intptr_t i, j, opr_sz = simd_oprsz(desc); + uint64_t *d = vd, *n = vn, *m = vm; + + for (i = 0; i < opr_sz / 8; ++i) { + uint64_t nn = n[i]; + uint64_t mm = m[i]; + uint64_t rr = 0; + + for (j = 0; j < 8; ++j) { + uint64_t mask = (nn & 0x0101010101010101ull) * 0xff; + rr ^= mm & mask; + mm = (mm << 1) & 0xfefefefefefefefeull; + nn >>= 1; + } + d[i] = rr; + } + clear_tail(d, opr_sz, simd_maxsz(desc)); +} From patchwork Sun Feb 16 21:42:31 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Richard Henderson X-Patchwork-Id: 183567 Delivered-To: patch@linaro.org Received: by 2002:a92:1f12:0:0:0:0:0 with SMTP id i18csp4250091ile; Sun, 16 Feb 2020 13:44:18 -0800 (PST) X-Google-Smtp-Source: APXvYqxyHQ85a6duYd/4Q3pX0g/MYWPbR4Mfrdgrk8IUsjDvDyWmgMdWdvPwszVLJNf7jnS+9yRY X-Received: by 2002:a0c:f685:: with SMTP id p5mr10607319qvn.44.1581889458018; Sun, 16 Feb 2020 13:44:18 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1581889458; cv=none; d=google.com; s=arc-20160816; b=HseL5LScOPH823zP4ioEc+WfQC47QkZQvAprqeYr7i4cU6eMBXA1o2m/Go2r3I36XL y4F6mnggwG6mssnTywZTCiSyx54M9SVZLyfQyH3qCCW5B5xR2k39FTQI+BRMWF3XZXlA 59pQL//9nD32ZigVdE1ommjQ7QOUBhrajEv6cPayaXR9vRsy3UORFghydCN0rf59qL5N bLmZWmipO1SKDGeUyZTiFjkF95sxm5+dWX6pZrA8nq5WVwp/j7qBD1dwO1EbeymRg8Tz GDtIof30d9paKD2X8P7kYS8qbKvVzITm+IFqhkhzAZ1eBCI1Oo7T74nkfnhSL6KQ3PzH JuBg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:cc:list-subscribe:list-help:list-post:list-archive :list-unsubscribe:list-id:precedence:content-transfer-encoding :mime-version:references:in-reply-to:message-id:date:subject:to:from :dkim-signature; bh=BlbLiic8tZvSZ65YWPCZvJVkerl00AuqwrCb0HcI98k=; b=nwWz3oXo/JiCiQ23Is97cJGh/vc+A8zCKH7lW0i5AOEfm2LfP3U55fe6TGTVvBRKEC YGyvY1tYQgOzDPVKwz+yadhrLO45Bno9Z2NHfNDjdjbuldIlhW7QAUrh4FBXt6V31fHw jmkBE5AhV52LBiz8wUGIfL+1TIAdO6P3dztgr3TuAIHNwuhwVF38vFlp2uQyMecR5C2U fUdcVv96g1Vy2GB5Oc6MZEVIiRES+7mglkFFqj+W0y7TaR/xaxLWIs4ib3vvAuxMfiil cgEM+zlEdV/QK5FKwu7U22kUHlIwvhROXuFe+DrxezPRHF0s5chzNX70CMcnZUCiES01 0/pQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=daVeXO4O; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom="qemu-devel-bounces+patch=linaro.org@nongnu.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Return-Path: Received: from lists.gnu.org (lists.gnu.org. [209.51.188.17]) by mx.google.com with ESMTPS id s34si5753697qtk.212.2020.02.16.13.44.17 for (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Sun, 16 Feb 2020 13:44:18 -0800 (PST) Received-SPF: pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 209.51.188.17 as permitted sender) client-ip=209.51.188.17; Authentication-Results: mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=daVeXO4O; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom="qemu-devel-bounces+patch=linaro.org@nongnu.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Received: from localhost ([::1]:36916 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1j3Ri5-0001ZK-Hf for patch@linaro.org; Sun, 16 Feb 2020 16:44:17 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]:47725) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1j3RgW-0007rh-4Q for qemu-devel@nongnu.org; Sun, 16 Feb 2020 16:42:41 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1j3RgU-0004uZ-LD for qemu-devel@nongnu.org; Sun, 16 Feb 2020 16:42:40 -0500 Received: from mail-pf1-x443.google.com ([2607:f8b0:4864:20::443]:33550) by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16) (Exim 4.71) (envelope-from ) id 1j3RgU-0004ts-FB for qemu-devel@nongnu.org; Sun, 16 Feb 2020 16:42:38 -0500 Received: by mail-pf1-x443.google.com with SMTP id n7so7797500pfn.0 for ; Sun, 16 Feb 2020 13:42:38 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=BlbLiic8tZvSZ65YWPCZvJVkerl00AuqwrCb0HcI98k=; b=daVeXO4OWCYwn8hr9NIbeneAn/kv0LTzrBn4xXLYCFhMbjRUDE4Vu03fz1NZPEdMtJ DFY/UGqotBgCtK/cJYN8R/j4ungtDq4vdttzsmOHNX5TrDUFgOENyyw5D//fkjrhR+tj 5WR3xjyh625Nbt9yNJHaN2+4IajCEJkiZTriObpAPZmd0z7E9a4G/pzKi5JKSeEZhGUo cBoxiWqpdMKWFZ3ugGCkCEhcqABvMzLGTjZtFSq6SGew5iiLm+nV7jK+XeHtulKujxVT sH04SxpCYOoTIneJG4gSU8eEFzBDvL1LUS2rwszCRVxzMLJwNdqcQn/DiXlfrsrT8F77 HQGA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=BlbLiic8tZvSZ65YWPCZvJVkerl00AuqwrCb0HcI98k=; b=DtSFfBrI7Zdf7BozGyl2ZfESSSHf1nwVjbjPvMZA0FWt9kdydPZM5MdWjR9uKwBaYH fKK7CzkRZTAmbg+5P1pcwt+iYTAg1hyVapREyl3jK+HMGrlsljwRe+u9ogRd9J3m8UUz YEMsRF8X2ZqkwaHqSSYZ95adO7mISlBfO9nMHvMmJ690GktZXI7erylLsJorTB89CTsq sNnzVdLIXumlXsHX1Cua+zWR9JFEKF7skiTS9H+yBVX1uVqopaowg8d1DjqVpIN3ppXA BD9XjtGh/QbnOpH5sOg80TevWx3pF4DUs+Sl/NPguiyYQGQTZ9mEg0N+IA9YJjb7dLFF Sw9A== X-Gm-Message-State: APjAAAXjVBNEN4LQeA5AtGE4zuSUb1icHxrS2tT15scu+Jk4O71n2Dj4 fzReiy27aEknAK5ZQ0BHONTDeDDAqgc= X-Received: by 2002:a65:40c8:: with SMTP id u8mr11334728pgp.188.1581889357162; Sun, 16 Feb 2020 13:42:37 -0800 (PST) Received: from localhost.localdomain (97-126-123-70.tukw.qwest.net. [97.126.123.70]) by smtp.gmail.com with ESMTPSA id z29sm14821848pgc.21.2020.02.16.13.42.36 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 16 Feb 2020 13:42:36 -0800 (PST) From: Richard Henderson To: qemu-devel@nongnu.org Subject: [PATCH v2 3/4] target/arm: Convert PMULL.64 to gvec Date: Sun, 16 Feb 2020 13:42:31 -0800 Message-Id: <20200216214232.4230-4-richard.henderson@linaro.org> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20200216214232.4230-1-richard.henderson@linaro.org> References: <20200216214232.4230-1-richard.henderson@linaro.org> MIME-Version: 1.0 X-detected-operating-system: by eggs.gnu.org: Genre and OS details not recognized. X-Received-From: 2607:f8b0:4864:20::443 X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: peter.maydell@linaro.org, qemu-arm@nongnu.org, =?utf-8?q?Alex_Benn?= =?utf-8?b?w6ll?= Errors-To: qemu-devel-bounces+patch=linaro.org@nongnu.org Sender: "Qemu-devel" The gvec form will be needed for implementing SVE2. Tested-by: Alex Bennée Reviewed-by: Alex Bennée Signed-off-by: Richard Henderson --- target/arm/helper.h | 4 +--- target/arm/neon_helper.c | 30 ------------------------------ target/arm/translate-a64.c | 28 +++------------------------- target/arm/translate.c | 16 ++-------------- target/arm/vec_helper.c | 33 +++++++++++++++++++++++++++++++++ 5 files changed, 39 insertions(+), 72 deletions(-) -- 2.20.1 diff --git a/target/arm/helper.h b/target/arm/helper.h index 82450a3f96..4352fae3db 100644 --- a/target/arm/helper.h +++ b/target/arm/helper.h @@ -562,9 +562,6 @@ DEF_HELPER_FLAGS_3(crc32, TCG_CALL_NO_RWG_SE, i32, i32, i32, i32) DEF_HELPER_FLAGS_3(crc32c, TCG_CALL_NO_RWG_SE, i32, i32, i32, i32) DEF_HELPER_2(dc_zva, void, env, i64) -DEF_HELPER_FLAGS_2(neon_pmull_64_lo, TCG_CALL_NO_RWG_SE, i64, i64, i64) -DEF_HELPER_FLAGS_2(neon_pmull_64_hi, TCG_CALL_NO_RWG_SE, i64, i64, i64) - DEF_HELPER_FLAGS_5(gvec_qrdmlah_s16, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_5(gvec_qrdmlsh_s16, TCG_CALL_NO_RWG, @@ -696,6 +693,7 @@ DEF_HELPER_FLAGS_4(gvec_ushl_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(gvec_ushl_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(gvec_pmul_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(gvec_pmull_q, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) #ifdef TARGET_AARCH64 #include "helper-a64.h" diff --git a/target/arm/neon_helper.c b/target/arm/neon_helper.c index 9e7a9a1ac5..6a107da0e1 100644 --- a/target/arm/neon_helper.c +++ b/target/arm/neon_helper.c @@ -2152,33 +2152,3 @@ void HELPER(neon_zip16)(void *vd, void *vm) rm[0] = m0; rd[0] = d0; } - -/* Helper function for 64 bit polynomial multiply case: - * perform PolynomialMult(op1, op2) and return either the top or - * bottom half of the 128 bit result. - */ -uint64_t HELPER(neon_pmull_64_lo)(uint64_t op1, uint64_t op2) -{ - int bitnum; - uint64_t res = 0; - - for (bitnum = 0; bitnum < 64; bitnum++) { - if (op1 & (1ULL << bitnum)) { - res ^= op2 << bitnum; - } - } - return res; -} -uint64_t HELPER(neon_pmull_64_hi)(uint64_t op1, uint64_t op2) -{ - int bitnum; - uint64_t res = 0; - - /* bit 0 of op1 can't influence the high 64 bits at all */ - for (bitnum = 1; bitnum < 64; bitnum++) { - if (op1 & (1ULL << bitnum)) { - res ^= op2 >> (64 - bitnum); - } - } - return res; -} diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c index c96ed28f9d..6ce1131860 100644 --- a/target/arm/translate-a64.c +++ b/target/arm/translate-a64.c @@ -10648,30 +10648,6 @@ static void handle_3rd_narrowing(DisasContext *s, int is_q, int is_u, int size, clear_vec_high(s, is_q, rd); } -static void handle_pmull_64(DisasContext *s, int is_q, int rd, int rn, int rm) -{ - /* PMULL of 64 x 64 -> 128 is an odd special case because it - * is the only three-reg-diff instruction which produces a - * 128-bit wide result from a single operation. However since - * it's possible to calculate the two halves more or less - * separately we just use two helper calls. - */ - TCGv_i64 tcg_op1 = tcg_temp_new_i64(); - TCGv_i64 tcg_op2 = tcg_temp_new_i64(); - TCGv_i64 tcg_res = tcg_temp_new_i64(); - - read_vec_element(s, tcg_op1, rn, is_q, MO_64); - read_vec_element(s, tcg_op2, rm, is_q, MO_64); - gen_helper_neon_pmull_64_lo(tcg_res, tcg_op1, tcg_op2); - write_vec_element(s, tcg_res, rd, 0, MO_64); - gen_helper_neon_pmull_64_hi(tcg_res, tcg_op1, tcg_op2); - write_vec_element(s, tcg_res, rd, 1, MO_64); - - tcg_temp_free_i64(tcg_op1); - tcg_temp_free_i64(tcg_op2); - tcg_temp_free_i64(tcg_res); -} - /* AdvSIMD three different * 31 30 29 28 24 23 22 21 20 16 15 12 11 10 9 5 4 0 * +---+---+---+-----------+------+---+------+--------+-----+------+------+ @@ -10736,7 +10712,9 @@ static void disas_simd_three_reg_diff(DisasContext *s, uint32_t insn) if (!fp_access_check(s)) { return; } - handle_pmull_64(s, is_q, rd, rn, rm); + /* The Q field specifies lo/hi half input for this insn. */ + gen_gvec_op3_ool(s, true, rd, rn, rm, is_q, + gen_helper_gvec_pmull_q); return; } goto is_widening; diff --git a/target/arm/translate.c b/target/arm/translate.c index 4581373e31..e2dbafa161 100644 --- a/target/arm/translate.c +++ b/target/arm/translate.c @@ -5870,23 +5870,11 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn) * outside the loop below as it only performs a single pass. */ if (op == 14 && size == 2) { - TCGv_i64 tcg_rn, tcg_rm, tcg_rd; - if (!dc_isar_feature(aa32_pmull, s)) { return 1; } - tcg_rn = tcg_temp_new_i64(); - tcg_rm = tcg_temp_new_i64(); - tcg_rd = tcg_temp_new_i64(); - neon_load_reg64(tcg_rn, rn); - neon_load_reg64(tcg_rm, rm); - gen_helper_neon_pmull_64_lo(tcg_rd, tcg_rn, tcg_rm); - neon_store_reg64(tcg_rd, rd); - gen_helper_neon_pmull_64_hi(tcg_rd, tcg_rn, tcg_rm); - neon_store_reg64(tcg_rd, rd + 1); - tcg_temp_free_i64(tcg_rn); - tcg_temp_free_i64(tcg_rm); - tcg_temp_free_i64(tcg_rd); + tcg_gen_gvec_3_ool(rd_ofs, rn_ofs, rm_ofs, 16, 16, + 0, gen_helper_gvec_pmull_q); return 0; } diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c index 854de0e279..79d2624f7b 100644 --- a/target/arm/vec_helper.c +++ b/target/arm/vec_helper.c @@ -1164,3 +1164,36 @@ void HELPER(gvec_pmul_b)(void *vd, void *vn, void *vm, uint32_t desc) } clear_tail(d, opr_sz, simd_maxsz(desc)); } + +/* + * 64x64->128 polynomial multiply. + * Because of the lanes are not accessed in strict columns, + * this probably cannot be turned into a generic helper. + */ +void HELPER(gvec_pmull_q)(void *vd, void *vn, void *vm, uint32_t desc) +{ + intptr_t i, j, opr_sz = simd_oprsz(desc); + intptr_t hi = simd_data(desc); + uint64_t *d = vd, *n = vn, *m = vm; + + for (i = 0; i < opr_sz / 8; i += 2) { + uint64_t nn = n[i + hi]; + uint64_t mm = m[i + hi]; + uint64_t rhi = 0; + uint64_t rlo = 0; + + /* Bit 0 can only influence the low 64-bit result. */ + if (nn & 1) { + rlo = mm; + } + + for (j = 1; j < 64; ++j) { + uint64_t mask = -((nn >> j) & 1); + rlo ^= (mm << j) & mask; + rhi ^= (mm >> (64 - j)) & mask; + } + d[i] = rlo; + d[i + 1] = rhi; + } + clear_tail(d, opr_sz, simd_maxsz(desc)); +} From patchwork Sun Feb 16 21:42:32 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Richard Henderson X-Patchwork-Id: 183569 Delivered-To: patch@linaro.org Received: by 2002:a92:1f12:0:0:0:0:0 with SMTP id i18csp4252029ile; Sun, 16 Feb 2020 13:47:25 -0800 (PST) X-Google-Smtp-Source: APXvYqxKFks/3D1fRLjiEkNq+CeckaAE8MsMcxxd/7SD947v5UGjSpMVYIQ/VOxzVkjjSw0oFnBP X-Received: by 2002:ae9:e10e:: with SMTP id g14mr12183985qkm.430.1581889645186; Sun, 16 Feb 2020 13:47:25 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1581889645; cv=none; d=google.com; s=arc-20160816; b=h7Lv68QrLUiEsSipoxN3sUZ/3SiIOGyLf7auqLL59CDSMPh/L6eQaC9G2IUzTE8Wh/ WZJpfeen8Tr/ycFiADFkN6tib1AvIX6JthIPeyLz4OfeTQmLs6cHRQ4IHe1JpKj0549i jlSIZSyBLKYVX9oL0vjD7PfSWl1Fr6BEX9nUmpQaq92Xf01Gp3lU4O61PNTj4TBCLwME 0e93w0VBleUhitQqlAaPT8yZ0osxkd7U1AigHCu+dkar+3wgxom0HMQmq3Yz6QW7Fkt5 c6IoUVO92CZs7o2FyPP6hnTGY1nI2THwV0pmQrFW2o8LBvXfL4ypLXrElRdmK+X0Pt4/ d4nA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:cc:list-subscribe:list-help:list-post:list-archive :list-unsubscribe:list-id:precedence:content-transfer-encoding :mime-version:references:in-reply-to:message-id:date:subject:to:from :dkim-signature; bh=eMj4/+qaGhoNS+9r6sBtA5cZajrCIO2mP4pzA4t1Oe8=; b=mETFZ7YC662P5zyXl9rHVzjmJwY5Qwi80HjnwCe67djzgG9ucwc/T/RZe+LXV6b9cc SeNRY6jMbHS35OJi/htTutlaTbckUBXkVb3qdFWEdtaMcVbPyDwgoUq7iJ9iCGmiPJD+ bUB3xC15NfGNrmvml1LSDPK8qf0pN6uFW9+eG38l3e/gFFeF71CALxi0tpNR/yEF/h+o u/OrX929bgwK6ozORv1iCO0jAh99c8SyRsv2fZJA++h1rfAUmDJWH9essYQzAH7+G1Ji AkUAW47NXljVN1bQoJaJtL3NO8OFsIDN/QGboPf9fPjifGbanbddL0donwS0cEpAGULU jvSg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=ZaLvYpO7; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom="qemu-devel-bounces+patch=linaro.org@nongnu.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Return-Path: Received: from lists.gnu.org (lists.gnu.org. [209.51.188.17]) by mx.google.com with ESMTPS id s8si6449810qkj.85.2020.02.16.13.47.25 for (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Sun, 16 Feb 2020 13:47:25 -0800 (PST) Received-SPF: pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 209.51.188.17 as permitted sender) client-ip=209.51.188.17; Authentication-Results: mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=ZaLvYpO7; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom="qemu-devel-bounces+patch=linaro.org@nongnu.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Received: from localhost ([::1]:37004 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1j3Rl6-0005rk-Ky for patch@linaro.org; Sun, 16 Feb 2020 16:47:24 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]:47754) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1j3RgX-0007uy-OM for qemu-devel@nongnu.org; Sun, 16 Feb 2020 16:42:43 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1j3RgV-0004wV-Vx for qemu-devel@nongnu.org; Sun, 16 Feb 2020 16:42:41 -0500 Received: from mail-pf1-x443.google.com ([2607:f8b0:4864:20::443]:37527) by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16) (Exim 4.71) (envelope-from ) id 1j3RgV-0004va-OA for qemu-devel@nongnu.org; Sun, 16 Feb 2020 16:42:39 -0500 Received: by mail-pf1-x443.google.com with SMTP id p14so7793578pfn.4 for ; Sun, 16 Feb 2020 13:42:39 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=eMj4/+qaGhoNS+9r6sBtA5cZajrCIO2mP4pzA4t1Oe8=; b=ZaLvYpO7Plsa4JkrHEWOMHzEilmAl84Ww9Bs9YFX67ULjFzKwrz1bNXW/4NO6v2aai w0TjRvCuxeb1i+YtniFnEP2J6JfURLPtAECrB8gaAugVAIcVXcPZc6LSiyViGw8mON9g XcUU6kvVRvPIFh+BWMmXoxN34+Bkf7Ebg3WkmpiT2Rba4myqJ+JArNcsbQIkJP9Cc5Kj 8Gm2iicF6UMawAijwMsxiEOwDliOCRMN98eNfk7SHZhY5Xw11Sx+GBptng+pugCtG6R5 26SsMi1BW3z3AUUK5Ok8O4HUmzdQ5tg+fDwnexXknp2EDQGS592shvIvzQLQeruRE1+x aWXg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=eMj4/+qaGhoNS+9r6sBtA5cZajrCIO2mP4pzA4t1Oe8=; b=Jih0ph2UAOp7IDwIUdD1b24EZTNcGQt8BOXHjCBYUPKSnufiRF2Bbmwb5AKk4Mffpr 6ZTm47SJwGYyfaQ5Fw+QZ4WwcIzR88PzRA9Z2QKMiB/DFT7sNJmGd/sPF1WwSoNx9HYS 1eCbPcxYU7PrFhzBHJj/1haGNLleenOFmuMVaS3If9cD7oVjuEZkhkTfmipwM05z1nE0 ta9tHDf94h6P9Qwh9QoLVwadF2wAmco0835RKuGrqVLfFWa5vq1SVboFeUCljqEo8urn 2p3/RPkYBqtxm3w4SprkT6V7fz1/uBxc17s3QLGYRR19Icw/X098TW/iWyoPEtsY54ib YXIg== X-Gm-Message-State: APjAAAUuKwzmHxMcoND/TCyCSJAu66bKd6b24anANbEYRU/jXStpu6z9 MK0tx+rRYEjlJ4XyP/SDmFfdXszHRi4= X-Received: by 2002:a63:a4b:: with SMTP id z11mr13916456pgk.398.1581889358362; Sun, 16 Feb 2020 13:42:38 -0800 (PST) Received: from localhost.localdomain (97-126-123-70.tukw.qwest.net. [97.126.123.70]) by smtp.gmail.com with ESMTPSA id z29sm14821848pgc.21.2020.02.16.13.42.37 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 16 Feb 2020 13:42:37 -0800 (PST) From: Richard Henderson To: qemu-devel@nongnu.org Subject: [PATCH v2 4/4] target/arm: Convert PMULL.8 to gvec Date: Sun, 16 Feb 2020 13:42:32 -0800 Message-Id: <20200216214232.4230-5-richard.henderson@linaro.org> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20200216214232.4230-1-richard.henderson@linaro.org> References: <20200216214232.4230-1-richard.henderson@linaro.org> MIME-Version: 1.0 X-detected-operating-system: by eggs.gnu.org: Genre and OS details not recognized. X-Received-From: 2607:f8b0:4864:20::443 X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: peter.maydell@linaro.org, qemu-arm@nongnu.org, =?utf-8?q?Alex_Benn?= =?utf-8?b?w6ll?= Errors-To: qemu-devel-bounces+patch=linaro.org@nongnu.org Sender: "Qemu-devel" We still need two different helpers, since NEON and SVE2 get the inputs from different locations within the source vector. However, we can convert both to the same internal form for computation. The sve2 helper is not used yet, but adding it with this patch helps illustrate why the neon changes are helpful. Tested-by: Alex Bennée Reviewed-by: Alex Bennée Signed-off-by: Richard Henderson --- target/arm/helper-sve.h | 2 ++ target/arm/helper.h | 3 +- target/arm/neon_helper.c | 32 -------------------- target/arm/translate-a64.c | 27 +++++++++++------ target/arm/translate.c | 26 ++++++++--------- target/arm/vec_helper.c | 60 ++++++++++++++++++++++++++++++++++++++ 6 files changed, 95 insertions(+), 55 deletions(-) -- 2.20.1 diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h index 9e79182ab4..2f47279155 100644 --- a/target/arm/helper-sve.h +++ b/target/arm/helper-sve.h @@ -1574,3 +1574,5 @@ DEF_HELPER_FLAGS_6(sve_stdd_le_zd, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr, tl, i32) DEF_HELPER_FLAGS_6(sve_stdd_be_zd, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr, tl, i32) + +DEF_HELPER_FLAGS_4(sve2_pmull_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) diff --git a/target/arm/helper.h b/target/arm/helper.h index 4352fae3db..fcbf504121 100644 --- a/target/arm/helper.h +++ b/target/arm/helper.h @@ -342,7 +342,6 @@ DEF_HELPER_2(neon_sub_u8, i32, i32, i32) DEF_HELPER_2(neon_sub_u16, i32, i32, i32) DEF_HELPER_2(neon_mul_u8, i32, i32, i32) DEF_HELPER_2(neon_mul_u16, i32, i32, i32) -DEF_HELPER_2(neon_mull_p8, i64, i32, i32) DEF_HELPER_2(neon_tst_u8, i32, i32, i32) DEF_HELPER_2(neon_tst_u16, i32, i32, i32) @@ -695,6 +694,8 @@ DEF_HELPER_FLAGS_4(gvec_ushl_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(gvec_pmul_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(gvec_pmull_q, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(neon_pmull_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) + #ifdef TARGET_AARCH64 #include "helper-a64.h" #include "helper-sve.h" diff --git a/target/arm/neon_helper.c b/target/arm/neon_helper.c index 6a107da0e1..c7a8438b42 100644 --- a/target/arm/neon_helper.c +++ b/target/arm/neon_helper.c @@ -1129,38 +1129,6 @@ NEON_VOP(mul_u8, neon_u8, 4) NEON_VOP(mul_u16, neon_u16, 2) #undef NEON_FN -/* Polynomial multiplication is like integer multiplication except the - partial products are XORed, not added. */ -uint64_t HELPER(neon_mull_p8)(uint32_t op1, uint32_t op2) -{ - uint64_t result = 0; - uint64_t mask; - uint64_t op2ex = op2; - op2ex = (op2ex & 0xff) | - ((op2ex & 0xff00) << 8) | - ((op2ex & 0xff0000) << 16) | - ((op2ex & 0xff000000) << 24); - while (op1) { - mask = 0; - if (op1 & 1) { - mask |= 0xffff; - } - if (op1 & (1 << 8)) { - mask |= (0xffffU << 16); - } - if (op1 & (1 << 16)) { - mask |= (0xffffULL << 32); - } - if (op1 & (1 << 24)) { - mask |= (0xffffULL << 48); - } - result ^= op2ex & mask; - op1 = (op1 >> 1) & 0x7f7f7f7f; - op2ex <<= 1; - } - return result; -} - #define NEON_FN(dest, src1, src2) dest = (src1 & src2) ? -1 : 0 NEON_VOP(tst_u8, neon_u8, 4) NEON_VOP(tst_u16, neon_u16, 2) diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c index 6ce1131860..63ff042d60 100644 --- a/target/arm/translate-a64.c +++ b/target/arm/translate-a64.c @@ -10533,10 +10533,6 @@ static void handle_3rd_widening(DisasContext *s, int is_q, int is_u, int size, gen_helper_neon_addl_saturate_s32(tcg_passres, cpu_env, tcg_passres, tcg_passres); break; - case 14: /* PMULL */ - assert(size == 0); - gen_helper_neon_mull_p8(tcg_passres, tcg_op1, tcg_op2); - break; default: g_assert_not_reached(); } @@ -10700,11 +10696,21 @@ static void disas_simd_three_reg_diff(DisasContext *s, uint32_t insn) handle_3rd_narrowing(s, is_q, is_u, size, opcode, rd, rn, rm); break; case 14: /* PMULL, PMULL2 */ - if (is_u || size == 1 || size == 2) { + if (is_u) { unallocated_encoding(s); return; } - if (size == 3) { + switch (size) { + case 0: /* PMULL.P8 */ + if (!fp_access_check(s)) { + return; + } + /* The Q field specifies lo/hi half input for this insn. */ + gen_gvec_op3_ool(s, true, rd, rn, rm, is_q, + gen_helper_neon_pmull_h); + break; + + case 3: /* PMULL.P64 */ if (!dc_isar_feature(aa64_pmull, s)) { unallocated_encoding(s); return; @@ -10715,9 +10721,13 @@ static void disas_simd_three_reg_diff(DisasContext *s, uint32_t insn) /* The Q field specifies lo/hi half input for this insn. */ gen_gvec_op3_ool(s, true, rd, rn, rm, is_q, gen_helper_gvec_pmull_q); - return; + break; + + default: + unallocated_encoding(s); + break; } - goto is_widening; + return; case 9: /* SQDMLAL, SQDMLAL2 */ case 11: /* SQDMLSL, SQDMLSL2 */ case 13: /* SQDMULL, SQDMULL2 */ @@ -10738,7 +10748,6 @@ static void disas_simd_three_reg_diff(DisasContext *s, uint32_t insn) unallocated_encoding(s); return; } - is_widening: if (!fp_access_check(s)) { return; } diff --git a/target/arm/translate.c b/target/arm/translate.c index e2dbafa161..07178b632a 100644 --- a/target/arm/translate.c +++ b/target/arm/translate.c @@ -5866,15 +5866,20 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn) return 1; } - /* Handle VMULL.P64 (Polynomial 64x64 to 128 bit multiply) - * outside the loop below as it only performs a single pass. - */ - if (op == 14 && size == 2) { - if (!dc_isar_feature(aa32_pmull, s)) { - return 1; + /* Handle polynomial VMULL in a single pass. */ + if (op == 14) { + if (size == 0) { + /* VMULL.P8 */ + tcg_gen_gvec_3_ool(rd_ofs, rn_ofs, rm_ofs, 16, 16, + 0, gen_helper_neon_pmull_h); + } else { + /* VMULL.P64 */ + if (!dc_isar_feature(aa32_pmull, s)) { + return 1; + } + tcg_gen_gvec_3_ool(rd_ofs, rn_ofs, rm_ofs, 16, 16, + 0, gen_helper_gvec_pmull_q); } - tcg_gen_gvec_3_ool(rd_ofs, rn_ofs, rm_ofs, 16, 16, - 0, gen_helper_gvec_pmull_q); return 0; } @@ -5952,11 +5957,6 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn) /* VMLAL, VQDMLAL, VMLSL, VQDMLSL, VMULL, VQDMULL */ gen_neon_mull(cpu_V0, tmp, tmp2, size, u); break; - case 14: /* Polynomial VMULL */ - gen_helper_neon_mull_p8(cpu_V0, tmp, tmp2); - tcg_temp_free_i32(tmp2); - tcg_temp_free_i32(tmp); - break; default: /* 15 is RESERVED: caught earlier */ abort(); } diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c index 79d2624f7b..8017bd88c4 100644 --- a/target/arm/vec_helper.c +++ b/target/arm/vec_helper.c @@ -1197,3 +1197,63 @@ void HELPER(gvec_pmull_q)(void *vd, void *vn, void *vm, uint32_t desc) } clear_tail(d, opr_sz, simd_maxsz(desc)); } + +/* + * 8x8->16 polynomial multiply. + * + * The byte inputs are expanded to (or extracted from) half-words. + * Note that neon and sve2 get the inputs from different positions. + * This allows 4 bytes to be processed in parallel with uint64_t. + */ + +static uint64_t expand_byte_to_half(uint64_t x) +{ + return (x & 0x000000ff) + | ((x & 0x0000ff00) << 8) + | ((x & 0x00ff0000) << 16) + | ((x & 0xff000000) << 24); +} + +static uint64_t pmull_h(uint64_t op1, uint64_t op2) +{ + uint64_t result = 0; + int i; + + for (i = 0; i < 8; ++i) { + uint64_t mask = (op1 & 0x0001000100010001ull) * 0xffff; + result ^= op2 & mask; + op1 >>= 1; + op2 <<= 1; + } + return result; +} + +void HELPER(neon_pmull_h)(void *vd, void *vn, void *vm, uint32_t desc) +{ + int hi = simd_data(desc); + uint64_t *d = vd, *n = vn, *m = vm; + uint64_t nn = n[hi], mm = m[hi]; + + d[0] = pmull_h(expand_byte_to_half(nn), expand_byte_to_half(mm)); + nn >>= 32; + mm >>= 32; + d[1] = pmull_h(expand_byte_to_half(nn), expand_byte_to_half(mm)); + + clear_tail(d, 16, simd_maxsz(desc)); +} + +#ifdef TARGET_AARCH64 +void HELPER(sve2_pmull_h)(void *vd, void *vn, void *vm, uint32_t desc) +{ + int shift = simd_data(desc) * 8; + intptr_t i, opr_sz = simd_oprsz(desc); + uint64_t *d = vd, *n = vn, *m = vm; + + for (i = 0; i < opr_sz / 8; ++i) { + uint64_t nn = (n[i] >> shift) & 0x00ff00ff00ff00ffull; + uint64_t mm = (m[i] >> shift) & 0x00ff00ff00ff00ffull; + + d[i] = pmull_h(nn, mm); + } +} +#endif