From patchwork Sun Feb 16 21:42:32 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Richard Henderson X-Patchwork-Id: 183569 Delivered-To: patch@linaro.org Received: by 2002:a92:1f12:0:0:0:0:0 with SMTP id i18csp4252029ile; Sun, 16 Feb 2020 13:47:25 -0800 (PST) X-Google-Smtp-Source: APXvYqxKFks/3D1fRLjiEkNq+CeckaAE8MsMcxxd/7SD947v5UGjSpMVYIQ/VOxzVkjjSw0oFnBP X-Received: by 2002:ae9:e10e:: with SMTP id g14mr12183985qkm.430.1581889645186; Sun, 16 Feb 2020 13:47:25 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1581889645; cv=none; d=google.com; s=arc-20160816; b=h7Lv68QrLUiEsSipoxN3sUZ/3SiIOGyLf7auqLL59CDSMPh/L6eQaC9G2IUzTE8Wh/ WZJpfeen8Tr/ycFiADFkN6tib1AvIX6JthIPeyLz4OfeTQmLs6cHRQ4IHe1JpKj0549i jlSIZSyBLKYVX9oL0vjD7PfSWl1Fr6BEX9nUmpQaq92Xf01Gp3lU4O61PNTj4TBCLwME 0e93w0VBleUhitQqlAaPT8yZ0osxkd7U1AigHCu+dkar+3wgxom0HMQmq3Yz6QW7Fkt5 c6IoUVO92CZs7o2FyPP6hnTGY1nI2THwV0pmQrFW2o8LBvXfL4ypLXrElRdmK+X0Pt4/ d4nA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:cc:list-subscribe:list-help:list-post:list-archive :list-unsubscribe:list-id:precedence:content-transfer-encoding :mime-version:references:in-reply-to:message-id:date:subject:to:from :dkim-signature; bh=eMj4/+qaGhoNS+9r6sBtA5cZajrCIO2mP4pzA4t1Oe8=; b=mETFZ7YC662P5zyXl9rHVzjmJwY5Qwi80HjnwCe67djzgG9ucwc/T/RZe+LXV6b9cc SeNRY6jMbHS35OJi/htTutlaTbckUBXkVb3qdFWEdtaMcVbPyDwgoUq7iJ9iCGmiPJD+ bUB3xC15NfGNrmvml1LSDPK8qf0pN6uFW9+eG38l3e/gFFeF71CALxi0tpNR/yEF/h+o u/OrX929bgwK6ozORv1iCO0jAh99c8SyRsv2fZJA++h1rfAUmDJWH9essYQzAH7+G1Ji AkUAW47NXljVN1bQoJaJtL3NO8OFsIDN/QGboPf9fPjifGbanbddL0donwS0cEpAGULU jvSg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=ZaLvYpO7; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom="qemu-devel-bounces+patch=linaro.org@nongnu.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Return-Path: Received: from lists.gnu.org (lists.gnu.org. [209.51.188.17]) by mx.google.com with ESMTPS id s8si6449810qkj.85.2020.02.16.13.47.25 for (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Sun, 16 Feb 2020 13:47:25 -0800 (PST) Received-SPF: pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 209.51.188.17 as permitted sender) client-ip=209.51.188.17; Authentication-Results: mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=ZaLvYpO7; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom="qemu-devel-bounces+patch=linaro.org@nongnu.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Received: from localhost ([::1]:37004 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1j3Rl6-0005rk-Ky for patch@linaro.org; Sun, 16 Feb 2020 16:47:24 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]:47754) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1j3RgX-0007uy-OM for qemu-devel@nongnu.org; Sun, 16 Feb 2020 16:42:43 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1j3RgV-0004wV-Vx for qemu-devel@nongnu.org; Sun, 16 Feb 2020 16:42:41 -0500 Received: from mail-pf1-x443.google.com ([2607:f8b0:4864:20::443]:37527) by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16) (Exim 4.71) (envelope-from ) id 1j3RgV-0004va-OA for qemu-devel@nongnu.org; Sun, 16 Feb 2020 16:42:39 -0500 Received: by mail-pf1-x443.google.com with SMTP id p14so7793578pfn.4 for ; Sun, 16 Feb 2020 13:42:39 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=eMj4/+qaGhoNS+9r6sBtA5cZajrCIO2mP4pzA4t1Oe8=; b=ZaLvYpO7Plsa4JkrHEWOMHzEilmAl84Ww9Bs9YFX67ULjFzKwrz1bNXW/4NO6v2aai w0TjRvCuxeb1i+YtniFnEP2J6JfURLPtAECrB8gaAugVAIcVXcPZc6LSiyViGw8mON9g XcUU6kvVRvPIFh+BWMmXoxN34+Bkf7Ebg3WkmpiT2Rba4myqJ+JArNcsbQIkJP9Cc5Kj 8Gm2iicF6UMawAijwMsxiEOwDliOCRMN98eNfk7SHZhY5Xw11Sx+GBptng+pugCtG6R5 26SsMi1BW3z3AUUK5Ok8O4HUmzdQ5tg+fDwnexXknp2EDQGS592shvIvzQLQeruRE1+x aWXg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=eMj4/+qaGhoNS+9r6sBtA5cZajrCIO2mP4pzA4t1Oe8=; b=Jih0ph2UAOp7IDwIUdD1b24EZTNcGQt8BOXHjCBYUPKSnufiRF2Bbmwb5AKk4Mffpr 6ZTm47SJwGYyfaQ5Fw+QZ4WwcIzR88PzRA9Z2QKMiB/DFT7sNJmGd/sPF1WwSoNx9HYS 1eCbPcxYU7PrFhzBHJj/1haGNLleenOFmuMVaS3If9cD7oVjuEZkhkTfmipwM05z1nE0 ta9tHDf94h6P9Qwh9QoLVwadF2wAmco0835RKuGrqVLfFWa5vq1SVboFeUCljqEo8urn 2p3/RPkYBqtxm3w4SprkT6V7fz1/uBxc17s3QLGYRR19Icw/X098TW/iWyoPEtsY54ib YXIg== X-Gm-Message-State: APjAAAUuKwzmHxMcoND/TCyCSJAu66bKd6b24anANbEYRU/jXStpu6z9 MK0tx+rRYEjlJ4XyP/SDmFfdXszHRi4= X-Received: by 2002:a63:a4b:: with SMTP id z11mr13916456pgk.398.1581889358362; Sun, 16 Feb 2020 13:42:38 -0800 (PST) Received: from localhost.localdomain (97-126-123-70.tukw.qwest.net. [97.126.123.70]) by smtp.gmail.com with ESMTPSA id z29sm14821848pgc.21.2020.02.16.13.42.37 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 16 Feb 2020 13:42:37 -0800 (PST) From: Richard Henderson To: qemu-devel@nongnu.org Subject: [PATCH v2 4/4] target/arm: Convert PMULL.8 to gvec Date: Sun, 16 Feb 2020 13:42:32 -0800 Message-Id: <20200216214232.4230-5-richard.henderson@linaro.org> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20200216214232.4230-1-richard.henderson@linaro.org> References: <20200216214232.4230-1-richard.henderson@linaro.org> MIME-Version: 1.0 X-detected-operating-system: by eggs.gnu.org: Genre and OS details not recognized. X-Received-From: 2607:f8b0:4864:20::443 X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: peter.maydell@linaro.org, qemu-arm@nongnu.org, =?utf-8?q?Alex_Benn?= =?utf-8?b?w6ll?= Errors-To: qemu-devel-bounces+patch=linaro.org@nongnu.org Sender: "Qemu-devel" We still need two different helpers, since NEON and SVE2 get the inputs from different locations within the source vector. However, we can convert both to the same internal form for computation. The sve2 helper is not used yet, but adding it with this patch helps illustrate why the neon changes are helpful. Tested-by: Alex Bennée Reviewed-by: Alex Bennée Signed-off-by: Richard Henderson --- target/arm/helper-sve.h | 2 ++ target/arm/helper.h | 3 +- target/arm/neon_helper.c | 32 -------------------- target/arm/translate-a64.c | 27 +++++++++++------ target/arm/translate.c | 26 ++++++++--------- target/arm/vec_helper.c | 60 ++++++++++++++++++++++++++++++++++++++ 6 files changed, 95 insertions(+), 55 deletions(-) -- 2.20.1 diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h index 9e79182ab4..2f47279155 100644 --- a/target/arm/helper-sve.h +++ b/target/arm/helper-sve.h @@ -1574,3 +1574,5 @@ DEF_HELPER_FLAGS_6(sve_stdd_le_zd, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr, tl, i32) DEF_HELPER_FLAGS_6(sve_stdd_be_zd, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr, tl, i32) + +DEF_HELPER_FLAGS_4(sve2_pmull_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) diff --git a/target/arm/helper.h b/target/arm/helper.h index 4352fae3db..fcbf504121 100644 --- a/target/arm/helper.h +++ b/target/arm/helper.h @@ -342,7 +342,6 @@ DEF_HELPER_2(neon_sub_u8, i32, i32, i32) DEF_HELPER_2(neon_sub_u16, i32, i32, i32) DEF_HELPER_2(neon_mul_u8, i32, i32, i32) DEF_HELPER_2(neon_mul_u16, i32, i32, i32) -DEF_HELPER_2(neon_mull_p8, i64, i32, i32) DEF_HELPER_2(neon_tst_u8, i32, i32, i32) DEF_HELPER_2(neon_tst_u16, i32, i32, i32) @@ -695,6 +694,8 @@ DEF_HELPER_FLAGS_4(gvec_ushl_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(gvec_pmul_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(gvec_pmull_q, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(neon_pmull_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) + #ifdef TARGET_AARCH64 #include "helper-a64.h" #include "helper-sve.h" diff --git a/target/arm/neon_helper.c b/target/arm/neon_helper.c index 6a107da0e1..c7a8438b42 100644 --- a/target/arm/neon_helper.c +++ b/target/arm/neon_helper.c @@ -1129,38 +1129,6 @@ NEON_VOP(mul_u8, neon_u8, 4) NEON_VOP(mul_u16, neon_u16, 2) #undef NEON_FN -/* Polynomial multiplication is like integer multiplication except the - partial products are XORed, not added. */ -uint64_t HELPER(neon_mull_p8)(uint32_t op1, uint32_t op2) -{ - uint64_t result = 0; - uint64_t mask; - uint64_t op2ex = op2; - op2ex = (op2ex & 0xff) | - ((op2ex & 0xff00) << 8) | - ((op2ex & 0xff0000) << 16) | - ((op2ex & 0xff000000) << 24); - while (op1) { - mask = 0; - if (op1 & 1) { - mask |= 0xffff; - } - if (op1 & (1 << 8)) { - mask |= (0xffffU << 16); - } - if (op1 & (1 << 16)) { - mask |= (0xffffULL << 32); - } - if (op1 & (1 << 24)) { - mask |= (0xffffULL << 48); - } - result ^= op2ex & mask; - op1 = (op1 >> 1) & 0x7f7f7f7f; - op2ex <<= 1; - } - return result; -} - #define NEON_FN(dest, src1, src2) dest = (src1 & src2) ? -1 : 0 NEON_VOP(tst_u8, neon_u8, 4) NEON_VOP(tst_u16, neon_u16, 2) diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c index 6ce1131860..63ff042d60 100644 --- a/target/arm/translate-a64.c +++ b/target/arm/translate-a64.c @@ -10533,10 +10533,6 @@ static void handle_3rd_widening(DisasContext *s, int is_q, int is_u, int size, gen_helper_neon_addl_saturate_s32(tcg_passres, cpu_env, tcg_passres, tcg_passres); break; - case 14: /* PMULL */ - assert(size == 0); - gen_helper_neon_mull_p8(tcg_passres, tcg_op1, tcg_op2); - break; default: g_assert_not_reached(); } @@ -10700,11 +10696,21 @@ static void disas_simd_three_reg_diff(DisasContext *s, uint32_t insn) handle_3rd_narrowing(s, is_q, is_u, size, opcode, rd, rn, rm); break; case 14: /* PMULL, PMULL2 */ - if (is_u || size == 1 || size == 2) { + if (is_u) { unallocated_encoding(s); return; } - if (size == 3) { + switch (size) { + case 0: /* PMULL.P8 */ + if (!fp_access_check(s)) { + return; + } + /* The Q field specifies lo/hi half input for this insn. */ + gen_gvec_op3_ool(s, true, rd, rn, rm, is_q, + gen_helper_neon_pmull_h); + break; + + case 3: /* PMULL.P64 */ if (!dc_isar_feature(aa64_pmull, s)) { unallocated_encoding(s); return; @@ -10715,9 +10721,13 @@ static void disas_simd_three_reg_diff(DisasContext *s, uint32_t insn) /* The Q field specifies lo/hi half input for this insn. */ gen_gvec_op3_ool(s, true, rd, rn, rm, is_q, gen_helper_gvec_pmull_q); - return; + break; + + default: + unallocated_encoding(s); + break; } - goto is_widening; + return; case 9: /* SQDMLAL, SQDMLAL2 */ case 11: /* SQDMLSL, SQDMLSL2 */ case 13: /* SQDMULL, SQDMULL2 */ @@ -10738,7 +10748,6 @@ static void disas_simd_three_reg_diff(DisasContext *s, uint32_t insn) unallocated_encoding(s); return; } - is_widening: if (!fp_access_check(s)) { return; } diff --git a/target/arm/translate.c b/target/arm/translate.c index e2dbafa161..07178b632a 100644 --- a/target/arm/translate.c +++ b/target/arm/translate.c @@ -5866,15 +5866,20 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn) return 1; } - /* Handle VMULL.P64 (Polynomial 64x64 to 128 bit multiply) - * outside the loop below as it only performs a single pass. - */ - if (op == 14 && size == 2) { - if (!dc_isar_feature(aa32_pmull, s)) { - return 1; + /* Handle polynomial VMULL in a single pass. */ + if (op == 14) { + if (size == 0) { + /* VMULL.P8 */ + tcg_gen_gvec_3_ool(rd_ofs, rn_ofs, rm_ofs, 16, 16, + 0, gen_helper_neon_pmull_h); + } else { + /* VMULL.P64 */ + if (!dc_isar_feature(aa32_pmull, s)) { + return 1; + } + tcg_gen_gvec_3_ool(rd_ofs, rn_ofs, rm_ofs, 16, 16, + 0, gen_helper_gvec_pmull_q); } - tcg_gen_gvec_3_ool(rd_ofs, rn_ofs, rm_ofs, 16, 16, - 0, gen_helper_gvec_pmull_q); return 0; } @@ -5952,11 +5957,6 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn) /* VMLAL, VQDMLAL, VMLSL, VQDMLSL, VMULL, VQDMULL */ gen_neon_mull(cpu_V0, tmp, tmp2, size, u); break; - case 14: /* Polynomial VMULL */ - gen_helper_neon_mull_p8(cpu_V0, tmp, tmp2); - tcg_temp_free_i32(tmp2); - tcg_temp_free_i32(tmp); - break; default: /* 15 is RESERVED: caught earlier */ abort(); } diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c index 79d2624f7b..8017bd88c4 100644 --- a/target/arm/vec_helper.c +++ b/target/arm/vec_helper.c @@ -1197,3 +1197,63 @@ void HELPER(gvec_pmull_q)(void *vd, void *vn, void *vm, uint32_t desc) } clear_tail(d, opr_sz, simd_maxsz(desc)); } + +/* + * 8x8->16 polynomial multiply. + * + * The byte inputs are expanded to (or extracted from) half-words. + * Note that neon and sve2 get the inputs from different positions. + * This allows 4 bytes to be processed in parallel with uint64_t. + */ + +static uint64_t expand_byte_to_half(uint64_t x) +{ + return (x & 0x000000ff) + | ((x & 0x0000ff00) << 8) + | ((x & 0x00ff0000) << 16) + | ((x & 0xff000000) << 24); +} + +static uint64_t pmull_h(uint64_t op1, uint64_t op2) +{ + uint64_t result = 0; + int i; + + for (i = 0; i < 8; ++i) { + uint64_t mask = (op1 & 0x0001000100010001ull) * 0xffff; + result ^= op2 & mask; + op1 >>= 1; + op2 <<= 1; + } + return result; +} + +void HELPER(neon_pmull_h)(void *vd, void *vn, void *vm, uint32_t desc) +{ + int hi = simd_data(desc); + uint64_t *d = vd, *n = vn, *m = vm; + uint64_t nn = n[hi], mm = m[hi]; + + d[0] = pmull_h(expand_byte_to_half(nn), expand_byte_to_half(mm)); + nn >>= 32; + mm >>= 32; + d[1] = pmull_h(expand_byte_to_half(nn), expand_byte_to_half(mm)); + + clear_tail(d, 16, simd_maxsz(desc)); +} + +#ifdef TARGET_AARCH64 +void HELPER(sve2_pmull_h)(void *vd, void *vn, void *vm, uint32_t desc) +{ + int shift = simd_data(desc) * 8; + intptr_t i, opr_sz = simd_oprsz(desc); + uint64_t *d = vd, *n = vn, *m = vm; + + for (i = 0; i < opr_sz / 8; ++i) { + uint64_t nn = (n[i] >> shift) & 0x00ff00ff00ff00ffull; + uint64_t mm = (m[i] >> shift) & 0x00ff00ff00ff00ffull; + + d[i] = pmull_h(nn, mm); + } +} +#endif