From patchwork Sun Feb 16 21:42:30 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Richard Henderson X-Patchwork-Id: 183566 Delivered-To: patch@linaro.org Received: by 2002:a92:1f12:0:0:0:0:0 with SMTP id i18csp4249132ile; Sun, 16 Feb 2020 13:42:53 -0800 (PST) X-Google-Smtp-Source: APXvYqyty2O6EWXL2gC6dmBMtZDlSI+bmnrXc27hI6FcHYi5nDcm2HpZj78tCgndnHJaF1udCwnO X-Received: by 2002:ac8:349d:: with SMTP id w29mr11090403qtb.386.1581889372901; Sun, 16 Feb 2020 13:42:52 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1581889372; cv=none; d=google.com; s=arc-20160816; b=D7Hbg6V5vI1D2LqT72wcMHQNK2O9GPeQoCnr0xqKN1kPcne+o/wepzubvjiFNIJm7j YwF/v2VLoRAmnTXFW+PZn6lCvvUw6KRNRMTBcOntXFZ9s8TfINiR1SXb50ZpbXIjPsVy xlBzpqCZQvayMPOP7C+d+tU5wacldc7F2XB5L+7AaBCtyAQZmEceIQQmIft3yH1mY4jN OHa6vw2GYMVLjTh98dPce81aXgEq/vfOxGeGExaopT+eMW3nUIgYtI55DVAbUuyXxG+K a+XWUUiCMdMG68NV3JAc2qp0kDPRNHOs2wX89lRvd0BUt0OIWVyFsJxnivduhaz8NRq7 i7lA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:cc:list-subscribe:list-help:list-post:list-archive :list-unsubscribe:list-id:precedence:content-transfer-encoding :mime-version:references:in-reply-to:message-id:date:subject:to:from :dkim-signature; bh=O2GXPDLFXihWuaiW0fDwrNaCDa99hsJ9UfIJrAM1orM=; b=TZPbIGQdvRcZibQ8wpSlN+Be1bQk7tmRIOFZMwOC3VDSsfmhhCHzoeewfRmiy2K3Af 2vMjs2JmHTXTdf4ZaxI33QaRg9BphMvDAFllWQxnAksgor9zbgQDm5A+5ILicJNtsBZf zS+aRAEuwDf2eXM6bd/ckKN4DdGiI8mjezJ9md6NMm0HiQG0UnWoTuvYS6XHkPUWtnwj FAv9JIsdpIBLw87/czUBqJBSGhZDZvIAcJ6M+LZ9QsL/sFvu6QUR2ZxIydyk9q0lfT5N loGYuUXcgUz6Q6/MjHewHX+R/HGSkaSTF5n0c1asCppaO+su4ZI3wu/8D9R/jkyMHVRd hvoA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=TW3t5Ayc; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom="qemu-devel-bounces+patch=linaro.org@nongnu.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Return-Path: Received: from lists.gnu.org (lists.gnu.org. [209.51.188.17]) by mx.google.com with ESMTPS id q51si6212673qtk.65.2020.02.16.13.42.52 for (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Sun, 16 Feb 2020 13:42:52 -0800 (PST) Received-SPF: pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 209.51.188.17 as permitted sender) client-ip=209.51.188.17; Authentication-Results: mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=TW3t5Ayc; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom="qemu-devel-bounces+patch=linaro.org@nongnu.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Received: from localhost ([::1]:36860 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1j3Rgi-0007ti-D0 for patch@linaro.org; Sun, 16 Feb 2020 16:42:52 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]:47698) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1j3RgU-0007q4-VG for qemu-devel@nongnu.org; Sun, 16 Feb 2020 16:42:40 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1j3RgT-0004tB-Hc for qemu-devel@nongnu.org; Sun, 16 Feb 2020 16:42:38 -0500 Received: from mail-pg1-x542.google.com ([2607:f8b0:4864:20::542]:37201) by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16) (Exim 4.71) (envelope-from ) id 1j3RgT-0004sM-C8 for qemu-devel@nongnu.org; Sun, 16 Feb 2020 16:42:37 -0500 Received: by mail-pg1-x542.google.com with SMTP id z12so7997761pgl.4 for ; Sun, 16 Feb 2020 13:42:37 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=O2GXPDLFXihWuaiW0fDwrNaCDa99hsJ9UfIJrAM1orM=; b=TW3t5Ayc5JouTnjfgnOAe8PSFoU5aml6f3YdB6tJvjBpXz/XkQ8WALVeO8veWPXxmj w7QlzmW7xF/hGsi2paL1BGdTRSuii27ZHEWanO8eMz56WsN2hc/vax8OF7gXlR+L6dwq 7KJ2p/RMCpyMhdQkKfrMfJ0mVByAXNnPQu5oUYjK5sAefpo8V/ulhgR2AF9obxwBysxE dGGc4//aaocKAIxswi5nG+vjyVuLv48QaxUStjYRmwhxhL8MqE8SmyuGBn02umo6AgBS m46seyWnj3KrwNyLNaQdnauNzj9Zzs70NzzJwxuGLWxgNBbK6gcLPUlem9O/oPAJgbQP TV7w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=O2GXPDLFXihWuaiW0fDwrNaCDa99hsJ9UfIJrAM1orM=; b=dQsnnFnZ/WDUfdvfj5RNmF/NInSnSs0G65Xm/Bg6Xrcrn0OhRQEP5BXMQmT2M7srEz uvEy2s1Q7AptSYMhvkoq1GtDce3zefgzTmuspjCYYN6a+cJtN9yjLbU8YcIpoAkoemyj QbGbPLc44AsqzSbSXmdQMwMOod5hdpRGVya/FcRdcmg90LwpABO0VnMMdnDNePfYkP8V OfhY2EoZsSTbg4cCvKpUfeKhF9WSmlb8eWN3zfXctT3bZhLulmYa9cXOwQLTV7a6Lk2O P6NP5nC/oVlhEYBLxYaQMKBWwxd+RKSv1xcgWfH023ZjSl1RwKdpdbqTlkMjc5Qe3PxN +gaA== X-Gm-Message-State: APjAAAU4uFdT1c5nKCw0TmGoZj0B5dgWx2c1m6r5H+WD6NTB/ucv25WO DLVAxfGhUHapPKvetJgW9ZIVDTrImJY= X-Received: by 2002:a63:4a47:: with SMTP id j7mr14571862pgl.196.1581889355873; Sun, 16 Feb 2020 13:42:35 -0800 (PST) Received: from localhost.localdomain (97-126-123-70.tukw.qwest.net. [97.126.123.70]) by smtp.gmail.com with ESMTPSA id z29sm14821848pgc.21.2020.02.16.13.42.34 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 16 Feb 2020 13:42:35 -0800 (PST) From: Richard Henderson To: qemu-devel@nongnu.org Subject: [PATCH v2 2/4] target/arm: Convert PMUL.8 to gvec Date: Sun, 16 Feb 2020 13:42:30 -0800 Message-Id: <20200216214232.4230-3-richard.henderson@linaro.org> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20200216214232.4230-1-richard.henderson@linaro.org> References: <20200216214232.4230-1-richard.henderson@linaro.org> MIME-Version: 1.0 X-detected-operating-system: by eggs.gnu.org: Genre and OS details not recognized. X-Received-From: 2607:f8b0:4864:20::542 X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: peter.maydell@linaro.org, qemu-arm@nongnu.org, =?utf-8?q?Alex_Benn?= =?utf-8?b?w6ll?= Errors-To: qemu-devel-bounces+patch=linaro.org@nongnu.org Sender: "Qemu-devel" The gvec form will be needed for implementing SVE2. Extend the implementation to operate on uint64_t instead of uint32_t. Use a counted inner loop instead of terminating when op1 goes to zero, looking toward the required implementation for ARMv8.4-DIT. Tested-by: Alex Bennée Reviewed-by: Alex Bennée Signed-off-by: Richard Henderson --- v3: Elide mask after right-shift on N, as we already do N & 1 at the top of the loop. --- target/arm/helper.h | 3 ++- target/arm/neon_helper.c | 22 ---------------------- target/arm/translate-a64.c | 10 +++------- target/arm/translate.c | 11 ++++------- target/arm/vec_helper.c | 30 ++++++++++++++++++++++++++++++ 5 files changed, 39 insertions(+), 37 deletions(-) -- 2.20.1 diff --git a/target/arm/helper.h b/target/arm/helper.h index 459a278b5c..82450a3f96 100644 --- a/target/arm/helper.h +++ b/target/arm/helper.h @@ -342,7 +342,6 @@ DEF_HELPER_2(neon_sub_u8, i32, i32, i32) DEF_HELPER_2(neon_sub_u16, i32, i32, i32) DEF_HELPER_2(neon_mul_u8, i32, i32, i32) DEF_HELPER_2(neon_mul_u16, i32, i32, i32) -DEF_HELPER_2(neon_mul_p8, i32, i32, i32) DEF_HELPER_2(neon_mull_p8, i64, i32, i32) DEF_HELPER_2(neon_tst_u8, i32, i32, i32) @@ -696,6 +695,8 @@ DEF_HELPER_FLAGS_4(gvec_sshl_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(gvec_ushl_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(gvec_ushl_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(gvec_pmul_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) + #ifdef TARGET_AARCH64 #include "helper-a64.h" #include "helper-sve.h" diff --git a/target/arm/neon_helper.c b/target/arm/neon_helper.c index c581ffb7d3..9e7a9a1ac5 100644 --- a/target/arm/neon_helper.c +++ b/target/arm/neon_helper.c @@ -1131,28 +1131,6 @@ NEON_VOP(mul_u16, neon_u16, 2) /* Polynomial multiplication is like integer multiplication except the partial products are XORed, not added. */ -uint32_t HELPER(neon_mul_p8)(uint32_t op1, uint32_t op2) -{ - uint32_t mask; - uint32_t result; - result = 0; - while (op1) { - mask = 0; - if (op1 & 1) - mask |= 0xff; - if (op1 & (1 << 8)) - mask |= (0xff << 8); - if (op1 & (1 << 16)) - mask |= (0xff << 16); - if (op1 & (1 << 24)) - mask |= (0xff << 24); - result ^= op2 & mask; - op1 = (op1 >> 1) & 0x7f7f7f7f; - op2 = (op2 << 1) & 0xfefefefe; - } - return result; -} - uint64_t HELPER(neon_mull_p8)(uint32_t op1, uint32_t op2) { uint64_t result = 0; diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c index e42dcfebdd..c96ed28f9d 100644 --- a/target/arm/translate-a64.c +++ b/target/arm/translate-a64.c @@ -11160,9 +11160,10 @@ static void disas_simd_3same_int(DisasContext *s, uint32_t insn) case 0x13: /* MUL, PMUL */ if (!u) { /* MUL */ gen_gvec_fn3(s, is_q, rd, rn, rm, tcg_gen_gvec_mul, size); - return; + } else { /* PMUL */ + gen_gvec_op3_ool(s, is_q, rd, rn, rm, 0, gen_helper_gvec_pmul_b); } - break; + return; case 0x12: /* MLA, MLS */ if (u) { gen_gvec_op3(s, is_q, rd, rn, rm, &mls_op[size]); @@ -11292,11 +11293,6 @@ static void disas_simd_3same_int(DisasContext *s, uint32_t insn) genfn = fns[size][u]; break; } - case 0x13: /* MUL, PMUL */ - assert(u); /* PMUL */ - assert(size == 0); - genfn = gen_helper_neon_mul_p8; - break; case 0x16: /* SQDMULH, SQRDMULH */ { static NeonGenTwoOpEnvFn * const fns[2][2] = { diff --git a/target/arm/translate.c b/target/arm/translate.c index c4dd14e053..4581373e31 100644 --- a/target/arm/translate.c +++ b/target/arm/translate.c @@ -5007,16 +5007,17 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn) case NEON_3R_VMUL: /* VMUL */ if (u) { - /* Polynomial case allows only P8 and is handled below. */ + /* Polynomial case allows only P8. */ if (size != 0) { return 1; } + tcg_gen_gvec_3_ool(rd_ofs, rn_ofs, rm_ofs, vec_size, vec_size, + 0, gen_helper_gvec_pmul_b); } else { tcg_gen_gvec_mul(size, rd_ofs, rn_ofs, rm_ofs, vec_size, vec_size); - return 0; } - break; + return 0; case NEON_3R_VML: /* VMLA, VMLS */ tcg_gen_gvec_3(rd_ofs, rn_ofs, rm_ofs, vec_size, vec_size, @@ -5206,10 +5207,6 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn) tmp2 = neon_load_reg(rd, pass); gen_neon_add(size, tmp, tmp2); break; - case NEON_3R_VMUL: - /* VMUL.P8; other cases already eliminated. */ - gen_helper_neon_mul_p8(tmp, tmp, tmp2); - break; case NEON_3R_VPMAX: GEN_NEON_INTEGER_OP(pmax); break; diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c index fcb3663903..854de0e279 100644 --- a/target/arm/vec_helper.c +++ b/target/arm/vec_helper.c @@ -1134,3 +1134,33 @@ void HELPER(gvec_ushl_h)(void *vd, void *vn, void *vm, uint32_t desc) } clear_tail(d, opr_sz, simd_maxsz(desc)); } + +/* + * 8x8->8 polynomial multiply. + * + * Polynomial multiplication is like integer multiplication except the + * partial products are XORed, not added. + * + * TODO: expose this as a generic vector operation, as it is a common + * crypto building block. + */ +void HELPER(gvec_pmul_b)(void *vd, void *vn, void *vm, uint32_t desc) +{ + intptr_t i, j, opr_sz = simd_oprsz(desc); + uint64_t *d = vd, *n = vn, *m = vm; + + for (i = 0; i < opr_sz / 8; ++i) { + uint64_t nn = n[i]; + uint64_t mm = m[i]; + uint64_t rr = 0; + + for (j = 0; j < 8; ++j) { + uint64_t mask = (nn & 0x0101010101010101ull) * 0xff; + rr ^= mm & mask; + mm = (mm << 1) & 0xfefefefefefefefeull; + nn >>= 1; + } + d[i] = rr; + } + clear_tail(d, opr_sz, simd_maxsz(desc)); +}