From patchwork Sat Jan 6 03:13:29 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Richard Henderson
X-Patchwork-Id: 123618
Delivered-To: patch@linaro.org
From: Richard Henderson
To: qemu-devel@nongnu.org
Date: Fri, 5 Jan 2018 19:13:29 -0800
Message-Id: <20180106031346.6650-7-richard.henderson@linaro.org>
X-Mailer: git-send-email 2.14.3
In-Reply-To: <20180106031346.6650-1-richard.henderson@linaro.org>
References: <20180106031346.6650-1-richard.henderson@linaro.org>
X-Received-From: 2607:f8b0:400e:c05::241 Subject: [Qemu-devel] [PATCH v8 06/23] tcg: Add generic vector ops for constant shifts X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+patch=linaro.org@nongnu.org Sender: "Qemu-devel" Opcodes are added for scalar and vector shifts, but considering the varied semantics of these do not expose them to the front ends. Do go ahead and provide them in case they are needed for backend expansion. Signed-off-by: Richard Henderson --- accel/tcg/tcg-runtime.h | 15 ++ tcg/tcg-op-gvec.h | 35 +++++ tcg/tcg-op.h | 5 + tcg/tcg-opc.h | 12 ++ tcg/tcg.h | 3 + accel/tcg/tcg-runtime-gvec.c | 149 ++++++++++++++++++++ tcg/tcg-op-gvec.c | 318 ++++++++++++++++++++++++++++++++++++++++++- tcg/tcg-op-vec.c | 45 ++++++ tcg/tcg.c | 12 ++ tcg/README | 29 ++++ 10 files changed, 617 insertions(+), 6 deletions(-) -- 2.14.3 diff --git a/accel/tcg/tcg-runtime.h b/accel/tcg/tcg-runtime.h index c6de749134..cb05a755b8 100644 --- a/accel/tcg/tcg-runtime.h +++ b/accel/tcg/tcg-runtime.h @@ -164,6 +164,21 @@ DEF_HELPER_FLAGS_4(gvec_xor, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(gvec_andc, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(gvec_orc, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(gvec_shl8i, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(gvec_shl16i, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(gvec_shl32i, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(gvec_shl64i, TCG_CALL_NO_RWG, void, ptr, ptr, i32) + +DEF_HELPER_FLAGS_3(gvec_shr8i, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(gvec_shr16i, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(gvec_shr32i, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(gvec_shr64i, TCG_CALL_NO_RWG, void, ptr, ptr, i32) + +DEF_HELPER_FLAGS_3(gvec_sar8i, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(gvec_sar16i, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(gvec_sar32i, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(gvec_sar64i, TCG_CALL_NO_RWG, void, ptr, ptr, i32) + DEF_HELPER_FLAGS_4(gvec_zip8, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(gvec_zip16, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(gvec_zip32, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) diff --git a/tcg/tcg-op-gvec.h b/tcg/tcg-op-gvec.h index 1d97e80e18..7ca78853f2 100644 --- a/tcg/tcg-op-gvec.h +++ b/tcg/tcg-op-gvec.h @@ -95,6 +95,25 @@ typedef struct { bool prefer_i64; } GVecGen2; +typedef struct { + /* Expand inline as a 64-bit or 32-bit integer. + Only one of these will be non-NULL. */ + void (*fni8)(TCGv_i64, TCGv_i64, int64_t); + void (*fni4)(TCGv_i32, TCGv_i32, int32_t); + /* Expand inline with a host vector type. */ + void (*fniv)(unsigned, TCGv_vec, TCGv_vec, int64_t); + /* Expand out-of-line helper w/descriptor. */ + gen_helper_gvec_2 *fno; + /* The opcode, if any, to which this corresponds. */ + TCGOpcode opc; + /* The vector element size, if applicable. */ + uint8_t vece; + /* Prefer i64 to v64. */ + bool prefer_i64; + /* Load dest as a 3rd source operand. */ + bool load_dest; +} GVecGen2i; + typedef struct { /* Expand inline as a 64-bit or 32-bit integer. Only one of these will be non-NULL. 
*/ @@ -137,6 +156,8 @@ typedef struct { void tcg_gen_gvec_2(uint32_t dofs, uint32_t aofs, uint32_t oprsz, uint32_t maxsz, const GVecGen2 *); +void tcg_gen_gvec_2i(uint32_t dofs, uint32_t aofs, uint32_t oprsz, + uint32_t maxsz, int64_t c, const GVecGen2i *); void tcg_gen_gvec_3(uint32_t dofs, uint32_t aofs, uint32_t bofs, uint32_t oprsz, uint32_t maxsz, const GVecGen3 *); void tcg_gen_gvec_4(uint32_t dofs, uint32_t aofs, uint32_t bofs, uint32_t cofs, @@ -179,6 +200,13 @@ void tcg_gen_gvec_dup16i(uint32_t dofs, uint32_t s, uint32_t m, uint16_t x); void tcg_gen_gvec_dup32i(uint32_t dofs, uint32_t s, uint32_t m, uint32_t x); void tcg_gen_gvec_dup64i(uint32_t dofs, uint32_t s, uint32_t m, uint64_t x); +void tcg_gen_gvec_shli(unsigned vece, uint32_t dofs, uint32_t aofs, + uint32_t oprsz, uint32_t maxsz, unsigned shift); +void tcg_gen_gvec_shri(unsigned vece, uint32_t dofs, uint32_t aofs, + uint32_t oprsz, uint32_t maxsz, unsigned shift); +void tcg_gen_gvec_sari(unsigned vece, uint32_t dofs, uint32_t aofs, + uint32_t oprsz, uint32_t maxsz, unsigned shift); + void tcg_gen_gvec_zipl(unsigned vece, uint32_t dofs, uint32_t aofs, uint32_t bofs, uint32_t oprsz, uint32_t maxsz); void tcg_gen_gvec_ziph(unsigned vece, uint32_t dofs, uint32_t aofs, @@ -209,3 +237,10 @@ void tcg_gen_vec_add32_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b); void tcg_gen_vec_sub8_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b); void tcg_gen_vec_sub16_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b); void tcg_gen_vec_sub32_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b); + +void tcg_gen_vec_shl8i_i64(TCGv_i64 d, TCGv_i64 a, int64_t); +void tcg_gen_vec_shl16i_i64(TCGv_i64 d, TCGv_i64 a, int64_t); +void tcg_gen_vec_shr8i_i64(TCGv_i64 d, TCGv_i64 a, int64_t); +void tcg_gen_vec_shr16i_i64(TCGv_i64 d, TCGv_i64 a, int64_t); +void tcg_gen_vec_sar8i_i64(TCGv_i64 d, TCGv_i64 a, int64_t); +void tcg_gen_vec_sar16i_i64(TCGv_i64 d, TCGv_i64 a, int64_t); diff --git a/tcg/tcg-op.h b/tcg/tcg-op.h index 87be11a90d..c002523bea 100644 --- a/tcg/tcg-op.h +++ b/tcg/tcg-op.h @@ -927,6 +927,11 @@ void tcg_gen_andc_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b); void tcg_gen_orc_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b); void tcg_gen_not_vec(unsigned vece, TCGv_vec r, TCGv_vec a); void tcg_gen_neg_vec(unsigned vece, TCGv_vec r, TCGv_vec a); + +void tcg_gen_shli_vec(unsigned vece, TCGv_vec r, TCGv_vec a, int64_t i); +void tcg_gen_shri_vec(unsigned vece, TCGv_vec r, TCGv_vec a, int64_t i); +void tcg_gen_sari_vec(unsigned vece, TCGv_vec r, TCGv_vec a, int64_t i); + void tcg_gen_zipl_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b); void tcg_gen_ziph_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b); void tcg_gen_uzpe_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b); diff --git a/tcg/tcg-opc.h b/tcg/tcg-opc.h index c911d62442..a085fc077b 100644 --- a/tcg/tcg-opc.h +++ b/tcg/tcg-opc.h @@ -229,6 +229,18 @@ DEF(andc_vec, 1, 2, 0, IMPLVEC | IMPL(TCG_TARGET_HAS_andc_vec)) DEF(orc_vec, 1, 2, 0, IMPLVEC | IMPL(TCG_TARGET_HAS_orc_vec)) DEF(not_vec, 1, 1, 0, IMPLVEC | IMPL(TCG_TARGET_HAS_not_vec)) +DEF(shli_vec, 1, 1, 1, IMPLVEC | IMPL(TCG_TARGET_HAS_shi_vec)) +DEF(shri_vec, 1, 1, 1, IMPLVEC | IMPL(TCG_TARGET_HAS_shi_vec)) +DEF(sari_vec, 1, 1, 1, IMPLVEC | IMPL(TCG_TARGET_HAS_shi_vec)) + +DEF(shls_vec, 1, 2, 0, IMPLVEC | IMPL(TCG_TARGET_HAS_shs_vec)) +DEF(shrs_vec, 1, 2, 0, IMPLVEC | IMPL(TCG_TARGET_HAS_shs_vec)) +DEF(sars_vec, 1, 2, 0, IMPLVEC | IMPL(TCG_TARGET_HAS_shs_vec)) + +DEF(shlv_vec, 1, 2, 0, IMPLVEC | IMPL(TCG_TARGET_HAS_shv_vec)) +DEF(shrv_vec, 
1, 2, 0, IMPLVEC | IMPL(TCG_TARGET_HAS_shv_vec)) +DEF(sarv_vec, 1, 2, 0, IMPLVEC | IMPL(TCG_TARGET_HAS_shv_vec)) + DEF(zipl_vec, 1, 2, 0, IMPLVEC | IMPL(TCG_TARGET_HAS_zip_vec)) DEF(ziph_vec, 1, 2, 0, IMPLVEC | IMPL(TCG_TARGET_HAS_zip_vec)) DEF(uzpe_vec, 1, 2, 0, IMPLVEC | IMPL(TCG_TARGET_HAS_uzp_vec)) diff --git a/tcg/tcg.h b/tcg/tcg.h index d0678ac769..bdfe058c45 100644 --- a/tcg/tcg.h +++ b/tcg/tcg.h @@ -178,6 +178,9 @@ typedef uint64_t TCGRegSet; #define TCG_TARGET_HAS_not_vec 0 #define TCG_TARGET_HAS_andc_vec 0 #define TCG_TARGET_HAS_orc_vec 0 +#define TCG_TARGET_HAS_shi_vec 0 +#define TCG_TARGET_HAS_shs_vec 0 +#define TCG_TARGET_HAS_shv_vec 0 #define TCG_TARGET_HAS_zip_vec 0 #define TCG_TARGET_HAS_uzp_vec 0 #define TCG_TARGET_HAS_trn_vec 0 diff --git a/accel/tcg/tcg-runtime-gvec.c b/accel/tcg/tcg-runtime-gvec.c index 628df811b2..fba62f1192 100644 --- a/accel/tcg/tcg-runtime-gvec.c +++ b/accel/tcg/tcg-runtime-gvec.c @@ -36,6 +36,11 @@ typedef uint16_t vec16 __attribute__((vector_size(16))); typedef uint32_t vec32 __attribute__((vector_size(16))); typedef uint64_t vec64 __attribute__((vector_size(16))); +typedef int8_t svec8 __attribute__((vector_size(16))); +typedef int16_t svec16 __attribute__((vector_size(16))); +typedef int32_t svec32 __attribute__((vector_size(16))); +typedef int64_t svec64 __attribute__((vector_size(16))); + static inline void clear_high(void *d, intptr_t oprsz, uint32_t desc) { intptr_t maxsz = simd_maxsz(desc); @@ -294,6 +299,150 @@ void HELPER(gvec_orc)(void *d, void *a, void *b, uint32_t desc) clear_high(d, oprsz, desc); } +void HELPER(gvec_shl8i)(void *d, void *a, uint32_t desc) +{ + intptr_t oprsz = simd_oprsz(desc); + int shift = simd_data(desc); + intptr_t i; + + for (i = 0; i < oprsz; i += sizeof(vec8)) { + *(vec8 *)(d + i) = *(vec8 *)(a + i) << shift; + } + clear_high(d, oprsz, desc); +} + +void HELPER(gvec_shl16i)(void *d, void *a, uint32_t desc) +{ + intptr_t oprsz = simd_oprsz(desc); + int shift = simd_data(desc); + intptr_t i; + + for (i = 0; i < oprsz; i += sizeof(vec16)) { + *(vec16 *)(d + i) = *(vec16 *)(a + i) << shift; + } + clear_high(d, oprsz, desc); +} + +void HELPER(gvec_shl32i)(void *d, void *a, uint32_t desc) +{ + intptr_t oprsz = simd_oprsz(desc); + int shift = simd_data(desc); + intptr_t i; + + for (i = 0; i < oprsz; i += sizeof(vec32)) { + *(vec32 *)(d + i) = *(vec32 *)(a + i) << shift; + } + clear_high(d, oprsz, desc); +} + +void HELPER(gvec_shl64i)(void *d, void *a, uint32_t desc) +{ + intptr_t oprsz = simd_oprsz(desc); + int shift = simd_data(desc); + intptr_t i; + + for (i = 0; i < oprsz; i += sizeof(vec64)) { + *(vec64 *)(d + i) = *(vec64 *)(a + i) << shift; + } + clear_high(d, oprsz, desc); +} + +void HELPER(gvec_shr8i)(void *d, void *a, uint32_t desc) +{ + intptr_t oprsz = simd_oprsz(desc); + int shift = simd_data(desc); + intptr_t i; + + for (i = 0; i < oprsz; i += sizeof(vec8)) { + *(vec8 *)(d + i) = *(vec8 *)(a + i) >> shift; + } + clear_high(d, oprsz, desc); +} + +void HELPER(gvec_shr16i)(void *d, void *a, uint32_t desc) +{ + intptr_t oprsz = simd_oprsz(desc); + int shift = simd_data(desc); + intptr_t i; + + for (i = 0; i < oprsz; i += sizeof(vec16)) { + *(vec16 *)(d + i) = *(vec16 *)(a + i) >> shift; + } + clear_high(d, oprsz, desc); +} + +void HELPER(gvec_shr32i)(void *d, void *a, uint32_t desc) +{ + intptr_t oprsz = simd_oprsz(desc); + int shift = simd_data(desc); + intptr_t i; + + for (i = 0; i < oprsz; i += sizeof(vec32)) { + *(vec32 *)(d + i) = *(vec32 *)(a + i) >> shift; + } + clear_high(d, oprsz, desc); +} + +void 
HELPER(gvec_shr64i)(void *d, void *a, uint32_t desc) +{ + intptr_t oprsz = simd_oprsz(desc); + int shift = simd_data(desc); + intptr_t i; + + for (i = 0; i < oprsz; i += sizeof(vec64)) { + *(vec64 *)(d + i) = *(vec64 *)(a + i) >> shift; + } + clear_high(d, oprsz, desc); +} + +void HELPER(gvec_sar8i)(void *d, void *a, uint32_t desc) +{ + intptr_t oprsz = simd_oprsz(desc); + int shift = simd_data(desc); + intptr_t i; + + for (i = 0; i < oprsz; i += sizeof(vec8)) { + *(svec8 *)(d + i) = *(svec8 *)(a + i) >> shift; + } + clear_high(d, oprsz, desc); +} + +void HELPER(gvec_sar16i)(void *d, void *a, uint32_t desc) +{ + intptr_t oprsz = simd_oprsz(desc); + int shift = simd_data(desc); + intptr_t i; + + for (i = 0; i < oprsz; i += sizeof(vec16)) { + *(svec16 *)(d + i) = *(svec16 *)(a + i) >> shift; + } + clear_high(d, oprsz, desc); +} + +void HELPER(gvec_sar32i)(void *d, void *a, uint32_t desc) +{ + intptr_t oprsz = simd_oprsz(desc); + int shift = simd_data(desc); + intptr_t i; + + for (i = 0; i < oprsz; i += sizeof(vec32)) { + *(svec32 *)(d + i) = *(svec32 *)(a + i) >> shift; + } + clear_high(d, oprsz, desc); +} + +void HELPER(gvec_sar64i)(void *d, void *a, uint32_t desc) +{ + intptr_t oprsz = simd_oprsz(desc); + int shift = simd_data(desc); + intptr_t i; + + for (i = 0; i < oprsz; i += sizeof(vec64)) { + *(svec64 *)(d + i) = *(svec64 *)(a + i) >> shift; + } + clear_high(d, oprsz, desc); +} + /* The size of the alloca in the following is currently bounded to 2k. */ #define DO_ZIP(NAME, TYPE) \ diff --git a/tcg/tcg-op-gvec.c b/tcg/tcg-op-gvec.c index c4ecef0e1e..0ba61dd049 100644 --- a/tcg/tcg-op-gvec.c +++ b/tcg/tcg-op-gvec.c @@ -528,6 +528,26 @@ static void expand_2_i32(uint32_t dofs, uint32_t aofs, uint32_t oprsz, tcg_temp_free_i32(t0); } +static void expand_2i_i32(uint32_t dofs, uint32_t aofs, uint32_t oprsz, + int32_t c, bool load_dest, + void (*fni)(TCGv_i32, TCGv_i32, int32_t)) +{ + TCGv_i32 t0 = tcg_temp_new_i32(); + TCGv_i32 t1 = tcg_temp_new_i32(); + uint32_t i; + + for (i = 0; i < oprsz; i += 4) { + tcg_gen_ld_i32(t0, cpu_env, aofs + i); + if (load_dest) { + tcg_gen_ld_i32(t1, cpu_env, dofs + i); + } + fni(t1, t0, c); + tcg_gen_st_i32(t1, cpu_env, dofs + i); + } + tcg_temp_free_i32(t0); + tcg_temp_free_i32(t1); +} + /* Expand OPSZ bytes worth of three-operand operations using i32 elements. */ static void expand_3_i32(uint32_t dofs, uint32_t aofs, uint32_t bofs, uint32_t oprsz, bool load_dest, @@ -554,7 +574,7 @@ static void expand_3_i32(uint32_t dofs, uint32_t aofs, /* Expand OPSZ bytes worth of three-operand operations using i32 elements. 
*/ static void expand_4_i32(uint32_t dofs, uint32_t aofs, uint32_t bofs, - uint32_t cofs, uint32_t opsz, + uint32_t cofs, uint32_t oprsz, void (*fni)(TCGv_i32, TCGv_i32, TCGv_i32, TCGv_i32)) { TCGv_i32 t0 = tcg_temp_new_i32(); @@ -563,7 +583,7 @@ static void expand_4_i32(uint32_t dofs, uint32_t aofs, uint32_t bofs, TCGv_i32 t3 = tcg_temp_new_i32(); uint32_t i; - for (i = 0; i < opsz; i += 4) { + for (i = 0; i < oprsz; i += 4) { tcg_gen_ld_i32(t1, cpu_env, aofs + i); tcg_gen_ld_i32(t2, cpu_env, bofs + i); tcg_gen_ld_i32(t3, cpu_env, cofs + i); @@ -591,6 +611,26 @@ static void expand_2_i64(uint32_t dofs, uint32_t aofs, uint32_t oprsz, tcg_temp_free_i64(t0); } +static void expand_2i_i64(uint32_t dofs, uint32_t aofs, uint32_t oprsz, + int64_t c, bool load_dest, + void (*fni)(TCGv_i64, TCGv_i64, int64_t)) +{ + TCGv_i64 t0 = tcg_temp_new_i64(); + TCGv_i64 t1 = tcg_temp_new_i64(); + uint32_t i; + + for (i = 0; i < oprsz; i += 8) { + tcg_gen_ld_i64(t0, cpu_env, aofs + i); + if (load_dest) { + tcg_gen_ld_i64(t1, cpu_env, dofs + i); + } + fni(t1, t0, c); + tcg_gen_st_i64(t1, cpu_env, dofs + i); + } + tcg_temp_free_i64(t0); + tcg_temp_free_i64(t1); +} + /* Expand OPSZ bytes worth of three-operand operations using i64 elements. */ static void expand_3_i64(uint32_t dofs, uint32_t aofs, uint32_t bofs, uint32_t oprsz, bool load_dest, @@ -617,7 +657,7 @@ static void expand_3_i64(uint32_t dofs, uint32_t aofs, /* Expand OPSZ bytes worth of three-operand operations using i64 elements. */ static void expand_4_i64(uint32_t dofs, uint32_t aofs, uint32_t bofs, - uint32_t cofs, uint32_t opsz, + uint32_t cofs, uint32_t oprsz, void (*fni)(TCGv_i64, TCGv_i64, TCGv_i64, TCGv_i64)) { TCGv_i64 t0 = tcg_temp_new_i64(); @@ -626,7 +666,7 @@ static void expand_4_i64(uint32_t dofs, uint32_t aofs, uint32_t bofs, TCGv_i64 t3 = tcg_temp_new_i64(); uint32_t i; - for (i = 0; i < opsz; i += 8) { + for (i = 0; i < oprsz; i += 8) { tcg_gen_ld_i64(t1, cpu_env, aofs + i); tcg_gen_ld_i64(t2, cpu_env, bofs + i); tcg_gen_ld_i64(t3, cpu_env, cofs + i); @@ -655,6 +695,29 @@ static void expand_2_vec(unsigned vece, uint32_t dofs, uint32_t aofs, tcg_temp_free_vec(t0); } +/* Expand OPSZ bytes worth of two-vector operands and an immediate operand + using host vectors. */ +static void expand_2i_vec(unsigned vece, uint32_t dofs, uint32_t aofs, + uint32_t oprsz, uint32_t tysz, TCGType type, + int64_t c, bool load_dest, + void (*fni)(unsigned, TCGv_vec, TCGv_vec, int64_t)) +{ + TCGv_vec t0 = tcg_temp_new_vec(type); + TCGv_vec t1 = tcg_temp_new_vec(type); + uint32_t i; + + for (i = 0; i < oprsz; i += tysz) { + tcg_gen_ld_vec(t0, cpu_env, aofs + i); + if (load_dest) { + tcg_gen_ld_vec(t1, cpu_env, dofs + i); + } + fni(vece, t1, t0, c); + tcg_gen_st_vec(t1, cpu_env, dofs + i); + } + tcg_temp_free_vec(t0); + tcg_temp_free_vec(t1); +} + /* Expand OPSZ bytes worth of three-operand operations using host vectors. */ static void expand_3_vec(unsigned vece, uint32_t dofs, uint32_t aofs, uint32_t bofs, uint32_t oprsz, @@ -682,7 +745,7 @@ static void expand_3_vec(unsigned vece, uint32_t dofs, uint32_t aofs, /* Expand OPSZ bytes worth of four-operand operations using host vectors. 
*/ static void expand_4_vec(unsigned vece, uint32_t dofs, uint32_t aofs, - uint32_t bofs, uint32_t cofs, uint32_t opsz, + uint32_t bofs, uint32_t cofs, uint32_t oprsz, uint32_t tysz, TCGType type, void (*fni)(unsigned, TCGv_vec, TCGv_vec, TCGv_vec, TCGv_vec)) @@ -693,7 +756,7 @@ static void expand_4_vec(unsigned vece, uint32_t dofs, uint32_t aofs, TCGv_vec t3 = tcg_temp_new_vec(type); uint32_t i; - for (i = 0; i < opsz; i += tysz) { + for (i = 0; i < oprsz; i += tysz) { tcg_gen_ld_vec(t1, cpu_env, aofs + i); tcg_gen_ld_vec(t2, cpu_env, bofs + i); tcg_gen_ld_vec(t3, cpu_env, cofs + i); @@ -783,6 +846,85 @@ void tcg_gen_gvec_2(uint32_t dofs, uint32_t aofs, tcg_gen_gvec_2_ool(dofs, aofs, oprsz, maxsz, g->data, g->fno); } +void tcg_gen_gvec_2i(uint32_t dofs, uint32_t aofs, uint32_t oprsz, + uint32_t maxsz, int64_t c, const GVecGen2i *g) +{ + check_size_align(oprsz, maxsz, dofs | aofs); + check_overlap_2(dofs, aofs, maxsz); + + /* Quick check for sizes we won't support inline. */ + if (oprsz > MAX_UNROLL * 32 || maxsz > MAX_UNROLL * 32) { + goto do_ool; + } + + /* Recall that ARM SVE allows vector sizes that are not a power of 2. + Expand with successively smaller host vector sizes. The intent is + that e.g. oprsz == 80 would be expanded with 2x32 + 1x16. */ + /* ??? For maxsz > oprsz, the host may be able to use an op-sized + operation, zeroing the balance of the register. We can then + use a cl-sized store to implement the clearing without an extra + store operation. This is true for aarch64 and x86_64 hosts. */ + + if (TCG_TARGET_HAS_v256 && g->fniv && check_size_impl(oprsz, 32) + && (!g->opc || tcg_can_emit_vec_op(g->opc, TCG_TYPE_V256, g->vece))) { + uint32_t done = QEMU_ALIGN_DOWN(oprsz, 32); + expand_2i_vec(g->vece, dofs, aofs, done, 32, TCG_TYPE_V256, + c, g->load_dest, g->fniv); + dofs += done; + aofs += done; + oprsz -= done; + maxsz -= done; + } + + if (TCG_TARGET_HAS_v128 && g->fniv && check_size_impl(oprsz, 16) + && (!g->opc || tcg_can_emit_vec_op(g->opc, TCG_TYPE_V128, g->vece))) { + uint32_t done = QEMU_ALIGN_DOWN(oprsz, 16); + expand_2i_vec(g->vece, dofs, aofs, done, 16, TCG_TYPE_V128, + c, g->load_dest, g->fniv); + dofs += done; + aofs += done; + oprsz -= done; + maxsz -= done; + } + + if (check_size_impl(oprsz, 8)) { + uint32_t done = QEMU_ALIGN_DOWN(oprsz, 8); + if (TCG_TARGET_HAS_v64 && g->fniv && !g->prefer_i64 + && (!g->opc + || tcg_can_emit_vec_op(g->opc, TCG_TYPE_V64, g->vece))) { + expand_2i_vec(g->vece, dofs, aofs, done, 8, TCG_TYPE_V64, + c, g->load_dest, g->fniv); + } else if (g->fni8) { + expand_2i_i64(dofs, aofs, done, c, g->load_dest, g->fni8); + } else { + done = 0; + } + dofs += done; + aofs += done; + oprsz -= done; + maxsz -= done; + } + + if (g->fni4 && check_size_impl(oprsz, 4)) { + uint32_t done = QEMU_ALIGN_DOWN(oprsz, 4); + expand_2i_i32(dofs, aofs, done, c, g->load_dest, g->fni4); + dofs += done; + aofs += done; + oprsz -= done; + maxsz -= done; + } + + if (oprsz == 0) { + if (maxsz != 0) { + expand_clr(dofs, maxsz); + } + return; + } + + do_ool: + tcg_gen_gvec_2_ool(dofs, aofs, oprsz, maxsz, c, g->fno); +} + /* Expand a vector three-operand operation. 
*/ void tcg_gen_gvec_3(uint32_t dofs, uint32_t aofs, uint32_t bofs, uint32_t oprsz, uint32_t maxsz, const GVecGen3 *g) @@ -1402,6 +1544,170 @@ void tcg_gen_gvec_orc(unsigned vece, uint32_t dofs, uint32_t aofs, tcg_gen_gvec_3(dofs, aofs, bofs, oprsz, maxsz, &g); } +void tcg_gen_vec_shl8i_i64(TCGv_i64 d, TCGv_i64 a, int64_t c) +{ + uint64_t mask = ((0xff << c) & 0xff) * (-1ull / 0xff); + tcg_gen_shli_i64(d, a, c); + tcg_gen_andi_i64(d, d, mask); +} + +void tcg_gen_vec_shl16i_i64(TCGv_i64 d, TCGv_i64 a, int64_t c) +{ + uint64_t mask = ((0xffff << c) & 0xffff) * (-1ull / 0xffff); + tcg_gen_shli_i64(d, a, c); + tcg_gen_andi_i64(d, d, mask); +} + +void tcg_gen_gvec_shli(unsigned vece, uint32_t dofs, uint32_t aofs, + uint32_t oprsz, uint32_t maxsz, unsigned shift) +{ + static const GVecGen2i g[4] = { + { .fni8 = tcg_gen_vec_shl8i_i64, + .fniv = tcg_gen_shli_vec, + .fno = gen_helper_gvec_shl8i, + .opc = INDEX_op_shli_vec, + .vece = MO_8 }, + { .fni8 = tcg_gen_vec_shl16i_i64, + .fniv = tcg_gen_shli_vec, + .fno = gen_helper_gvec_shl16i, + .opc = INDEX_op_shli_vec, + .vece = MO_16 }, + { .fni4 = tcg_gen_shli_i32, + .fniv = tcg_gen_shli_vec, + .fno = gen_helper_gvec_shl32i, + .opc = INDEX_op_shli_vec, + .vece = MO_32 }, + { .fni8 = tcg_gen_shli_i64, + .fniv = tcg_gen_shli_vec, + .fno = gen_helper_gvec_shl64i, + .opc = INDEX_op_shli_vec, + .prefer_i64 = TCG_TARGET_REG_BITS == 64, + .vece = MO_64 }, + }; + + tcg_debug_assert(vece <= MO_64); + tcg_debug_assert(shift < (8 << vece)); + if (shift == 0) { + tcg_gen_gvec_mov(vece, dofs, aofs, oprsz, maxsz); + } else { + tcg_gen_gvec_2i(dofs, aofs, oprsz, maxsz, shift, &g[vece]); + } +} + +void tcg_gen_vec_shr8i_i64(TCGv_i64 d, TCGv_i64 a, int64_t c) +{ + uint64_t mask = (0xff >> c) * (-1ull / 0xff); + tcg_gen_shri_i64(d, a, c); + tcg_gen_andi_i64(d, d, mask); +} + +void tcg_gen_vec_shr16i_i64(TCGv_i64 d, TCGv_i64 a, int64_t c) +{ + uint64_t mask = (0xffff >> c) * (-1ull / 0xffff); + tcg_gen_shri_i64(d, a, c); + tcg_gen_andi_i64(d, d, mask); +} + +void tcg_gen_gvec_shri(unsigned vece, uint32_t dofs, uint32_t aofs, + uint32_t oprsz, uint32_t maxsz, unsigned shift) +{ + static const GVecGen2i g[4] = { + { .fni8 = tcg_gen_vec_shr8i_i64, + .fniv = tcg_gen_shri_vec, + .fno = gen_helper_gvec_shr8i, + .opc = INDEX_op_shri_vec, + .vece = MO_8 }, + { .fni8 = tcg_gen_vec_shr16i_i64, + .fniv = tcg_gen_shri_vec, + .fno = gen_helper_gvec_shr16i, + .opc = INDEX_op_shri_vec, + .vece = MO_16 }, + { .fni4 = tcg_gen_shri_i32, + .fniv = tcg_gen_shri_vec, + .fno = gen_helper_gvec_shr32i, + .opc = INDEX_op_shri_vec, + .vece = MO_32 }, + { .fni8 = tcg_gen_shri_i64, + .fniv = tcg_gen_shri_vec, + .fno = gen_helper_gvec_shr64i, + .opc = INDEX_op_shri_vec, + .prefer_i64 = TCG_TARGET_REG_BITS == 64, + .vece = MO_64 }, + }; + + tcg_debug_assert(vece <= MO_64); + tcg_debug_assert(shift < (8 << vece)); + if (shift == 0) { + tcg_gen_gvec_mov(vece, dofs, aofs, oprsz, maxsz); + } else { + tcg_gen_gvec_2i(dofs, aofs, oprsz, maxsz, shift, &g[vece]); + } +} + +void tcg_gen_vec_sar8i_i64(TCGv_i64 d, TCGv_i64 a, int64_t c) +{ + uint64_t s_mask = (0x80 >> c) * (-1ull / 0xff); + uint64_t c_mask = (0xff >> c) * (-1ull / 0xff); + TCGv_i64 s = tcg_temp_new_i64(); + + tcg_gen_shri_i64(d, a, c); + tcg_gen_andi_i64(s, d, s_mask); /* isolate (shifted) sign bit */ + tcg_gen_muli_i64(s, s, (2 << c) - 2); /* replicate isolated signs */ + tcg_gen_andi_i64(d, d, c_mask); /* clear out bits above sign */ + tcg_gen_or_i64(d, d, s); /* include sign extension */ + tcg_temp_free_i64(s); +} + +void 
tcg_gen_vec_sar16i_i64(TCGv_i64 d, TCGv_i64 a, int64_t c) +{ + uint64_t s_mask = (0x8000 >> c) * (-1ull / 0xffff); + uint64_t c_mask = (0xffff >> c) * (-1ull / 0xffff); + TCGv_i64 s = tcg_temp_new_i64(); + + tcg_gen_shri_i64(d, a, c); + tcg_gen_andi_i64(s, d, s_mask); /* isolate (shifted) sign bit */ + tcg_gen_andi_i64(d, d, c_mask); /* clear out bits above sign */ + tcg_gen_muli_i64(s, s, (2 << c) - 2); /* replicate isolated signs */ + tcg_gen_or_i64(d, d, s); /* include sign extension */ + tcg_temp_free_i64(s); +} + +void tcg_gen_gvec_sari(unsigned vece, uint32_t dofs, uint32_t aofs, + uint32_t oprsz, uint32_t maxsz, unsigned shift) +{ + static const GVecGen2i g[4] = { + { .fni8 = tcg_gen_vec_sar8i_i64, + .fniv = tcg_gen_sari_vec, + .fno = gen_helper_gvec_sar8i, + .opc = INDEX_op_sari_vec, + .vece = MO_8 }, + { .fni8 = tcg_gen_vec_sar16i_i64, + .fniv = tcg_gen_sari_vec, + .fno = gen_helper_gvec_sar16i, + .opc = INDEX_op_sari_vec, + .vece = MO_16 }, + { .fni4 = tcg_gen_sari_i32, + .fniv = tcg_gen_sari_vec, + .fno = gen_helper_gvec_sar32i, + .opc = INDEX_op_sari_vec, + .vece = MO_32 }, + { .fni8 = tcg_gen_sari_i64, + .fniv = tcg_gen_sari_vec, + .fno = gen_helper_gvec_sar64i, + .opc = INDEX_op_sari_vec, + .prefer_i64 = TCG_TARGET_REG_BITS == 64, + .vece = MO_64 }, + }; + + tcg_debug_assert(vece <= MO_64); + tcg_debug_assert(shift < (8 << vece)); + if (shift == 0) { + tcg_gen_gvec_mov(vece, dofs, aofs, oprsz, maxsz); + } else { + tcg_gen_gvec_2i(dofs, aofs, oprsz, maxsz, shift, &g[vece]); + } +} + static void do_zip(unsigned vece, uint32_t dofs, uint32_t aofs, uint32_t bofs, uint32_t oprsz, uint32_t maxsz, bool high) diff --git a/tcg/tcg-op-vec.c b/tcg/tcg-op-vec.c index a5d0ff89c3..b54d3de19f 100644 --- a/tcg/tcg-op-vec.c +++ b/tcg/tcg-op-vec.c @@ -381,6 +381,51 @@ void tcg_gen_neg_vec(unsigned vece, TCGv_vec r, TCGv_vec a) } } +static void do_shifti(TCGOpcode opc, unsigned vece, + TCGv_vec r, TCGv_vec a, int64_t i) +{ + TCGTemp *rt = tcgv_vec_temp(r); + TCGTemp *at = tcgv_vec_temp(a); + TCGArg ri = temp_arg(rt); + TCGArg ai = temp_arg(at); + TCGType type = rt->base_type; + int can; + + tcg_debug_assert(at->base_type == type); + tcg_debug_assert(i >= 0 && i < (8 << vece)); + + if (i == 0) { + tcg_gen_mov_vec(r, a); + return; + } + + can = tcg_can_emit_vec_op(opc, type, vece); + if (can > 0) { + vec_gen_3(opc, type, vece, ri, ai, i); + } else { + /* We leave the choice of expansion via scalar or vector shift + to the target. Often, but not always, dupi can feed a vector + shift easier than a scalar. 
*/ + tcg_debug_assert(can < 0); + tcg_expand_vec_op(opc, type, vece, ri, ai, i); + } +} + +void tcg_gen_shli_vec(unsigned vece, TCGv_vec r, TCGv_vec a, int64_t i) +{ + do_shifti(INDEX_op_shli_vec, vece, r, a, i); +} + +void tcg_gen_shri_vec(unsigned vece, TCGv_vec r, TCGv_vec a, int64_t i) +{ + do_shifti(INDEX_op_shri_vec, vece, r, a, i); +} + +void tcg_gen_sari_vec(unsigned vece, TCGv_vec r, TCGv_vec a, int64_t i) +{ + do_shifti(INDEX_op_sari_vec, vece, r, a, i); +} + static void do_interleave(TCGOpcode opc, unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b) { diff --git a/tcg/tcg.c b/tcg/tcg.c index 33e3c03cbc..f82d6e80b0 100644 --- a/tcg/tcg.c +++ b/tcg/tcg.c @@ -1403,6 +1403,18 @@ bool tcg_op_supported(TCGOpcode op) return have_vec && TCG_TARGET_HAS_andc_vec; case INDEX_op_orc_vec: return have_vec && TCG_TARGET_HAS_orc_vec; + case INDEX_op_shli_vec: + case INDEX_op_shri_vec: + case INDEX_op_sari_vec: + return have_vec && TCG_TARGET_HAS_shi_vec; + case INDEX_op_shls_vec: + case INDEX_op_shrs_vec: + case INDEX_op_sars_vec: + return have_vec && TCG_TARGET_HAS_shs_vec; + case INDEX_op_shlv_vec: + case INDEX_op_shrv_vec: + case INDEX_op_sarv_vec: + return have_vec && TCG_TARGET_HAS_shv_vec; case INDEX_op_zipl_vec: case INDEX_op_ziph_vec: return have_vec && TCG_TARGET_HAS_zip_vec; diff --git a/tcg/README b/tcg/README index 8ab8d3ab7e..75db47922d 100644 --- a/tcg/README +++ b/tcg/README @@ -561,6 +561,35 @@ E.g. VECL=1 -> 64 << 1 -> v128, and VECE=2 -> 1 << 2 -> i32. Similarly, logical operations with and without compliment. Note that VECE is unused. +* shli_vec v0, v1, i2 +* shls_vec v0, v1, s2 + + Shift all elements from v1 by a scalar i2/s2. I.e. + + for (i = 0; i < VECL/VECE; ++i) { + v0[i] = v1[i] << s2; + } + +* shri_vec v0, v1, i2 +* sari_vec v0, v1, i2 +* shrs_vec v0, v1, s2 +* sars_vec v0, v1, s2 + + Similarly for logical and arithmetic right shift. + +* shlv_vec v0, v1, v2 + + Shift elements from v1 by elements from v2. I.e. + + for (i = 0; i < VECL/VECE; ++i) { + v0[i] = v1[i] << v2[i]; + } + +* shrv_vec v0, v1, v2 +* sarv_vec v0, v1, v2 + + Similarly for logical and arithmetic right shift. + * zipl_vec v0, v1, v2 * ziph_vec v0, v1, v2
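
[Illustrative note, not part of the patch.] The fni8 expansions above lean on a mask-replication idiom: tcg_gen_vec_shl8i_i64 shifts the whole 64-bit word and masks off bits that crossed a byte boundary, and tcg_gen_vec_sar8i_i64 isolates each byte's shifted sign bit and smears it into the vacated high bits with a single multiply. The standalone C sketch below reproduces the same arithmetic on a plain uint64_t and checks it against a per-byte reference; the REP8 macro, function names, and test values are invented for the example, and the shift count is assumed to be 1..7, as the gvec entry points special-case shift == 0 into a move.

    #include <assert.h>
    #include <inttypes.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Replicate an 8-bit constant into every byte lane of a 64-bit word,
       the same "(x) * (-1ull / 0xff)" idiom used in the patch. */
    #define REP8(x)  ((uint64_t)(x) * (-1ull / 0xff))

    /* Per-byte shift left: shift the whole word, then clear the bits that
       crossed a byte boundary (mirrors tcg_gen_vec_shl8i_i64). */
    static uint64_t shl8i(uint64_t a, int c)
    {
        return (a << c) & REP8((0xff << c) & 0xff);
    }

    /* Per-byte arithmetic shift right (mirrors tcg_gen_vec_sar8i_i64). */
    static uint64_t sar8i(uint64_t a, int c)
    {
        uint64_t s_mask = REP8(0x80 >> c);   /* shifted sign bit of each byte */
        uint64_t c_mask = REP8(0xff >> c);   /* bits at or below the sign bit */
        uint64_t d = a >> c;                 /* logical shift of the whole word */
        uint64_t s = d & s_mask;             /* isolate the shifted sign bits */

        s *= (2 << c) - 2;                   /* smear each sign into the c high bits */
        return (d & c_mask) | s;
    }

    int main(void)
    {
        uint64_t a = 0x80fe7f0112f08001ull;
        int c = 3;
        uint64_t want_shl = 0, want_sar = 0;

        /* Reference: operate on each byte lane independently.  The signed
           right shift of int8_t is assumed to be arithmetic, as on the
           usual gcc/clang hosts. */
        for (int i = 0; i < 8; i++) {
            uint8_t b = a >> (i * 8);
            want_shl |= (uint64_t)(uint8_t)(b << c) << (i * 8);
            want_sar |= (uint64_t)(uint8_t)((int8_t)b >> c) << (i * 8);
        }

        assert(shl8i(a, c) == want_shl);
        assert(sar8i(a, c) == want_sar);
        printf("shl8i = %016" PRIx64 ", sar8i = %016" PRIx64 "\n",
               shl8i(a, c), sar8i(a, c));
        return 0;
    }

The multiplier (2 << c) - 2 has its low c bits set starting at bit 1, so multiplying the isolated sign bit (at bit 7 - c of each byte) by it fills exactly bits 8 - c .. 7 of that byte and never carries into the neighbouring lane, which is why one 64-bit multiply can sign-extend all eight bytes at once.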