[v2,059/100] target/arm: Implement SVE2 XAR

Message ID	20200618042644.1685561-60-richard.henderson@linaro.org
State	Superseded

Headers

Received-SPF: pass (google.com: domain of
	qemu-devel-bounces+patch=linaro.org@nongnu.org designates
	209.51.188.17 as permitted sender) client-ip=209.51.188.17; 
From: Richard Henderson <richard.henderson@linaro.org>
To: qemu-devel@nongnu.org
Subject: [PATCH v2 059/100] target/arm: Implement SVE2 XAR
Date: Wed, 17 Jun 2020 21:26:03 -0700
Message-Id: <20200618042644.1685561-60-richard.henderson@linaro.org>
In-Reply-To: <20200618042644.1685561-1-richard.henderson@linaro.org>
References: <20200618042644.1685561-1-richard.henderson@linaro.org>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Received-SPF: pass client-ip=2607:f8b0:4864:20::544;
	envelope-from=richard.henderson@linaro.org;
	helo=mail-pg1-x544.google.com
Cc: qemu-arm@nongnu.org, steplong@quicinc.com
Errors-To: qemu-devel-bounces+patch=linaro.org@nongnu.org
Sender: "Qemu-devel" <qemu-devel-bounces+patch=linaro.org@nongnu.org>

Series

target/arm: Implement SVE2

Commit Message

Richard Henderson June 18, 2020, 4:26 a.m. UTC

In addition to implementing the SVE2 XAR instruction, use the same
vector generator interface for the AdvSIMD XAR instruction.
This fixes a bug in which the AdvSIMD insn failed to clear the
high bits of the SVE register.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

---
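Note: the byte and halfword helpers in this patch (sve2_xar_b and
sve2_xar_h) rotate every lane of a 64-bit word at once by combining two
whole-word shifts under a replicated mask. The following is a minimal
sketch of the byte case in plain C, checked against a per-lane reference.
The names dup8, xar8_reference, xar8_packed and xar8_selfcheck are
illustrative only and are not QEMU functions; dup8 stands in for
dup_const(MO_8, x).

```c
#include <assert.h>
#include <stdint.h>

/* Replicate an 8-bit value into every byte of a 64-bit word.
 * Hypothetical stand-in for QEMU's dup_const(MO_8, x). */
static uint64_t dup8(uint8_t x)
{
    return 0x0101010101010101ull * x;
}

/* Reference: XOR, then rotate each byte lane right by shr (1..7). */
static uint64_t xar8_reference(uint64_t n, uint64_t m, int shr)
{
    uint64_t t = n ^ m, r = 0;

    for (int lane = 0; lane < 8; lane++) {
        uint8_t b = (uint8_t)(t >> (lane * 8));
        uint8_t rot = (uint8_t)((b >> shr) | (b << (8 - shr)));
        r |= (uint64_t)rot << (lane * 8);
    }
    return r;
}

/* Packed form, following the shape of the sve2_xar_b loop body:
 * the mask keeps the low (8 - shr) bits of each byte from the right
 * shift and the high shr bits of each byte from the left shift, so
 * bits that crossed a lane boundary are discarded. */
static uint64_t xar8_packed(uint64_t n, uint64_t m, int shr)
{
    uint64_t mask = dup8((uint8_t)(0xff >> shr));
    uint64_t t = n ^ m;

    return ((t >> shr) & mask) | ((t << (8 - shr)) & ~mask);
}

/* Compare both forms for every rotate amount the helper can see. */
static void xar8_selfcheck(void)
{
    uint64_t n = 0x0123456789abcdefull, m = 0x0f1e2d3c4b5a6978ull;

    for (int shr = 1; shr < 8; shr++) {
        assert(xar8_packed(n, m, shr) == xar8_reference(n, m, shr));
    }
}
```

Calling xar8_selfcheck() exercises rotate amounts 1 through 7; a rotate
of 0 never reaches the helper because gen_gvec_xar emits a plain XOR in
that case.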
 target/arm/helper-sve.h    |   4 ++
 target/arm/helper.h        |   2 +
 target/arm/translate-a64.h |   3 ++
 target/arm/sve.decode      |   4 ++
 target/arm/sve_helper.c    |  39 ++++++++++++++
 target/arm/translate-a64.c |  25 ++-------
 target/arm/translate-sve.c | 104 +++++++++++++++++++++++++++++++++++++
 target/arm/vec_helper.c    |  12 +++++
 8 files changed, 172 insertions(+), 21 deletions(-)

-- 
2.25.1

diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h
index 1d5d272c5c..9f6095c884 100644
--- a/target/arm/helper-sve.h
+++ b/target/arm/helper-sve.h
@@ -2070,6 +2070,10 @@  DEF_HELPER_FLAGS_5(sve2_histcnt_d, TCG_CALL_NO_RWG,
 
 DEF_HELPER_FLAGS_4(sve2_histseg, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 
+DEF_HELPER_FLAGS_4(sve2_xar_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve2_xar_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve2_xar_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
 DEF_HELPER_FLAGS_6(sve2_faddp_zpzz_h, TCG_CALL_NO_RWG,
                    void, ptr, ptr, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_6(sve2_faddp_zpzz_s, TCG_CALL_NO_RWG,
diff --git a/target/arm/helper.h b/target/arm/helper.h
index 643fc3a017..7a29194052 100644
--- a/target/arm/helper.h
+++ b/target/arm/helper.h
@@ -783,6 +783,8 @@  DEF_HELPER_FLAGS_4(gvec_uaba_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(gvec_uaba_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(gvec_uaba_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 
+DEF_HELPER_FLAGS_4(gvec_xar_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
 #ifdef TARGET_AARCH64
 #include "helper-a64.h"
 #include "helper-sve.h"
diff --git a/target/arm/translate-a64.h b/target/arm/translate-a64.h
index da0f59a2ce..e54e297c90 100644
--- a/target/arm/translate-a64.h
+++ b/target/arm/translate-a64.h
@@ -117,5 +117,8 @@  bool disas_sve(DisasContext *, uint32_t);
 
 void gen_gvec_rax1(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
                    uint32_t rm_ofs, uint32_t opr_sz, uint32_t max_sz);
+void gen_gvec_xar(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
+                  uint32_t rm_ofs, int64_t shift,
+                  uint32_t opr_sz, uint32_t max_sz);
 
 #endif /* TARGET_ARM_TRANSLATE_A64_H */
diff --git a/target/arm/sve.decode b/target/arm/sve.decode
index 0edb72d4fb..a375ce31f1 100644
--- a/target/arm/sve.decode
+++ b/target/arm/sve.decode
@@ -65,6 +65,7 @@ 
 &rr_dbm         rd rn dbm
 &rrri           rd rn rm imm
 &rri_esz        rd rn imm esz
+&rrri_esz       rd rn rm imm esz
 &rrr_esz        rd rn rm esz
 &rpr_esz        rd pg rn esz
 &rpr_s          rd pg rn s
@@ -384,6 +385,9 @@  ORR_zzz         00000100 01 1 ..... 001 100 ..... .....         @rd_rn_rm_e0
 EOR_zzz         00000100 10 1 ..... 001 100 ..... .....         @rd_rn_rm_e0
 BIC_zzz         00000100 11 1 ..... 001 100 ..... .....         @rd_rn_rm_e0
 
+XAR             00000100 .. 1 ..... 001 101 rm:5  rd:5   &rrri_esz \
+                rn=%reg_movprfx esz=%tszimm16_esz imm=%tszimm16_shr
+
 # SVE2 bitwise ternary operations
 EOR3            00000100 00 1 ..... 001 110 ..... .....         @rdn_ra_rm_e0
 BSL             00000100 00 1 ..... 001 111 ..... .....         @rdn_ra_rm_e0
diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
index bc1c3ce1f0..a6c5ff8f79 100644
--- a/target/arm/sve_helper.c
+++ b/target/arm/sve_helper.c
@@ -6784,3 +6784,42 @@  void HELPER(sve2_histseg)(void *vd, void *vn, void *vm, uint32_t desc)
         *(uint64_t *)(vd + i + 8) = out1;
     }
 }
+
+void HELPER(sve2_xar_b)(void *vd, void *vn, void *vm, uint32_t desc)
+{
+    intptr_t i, opr_sz = simd_oprsz(desc) / 8;
+    int shr = simd_data(desc);
+    int shl = 8 - shr;
+    uint64_t mask = dup_const(MO_8, 0xff >> shr);
+    uint64_t *d = vd, *n = vn, *m = vm;
+
+    for (i = 0; i < opr_sz; ++i) {
+        uint64_t t = n[i] ^ m[i];
+        d[i] = ((t >> shr) & mask) | ((t << shl) & ~mask);
+    }
+}
+
+void HELPER(sve2_xar_h)(void *vd, void *vn, void *vm, uint32_t desc)
+{
+    intptr_t i, opr_sz = simd_oprsz(desc) / 8;
+    int shr = simd_data(desc);
+    int shl = 16 - shr;
+    uint64_t mask = dup_const(MO_16, 0xffff >> shr);
+    uint64_t *d = vd, *n = vn, *m = vm;
+
+    for (i = 0; i < opr_sz; ++i) {
+        uint64_t t = n[i] ^ m[i];
+        d[i] = ((t >> shr) & mask) | ((t << shl) & ~mask);
+    }
+}
+
+void HELPER(sve2_xar_s)(void *vd, void *vn, void *vm, uint32_t desc)
+{
+    intptr_t i, opr_sz = simd_oprsz(desc) / 4;
+    int shr = simd_data(desc);
+    uint32_t *d = vd, *n = vn, *m = vm;
+
+    for (i = 0; i < opr_sz; ++i) {
+        d[i] = ror32(n[i] ^ m[i], shr);
+    }
+}
diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index b80ee9f734..4f5c433b47 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -13829,8 +13829,6 @@  static void disas_crypto_xar(DisasContext *s, uint32_t insn)
     int imm6 = extract32(insn, 10, 6);
     int rn = extract32(insn, 5, 5);
     int rd = extract32(insn, 0, 5);
-    TCGv_i64 tcg_op1, tcg_op2, tcg_res[2];
-    int pass;
 
     if (!dc_isar_feature(aa64_sha3, s)) {
         unallocated_encoding(s);
@@ -13841,25 +13839,10 @@  static void disas_crypto_xar(DisasContext *s, uint32_t insn)
         return;
     }
 
-    tcg_op1 = tcg_temp_new_i64();
-    tcg_op2 = tcg_temp_new_i64();
-    tcg_res[0] = tcg_temp_new_i64();
-    tcg_res[1] = tcg_temp_new_i64();
-
-    for (pass = 0; pass < 2; pass++) {
-        read_vec_element(s, tcg_op1, rn, pass, MO_64);
-        read_vec_element(s, tcg_op2, rm, pass, MO_64);
-
-        tcg_gen_xor_i64(tcg_res[pass], tcg_op1, tcg_op2);
-        tcg_gen_rotri_i64(tcg_res[pass], tcg_res[pass], imm6);
-    }
-    write_vec_element(s, tcg_res[0], rd, 0, MO_64);
-    write_vec_element(s, tcg_res[1], rd, 1, MO_64);
-
-    tcg_temp_free_i64(tcg_op1);
-    tcg_temp_free_i64(tcg_op2);
-    tcg_temp_free_i64(tcg_res[0]);
-    tcg_temp_free_i64(tcg_res[1]);
+    gen_gvec_xar(MO_64, vec_full_reg_offset(s, rd),
+                 vec_full_reg_offset(s, rn),
+                 vec_full_reg_offset(s, rm), imm6, 16,
+                 vec_full_reg_size(s));
 }
 
 /* Crypto three-reg imm2
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index 559250e0d6..640b109166 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -340,6 +340,110 @@  static bool trans_BIC_zzz(DisasContext *s, arg_rrr_esz *a)
     return do_zzz_fn(s, a, tcg_gen_gvec_andc);
 }
 
+static void gen_xar8_i64(TCGv_i64 d, TCGv_i64 n, TCGv_i64 m, int64_t sh)
+{
+    TCGv_i64 t = tcg_temp_new_i64();
+    uint64_t mask = dup_const(MO_8, 0xff >> sh);
+
+    tcg_gen_xor_i64(t, n, m);
+    tcg_gen_shri_i64(d, t, sh);
+    tcg_gen_shli_i64(t, t, 8 - sh);
+    tcg_gen_andi_i64(d, d, mask);
+    tcg_gen_andi_i64(t, t, ~mask);
+    tcg_gen_or_i64(d, d, t);
+    tcg_temp_free_i64(t);
+}
+
+static void gen_xar16_i64(TCGv_i64 d, TCGv_i64 n, TCGv_i64 m, int64_t sh)
+{
+    TCGv_i64 t = tcg_temp_new_i64();
+    uint64_t mask = dup_const(MO_16, 0xffff >> sh);
+
+    tcg_gen_xor_i64(t, n, m);
+    tcg_gen_shri_i64(d, t, sh);
+    tcg_gen_shli_i64(t, t, 16 - sh);
+    tcg_gen_andi_i64(d, d, mask);
+    tcg_gen_andi_i64(t, t, ~mask);
+    tcg_gen_or_i64(d, d, t);
+    tcg_temp_free_i64(t);
+}
+
+static void gen_xar_i32(TCGv_i32 d, TCGv_i32 n, TCGv_i32 m, int32_t sh)
+{
+    tcg_gen_xor_i32(d, n, m);
+    tcg_gen_rotri_i32(d, d, sh);
+}
+
+static void gen_xar_i64(TCGv_i64 d, TCGv_i64 n, TCGv_i64 m, int64_t sh)
+{
+    tcg_gen_xor_i64(d, n, m);
+    tcg_gen_rotri_i64(d, d, sh);
+}
+
+static void gen_xar_vec(unsigned vece, TCGv_vec d, TCGv_vec n,
+                        TCGv_vec m, int64_t sh)
+{
+    tcg_gen_xor_vec(vece, d, n, m);
+    tcg_gen_rotri_vec(vece, d, d, sh);
+}
+
+void gen_gvec_xar(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
+                  uint32_t rm_ofs, int64_t shift,
+                  uint32_t opr_sz, uint32_t max_sz)
+{
+    static const TCGOpcode vecop[] = { INDEX_op_rotli_vec, 0 };
+    static const GVecGen3i ops[4] = {
+        { .fni8 = gen_xar8_i64,
+          .fniv = gen_xar_vec,
+          .fno = gen_helper_sve2_xar_b,
+          .opt_opc = vecop,
+          .vece = MO_8 },
+        { .fni8 = gen_xar16_i64,
+          .fniv = gen_xar_vec,
+          .fno = gen_helper_sve2_xar_h,
+          .opt_opc = vecop,
+          .vece = MO_16 },
+        { .fni4 = gen_xar_i32,
+          .fniv = gen_xar_vec,
+          .fno = gen_helper_sve2_xar_s,
+          .opt_opc = vecop,
+          .vece = MO_32 },
+        { .fni8 = gen_xar_i64,
+          .fniv = gen_xar_vec,
+          .fno = gen_helper_gvec_xar_d,
+          .opt_opc = vecop,
+          .vece = MO_64 }
+    };
+    int esize = 8 << vece;
+
+    /* The SVE2 range is 1 .. esize; the AdvSIMD range is 0 .. esize-1. */
+    tcg_debug_assert(shift >= 0);
+    tcg_debug_assert(shift <= esize);
+    shift &= esize - 1;
+
+    if (shift == 0) {
+        /* xar with no rotate devolves to xor. */
+        tcg_gen_gvec_xor(vece, rd_ofs, rn_ofs, rm_ofs, opr_sz, max_sz);
+    } else {
+        tcg_gen_gvec_3i(rd_ofs, rn_ofs, rm_ofs, opr_sz, max_sz,
+                        shift, &ops[vece]);
+    }
+}
+
+static bool trans_XAR(DisasContext *s, arg_rrri_esz *a)
+{
+    if (a->esz < 0 || !dc_isar_feature(aa64_sve2, s)) {
+        return false;
+    }
+    if (sve_access_check(s)) {
+        unsigned vsz = vec_full_reg_size(s);
+        gen_gvec_xar(a->esz, vec_full_reg_offset(s, a->rd),
+                     vec_full_reg_offset(s, a->rn),
+                     vec_full_reg_offset(s, a->rm), a->imm, vsz, vsz);
+    }
+    return true;
+}
+
 static bool do_sve2_zzzz_fn(DisasContext *s, arg_rrrr_esz *a, GVecGen4Fn *fn)
 {
     if (!dc_isar_feature(aa64_sve2, s)) {
diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c
index f016aa7978..3b38cdafb0 100644
--- a/target/arm/vec_helper.c
+++ b/target/arm/vec_helper.c
@@ -1698,3 +1698,15 @@  void HELPER(gvec_umulh_d)(void *vd, void *vn, void *vm, uint32_t desc)
     }
     clear_tail(d, opr_sz, simd_maxsz(desc));
 }
+
+void HELPER(gvec_xar_d)(void *vd, void *vn, void *vm, uint32_t desc)
+{
+    intptr_t i, opr_sz = simd_oprsz(desc) / 8;
+    int shr = simd_data(desc);
+    uint64_t *d = vd, *n = vn, *m = vm;
+
+    for (i = 0; i < opr_sz; ++i) {
+        d[i] = ror64(n[i] ^ m[i], shr);
+    }
+    clear_tail(d, opr_sz * 8, simd_maxsz(desc));
+}
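
Note: gen_gvec_xar accepts both the SVE2 rotate range (1 .. esize) and
the AdvSIMD range (0 .. esize-1) by masking the shift with esize - 1, so
a full-width rotate becomes a rotate by zero. A minimal sketch of that
normalization for a single 64-bit lane follows. The names ror64_sketch
and xar64_lane are hypothetical, and unlike the patch (which emits
tcg_gen_gvec_xor when the shift is zero) the sketch handles zero inside
the rotate helper.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate right by sh, where sh is in 0..63.  The zero case is split
 * out because shifting a 64-bit value by 64 is undefined in C. */
static uint64_t ror64_sketch(uint64_t x, unsigned sh)
{
    return sh ? (x >> sh) | (x << (64 - sh)) : x;
}

/* XAR on one 64-bit lane, accepting a rotate amount in 0..64. */
static uint64_t xar64_lane(uint64_t n, uint64_t m, unsigned shift)
{
    assert(shift <= 64);
    shift &= 63;    /* 64 wraps to 0: a full rotate is the identity */
    return ror64_sketch(n ^ m, shift);
}
```

With this normalization, xar64_lane(n, m, 64) and xar64_lane(n, m, 0)
both reduce to n ^ m.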
