From patchwork Wed Sep 26 19:23:17 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Richard Henderson X-Patchwork-Id: 147655 Delivered-To: patch@linaro.org Received: by 2002:a2e:8595:0:0:0:0:0 with SMTP id b21-v6csp1117161lji; Wed, 26 Sep 2018 12:32:56 -0700 (PDT) X-Google-Smtp-Source: ACcGV63FiaQ+XpkwLfJrPziteAM6z320vw3Fa7ngxKz5XliwW3QT8gdIM+BgWCGt3Aa7dLl6L0d5 X-Received: by 2002:ae9:ea1a:: with SMTP id f26-v6mr5295589qkg.210.1537990376692; Wed, 26 Sep 2018 12:32:56 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1537990376; cv=none; d=google.com; s=arc-20160816; b=fFyD7PfrKoPLZVl0t6Ce4YPHetfZc1QZVpcLg7c8cB+QCP8lNsMIACRJQCNbuDJwxk NdM3g1/LE/6qTX5BrZfqSkLM1SRc12wwMhSn8RmVZMxu8fPQDBX7RTnGnC1ripFe17Y8 7rf4G088Ft2VEFKYRg+xb25jtdeV7muikfcOgWa9hZgDaw7hwqfXP7erlSMuXDK0qqM4 /qH5+t2IeuXmXIRsiPkrQhFfRowlPWtss6JQ/ZO+YjVnSyVSfXah/zQcuMW1Rcd3ZlU/ obGwSwGiLTAgHU4TsOamhKFD3Xe/bAPgavruQzH8DurEU5Pm0ge4hH7T5SYgZNWJ8ggG rnMQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:cc:list-subscribe:list-help:list-post:list-archive :list-unsubscribe:list-id:precedence:subject:references:in-reply-to :message-id:date:to:from:dkim-signature; bh=w34AONdmRqwOXx9FeCuBLX1xr824iQTiCwBqIr/LEPo=; b=Gag1Dn3lzSGyHxZRtkMPms4xpvuyUqo8gWTMf60dHOecVa5hlfe5yTC9t4DQjudjyU wrHJJ/ShKV9bYURkkn+lB0OUTdJNaq4nWUlRIFk5QBwnqnKQcZIKnLXc0zb1zB2+TtzR sVag4pkMUIxxfSFYK8C74nNtABat4q3L6G7/At7rKBeGQKj2fXtvdblDd1WAKZwu7rDh 03Rwx4GRTrQuM0nQbuCA8d54iqc90SEg8jGzzv9+upIOEqKcuEKpYonEPQXJpcNkFGtv OG39AMnoaclBMr2C0KAshnW4V1kDnl1r4ifBrutjM3ohpzI37m68ljYZBS4doj68O8DD JA4A== ARC-Authentication-Results: i=1; mx.google.com; dkim=fail header.i=@linaro.org header.s=google header.b=QtDUpz1s; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 2001:4830:134:3::11 as permitted sender) smtp.mailfrom="qemu-devel-bounces+patch=linaro.org@nongnu.org"; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=linaro.org Return-Path: Received: from lists.gnu.org (lists.gnu.org. [2001:4830:134:3::11]) by mx.google.com with ESMTPS id d15-v6si4005015qvk.85.2018.09.26.12.32.56 for (version=TLS1 cipher=AES128-SHA bits=128/128); Wed, 26 Sep 2018 12:32:56 -0700 (PDT) Received-SPF: pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 2001:4830:134:3::11 as permitted sender) client-ip=2001:4830:134:3::11; Authentication-Results: mx.google.com; dkim=fail header.i=@linaro.org header.s=google header.b=QtDUpz1s; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 2001:4830:134:3::11 as permitted sender) smtp.mailfrom="qemu-devel-bounces+patch=linaro.org@nongnu.org"; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=linaro.org Received: from localhost ([::1]:60401 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1g5FYO-0001Vl-3c for patch@linaro.org; Wed, 26 Sep 2018 15:32:56 -0400 Received: from eggs.gnu.org ([2001:4830:134:3::10]:38401) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1g5FPW-0001tO-Rl for qemu-devel@nongnu.org; Wed, 26 Sep 2018 15:23:48 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1g5FPU-0006rX-R9 for qemu-devel@nongnu.org; Wed, 26 Sep 2018 15:23:46 -0400 Received: from mail-pg1-x52b.google.com ([2607:f8b0:4864:20::52b]:40178) by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16) (Exim 4.71) (envelope-from ) id 1g5FPQ-0006pR-Ux for qemu-devel@nongnu.org; Wed, 26 Sep 2018 15:23:42 -0400 Received: by mail-pg1-x52b.google.com with SMTP id n31-v6so60972pgm.7 for ; Wed, 26 Sep 2018 12:23:39 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=from:to:cc:subject:date:message-id:in-reply-to:references; bh=w34AONdmRqwOXx9FeCuBLX1xr824iQTiCwBqIr/LEPo=; b=QtDUpz1sMGqJT8J90bSf9wrT7EkIpxjHKfGVwccchWKcg2rzHyi4V3RZxez/vYyZlu X1JH9Q6SEqXG7cXFiHqiO8+7yVi77wr1l9cDIOUeTVbxQgZcgE14cQoxhCuYee3fC6cv K2vrz09JbBZy/45nhHqsBCDD3gpqLV8KdQHZw= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references; bh=w34AONdmRqwOXx9FeCuBLX1xr824iQTiCwBqIr/LEPo=; b=temIZKTbLv+Utn/9VGc3TCUbb/GtcIRdDqdnY++2hMMgbcTJCFilGHBpqe3XQC24+e 3w+obgUP5cey+h0kgxzFIKDW77vpcyUgycZnkUkAv4kzNgBPNXUbEmgDrH4HIso3JhFG 4FwRVXHdChZgS3vRgk4U0ZmO4n6MqZFBYWztogPKLRceTpTdLw8ia0VOt/xL+UN4PJTD G/F6nRpWGZS1PfOtbLLELkJBsqSuxY46xrqE4vnN4VoQ9IQyukpeMXAspPC2rqBqjHjn bjCG5w2mEQS8DKrlMC0mwcq1JCdrUxQ89h8jBh/vfbv+gY0LMtQqAyFX2P55ip1dkTuE IN7A== X-Gm-Message-State: ABuFfohGCd8GBHiMKmfWL4yE6Sq5wGspTg265bzXULQX/GzvKzLvB7hf coG2pvI36ITq7FE1PRw9lfcGINMGL54= X-Received: by 2002:a62:8913:: with SMTP id v19-v6mr7698251pfd.127.1537989818304; Wed, 26 Sep 2018 12:23:38 -0700 (PDT) Received: from cloudburst.twiddle.net (97-113-8-179.tukw.qwest.net. [97.113.8.179]) by smtp.gmail.com with ESMTPSA id c2-v6sm8493486pfo.107.2018.09.26.12.23.36 (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Wed, 26 Sep 2018 12:23:37 -0700 (PDT) From: Richard Henderson To: qemu-devel@nongnu.org Date: Wed, 26 Sep 2018 12:23:17 -0700 Message-Id: <20180926192323.12659-10-richard.henderson@linaro.org> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20180926192323.12659-1-richard.henderson@linaro.org> References: <20180926192323.12659-1-richard.henderson@linaro.org> X-detected-operating-system: by eggs.gnu.org: Genre and OS details not recognized. X-Received-From: 2607:f8b0:4864:20::52b Subject: [Qemu-devel] [PATCH v2 09/15] target/arm: Rewrite helper_sve_st[1234]*_r X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: peter.maydell@linaro.org Errors-To: qemu-devel-bounces+patch=linaro.org@nongnu.org Sender: "Qemu-devel" This fixes the endianness problem for softmmu, and moves the main loop out of a macro and into an inlined function. Reviewed-by: Peter Maydell Tested-by: Laurent Desnogues Signed-off-by: Richard Henderson --- target/arm/sve_helper.c | 351 ++++++++++++++++++++-------------------- 1 file changed, 172 insertions(+), 179 deletions(-) -- 2.17.1 diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c index f712b382f8..0b1e06823b 100644 --- a/target/arm/sve_helper.c +++ b/target/arm/sve_helper.c @@ -3991,6 +3991,7 @@ typedef intptr_t sve_ld1_host_fn(void *vd, void *vg, void *host, */ typedef void sve_ld1_tlb_fn(CPUARMState *env, void *vd, intptr_t reg_off, target_ulong vaddr, int mmu_idx, uintptr_t ra); +typedef sve_ld1_tlb_fn sve_st1_tlb_fn; /* * Generate the above primitives. @@ -4668,214 +4669,206 @@ DO_LDFF1_LDNF1_2(dd, 3, 3) /* * Store contiguous data, protected by a governing predicate. */ -#define DO_ST1(NAME, FN, TYPEE, TYPEM, H) \ -void HELPER(NAME)(CPUARMState *env, void *vg, \ - target_ulong addr, uint32_t desc) \ -{ \ - intptr_t i, oprsz = simd_oprsz(desc); \ - intptr_t ra = GETPC(); \ - unsigned rd = simd_data(desc); \ - void *vd = &env->vfp.zregs[rd]; \ - for (i = 0; i < oprsz; ) { \ - uint16_t pg = *(uint16_t *)(vg + H1_2(i >> 3)); \ - do { \ - if (pg & 1) { \ - TYPEM m = *(TYPEE *)(vd + H(i)); \ - FN(env, addr, m, ra); \ - } \ - i += sizeof(TYPEE), pg >>= sizeof(TYPEE); \ - addr += sizeof(TYPEM); \ - } while (i & 15); \ - } \ + +#ifdef CONFIG_SOFTMMU +#define DO_ST_TLB(NAME, H, TYPEM, HOST, MOEND, TLB) \ +static void sve_##NAME##_tlb(CPUARMState *env, void *vd, intptr_t reg_off, \ + target_ulong addr, int mmu_idx, uintptr_t ra) \ +{ \ + TCGMemOpIdx oi = make_memop_idx(ctz32(sizeof(TYPEM)) | MOEND, mmu_idx); \ + TLB(env, addr, *(TYPEM *)(vd + H(reg_off)), oi, ra); \ } - -#define DO_ST1_D(NAME, FN, TYPEM) \ -void HELPER(NAME)(CPUARMState *env, void *vg, \ - target_ulong addr, uint32_t desc) \ -{ \ - intptr_t i, oprsz = simd_oprsz(desc) / 8; \ - intptr_t ra = GETPC(); \ - unsigned rd = simd_data(desc); \ - uint64_t *d = &env->vfp.zregs[rd].d[0]; \ - uint8_t *pg = vg; \ - for (i = 0; i < oprsz; i += 1) { \ - if (pg[H1(i)] & 1) { \ - FN(env, addr, d[i], ra); \ - } \ - addr += sizeof(TYPEM); \ - } \ +#else +#define DO_ST_TLB(NAME, H, TYPEM, HOST, MOEND, TLB) \ +static void sve_##NAME##_tlb(CPUARMState *env, void *vd, intptr_t reg_off, \ + target_ulong addr, int mmu_idx, uintptr_t ra) \ +{ \ + HOST(g2h(addr), *(TYPEM *)(vd + H(reg_off))); \ } +#endif -#define DO_ST2(NAME, FN, TYPEE, TYPEM, H) \ -void HELPER(NAME)(CPUARMState *env, void *vg, \ - target_ulong addr, uint32_t desc) \ -{ \ - intptr_t i, oprsz = simd_oprsz(desc); \ - intptr_t ra = GETPC(); \ - unsigned rd = simd_data(desc); \ - void *d1 = &env->vfp.zregs[rd]; \ - void *d2 = &env->vfp.zregs[(rd + 1) & 31]; \ - for (i = 0; i < oprsz; ) { \ - uint16_t pg = *(uint16_t *)(vg + H1_2(i >> 3)); \ - do { \ - if (pg & 1) { \ - TYPEM m1 = *(TYPEE *)(d1 + H(i)); \ - TYPEM m2 = *(TYPEE *)(d2 + H(i)); \ - FN(env, addr, m1, ra); \ - FN(env, addr + sizeof(TYPEM), m2, ra); \ - } \ - i += sizeof(TYPEE), pg >>= sizeof(TYPEE); \ - addr += 2 * sizeof(TYPEM); \ - } while (i & 15); \ - } \ -} +DO_ST_TLB(st1bb, H1, uint8_t, stb_p, 0, helper_ret_stb_mmu) +DO_ST_TLB(st1bh, H1_2, uint16_t, stb_p, 0, helper_ret_stb_mmu) +DO_ST_TLB(st1bs, H1_4, uint32_t, stb_p, 0, helper_ret_stb_mmu) +DO_ST_TLB(st1bd, , uint64_t, stb_p, 0, helper_ret_stb_mmu) -#define DO_ST3(NAME, FN, TYPEE, TYPEM, H) \ -void HELPER(NAME)(CPUARMState *env, void *vg, \ - target_ulong addr, uint32_t desc) \ -{ \ - intptr_t i, oprsz = simd_oprsz(desc); \ - intptr_t ra = GETPC(); \ - unsigned rd = simd_data(desc); \ - void *d1 = &env->vfp.zregs[rd]; \ - void *d2 = &env->vfp.zregs[(rd + 1) & 31]; \ - void *d3 = &env->vfp.zregs[(rd + 2) & 31]; \ - for (i = 0; i < oprsz; ) { \ - uint16_t pg = *(uint16_t *)(vg + H1_2(i >> 3)); \ - do { \ - if (pg & 1) { \ - TYPEM m1 = *(TYPEE *)(d1 + H(i)); \ - TYPEM m2 = *(TYPEE *)(d2 + H(i)); \ - TYPEM m3 = *(TYPEE *)(d3 + H(i)); \ - FN(env, addr, m1, ra); \ - FN(env, addr + sizeof(TYPEM), m2, ra); \ - FN(env, addr + 2 * sizeof(TYPEM), m3, ra); \ - } \ - i += sizeof(TYPEE), pg >>= sizeof(TYPEE); \ - addr += 3 * sizeof(TYPEM); \ - } while (i & 15); \ - } \ -} +DO_ST_TLB(st1hh_le, H1_2, uint16_t, stw_le_p, MO_LE, helper_le_stw_mmu) +DO_ST_TLB(st1hs_le, H1_4, uint32_t, stw_le_p, MO_LE, helper_le_stw_mmu) +DO_ST_TLB(st1hd_le, , uint64_t, stw_le_p, MO_LE, helper_le_stw_mmu) -#define DO_ST4(NAME, FN, TYPEE, TYPEM, H) \ -void HELPER(NAME)(CPUARMState *env, void *vg, \ - target_ulong addr, uint32_t desc) \ -{ \ - intptr_t i, oprsz = simd_oprsz(desc); \ - intptr_t ra = GETPC(); \ - unsigned rd = simd_data(desc); \ - void *d1 = &env->vfp.zregs[rd]; \ - void *d2 = &env->vfp.zregs[(rd + 1) & 31]; \ - void *d3 = &env->vfp.zregs[(rd + 2) & 31]; \ - void *d4 = &env->vfp.zregs[(rd + 3) & 31]; \ - for (i = 0; i < oprsz; ) { \ - uint16_t pg = *(uint16_t *)(vg + H1_2(i >> 3)); \ - do { \ - if (pg & 1) { \ - TYPEM m1 = *(TYPEE *)(d1 + H(i)); \ - TYPEM m2 = *(TYPEE *)(d2 + H(i)); \ - TYPEM m3 = *(TYPEE *)(d3 + H(i)); \ - TYPEM m4 = *(TYPEE *)(d4 + H(i)); \ - FN(env, addr, m1, ra); \ - FN(env, addr + sizeof(TYPEM), m2, ra); \ - FN(env, addr + 2 * sizeof(TYPEM), m3, ra); \ - FN(env, addr + 3 * sizeof(TYPEM), m4, ra); \ - } \ - i += sizeof(TYPEE), pg >>= sizeof(TYPEE); \ - addr += 4 * sizeof(TYPEM); \ - } while (i & 15); \ - } \ -} +DO_ST_TLB(st1ss_le, H1_4, uint32_t, stl_le_p, MO_LE, helper_le_stl_mmu) +DO_ST_TLB(st1sd_le, , uint64_t, stl_le_p, MO_LE, helper_le_stl_mmu) -DO_ST1(sve_st1bh_r, cpu_stb_data_ra, uint16_t, uint8_t, H1_2) -DO_ST1(sve_st1bs_r, cpu_stb_data_ra, uint32_t, uint8_t, H1_4) -DO_ST1_D(sve_st1bd_r, cpu_stb_data_ra, uint8_t) +DO_ST_TLB(st1dd_le, , uint64_t, stq_le_p, MO_LE, helper_le_stq_mmu) -DO_ST1(sve_st1hs_r, cpu_stw_data_ra, uint32_t, uint16_t, H1_4) -DO_ST1_D(sve_st1hd_r, cpu_stw_data_ra, uint16_t) +DO_ST_TLB(st1hh_be, H1_2, uint16_t, stw_be_p, MO_BE, helper_be_stw_mmu) +DO_ST_TLB(st1hs_be, H1_4, uint32_t, stw_be_p, MO_BE, helper_be_stw_mmu) +DO_ST_TLB(st1hd_be, , uint64_t, stw_be_p, MO_BE, helper_be_stw_mmu) -DO_ST1_D(sve_st1sd_r, cpu_stl_data_ra, uint32_t) +DO_ST_TLB(st1ss_be, H1_4, uint32_t, stl_be_p, MO_BE, helper_be_stl_mmu) +DO_ST_TLB(st1sd_be, , uint64_t, stl_be_p, MO_BE, helper_be_stl_mmu) -DO_ST1(sve_st1bb_r, cpu_stb_data_ra, uint8_t, uint8_t, H1) -DO_ST2(sve_st2bb_r, cpu_stb_data_ra, uint8_t, uint8_t, H1) -DO_ST3(sve_st3bb_r, cpu_stb_data_ra, uint8_t, uint8_t, H1) -DO_ST4(sve_st4bb_r, cpu_stb_data_ra, uint8_t, uint8_t, H1) +DO_ST_TLB(st1dd_be, , uint64_t, stq_be_p, MO_BE, helper_be_stq_mmu) -DO_ST1(sve_st1hh_r, cpu_stw_data_ra, uint16_t, uint16_t, H1_2) -DO_ST2(sve_st2hh_r, cpu_stw_data_ra, uint16_t, uint16_t, H1_2) -DO_ST3(sve_st3hh_r, cpu_stw_data_ra, uint16_t, uint16_t, H1_2) -DO_ST4(sve_st4hh_r, cpu_stw_data_ra, uint16_t, uint16_t, H1_2) +#undef DO_ST_TLB -DO_ST1(sve_st1ss_r, cpu_stl_data_ra, uint32_t, uint32_t, H1_4) -DO_ST2(sve_st2ss_r, cpu_stl_data_ra, uint32_t, uint32_t, H1_4) -DO_ST3(sve_st3ss_r, cpu_stl_data_ra, uint32_t, uint32_t, H1_4) -DO_ST4(sve_st4ss_r, cpu_stl_data_ra, uint32_t, uint32_t, H1_4) - -DO_ST1_D(sve_st1dd_r, cpu_stq_data_ra, uint64_t) - -void HELPER(sve_st2dd_r)(CPUARMState *env, void *vg, - target_ulong addr, uint32_t desc) +/* + * Common helpers for all contiguous 1,2,3,4-register predicated stores. + */ +static void sve_st1_r(CPUARMState *env, void *vg, target_ulong addr, + uint32_t desc, const uintptr_t ra, + const int esize, const int msize, + sve_st1_tlb_fn *tlb_fn) { - intptr_t i, oprsz = simd_oprsz(desc) / 8; - intptr_t ra = GETPC(); + const int mmu_idx = cpu_mmu_index(env, false); + intptr_t i, oprsz = simd_oprsz(desc); unsigned rd = simd_data(desc); - uint64_t *d1 = &env->vfp.zregs[rd].d[0]; - uint64_t *d2 = &env->vfp.zregs[(rd + 1) & 31].d[0]; - uint8_t *pg = vg; + void *vd = &env->vfp.zregs[rd]; - for (i = 0; i < oprsz; i += 1) { - if (pg[H1(i)] & 1) { - cpu_stq_data_ra(env, addr, d1[i], ra); - cpu_stq_data_ra(env, addr + 8, d2[i], ra); - } - addr += 2 * 8; + set_helper_retaddr(ra); + for (i = 0; i < oprsz; ) { + uint16_t pg = *(uint16_t *)(vg + H1_2(i >> 3)); + do { + if (pg & 1) { + tlb_fn(env, vd, i, addr, mmu_idx, ra); + } + i += esize, pg >>= esize; + addr += msize; + } while (i & 15); } + set_helper_retaddr(0); } -void HELPER(sve_st3dd_r)(CPUARMState *env, void *vg, - target_ulong addr, uint32_t desc) +static void sve_st2_r(CPUARMState *env, void *vg, target_ulong addr, + uint32_t desc, const uintptr_t ra, + const int esize, const int msize, + sve_st1_tlb_fn *tlb_fn) { - intptr_t i, oprsz = simd_oprsz(desc) / 8; - intptr_t ra = GETPC(); + const int mmu_idx = cpu_mmu_index(env, false); + intptr_t i, oprsz = simd_oprsz(desc); unsigned rd = simd_data(desc); - uint64_t *d1 = &env->vfp.zregs[rd].d[0]; - uint64_t *d2 = &env->vfp.zregs[(rd + 1) & 31].d[0]; - uint64_t *d3 = &env->vfp.zregs[(rd + 2) & 31].d[0]; - uint8_t *pg = vg; + void *d1 = &env->vfp.zregs[rd]; + void *d2 = &env->vfp.zregs[(rd + 1) & 31]; - for (i = 0; i < oprsz; i += 1) { - if (pg[H1(i)] & 1) { - cpu_stq_data_ra(env, addr, d1[i], ra); - cpu_stq_data_ra(env, addr + 8, d2[i], ra); - cpu_stq_data_ra(env, addr + 16, d3[i], ra); - } - addr += 3 * 8; + set_helper_retaddr(ra); + for (i = 0; i < oprsz; ) { + uint16_t pg = *(uint16_t *)(vg + H1_2(i >> 3)); + do { + if (pg & 1) { + tlb_fn(env, d1, i, addr, mmu_idx, ra); + tlb_fn(env, d2, i, addr + msize, mmu_idx, ra); + } + i += esize, pg >>= esize; + addr += 2 * msize; + } while (i & 15); } + set_helper_retaddr(0); } -void HELPER(sve_st4dd_r)(CPUARMState *env, void *vg, - target_ulong addr, uint32_t desc) +static void sve_st3_r(CPUARMState *env, void *vg, target_ulong addr, + uint32_t desc, const uintptr_t ra, + const int esize, const int msize, + sve_st1_tlb_fn *tlb_fn) { - intptr_t i, oprsz = simd_oprsz(desc) / 8; - intptr_t ra = GETPC(); + const int mmu_idx = cpu_mmu_index(env, false); + intptr_t i, oprsz = simd_oprsz(desc); unsigned rd = simd_data(desc); - uint64_t *d1 = &env->vfp.zregs[rd].d[0]; - uint64_t *d2 = &env->vfp.zregs[(rd + 1) & 31].d[0]; - uint64_t *d3 = &env->vfp.zregs[(rd + 2) & 31].d[0]; - uint64_t *d4 = &env->vfp.zregs[(rd + 3) & 31].d[0]; - uint8_t *pg = vg; + void *d1 = &env->vfp.zregs[rd]; + void *d2 = &env->vfp.zregs[(rd + 1) & 31]; + void *d3 = &env->vfp.zregs[(rd + 2) & 31]; - for (i = 0; i < oprsz; i += 1) { - if (pg[H1(i)] & 1) { - cpu_stq_data_ra(env, addr, d1[i], ra); - cpu_stq_data_ra(env, addr + 8, d2[i], ra); - cpu_stq_data_ra(env, addr + 16, d3[i], ra); - cpu_stq_data_ra(env, addr + 24, d4[i], ra); - } - addr += 4 * 8; + set_helper_retaddr(ra); + for (i = 0; i < oprsz; ) { + uint16_t pg = *(uint16_t *)(vg + H1_2(i >> 3)); + do { + if (pg & 1) { + tlb_fn(env, d1, i, addr, mmu_idx, ra); + tlb_fn(env, d2, i, addr + msize, mmu_idx, ra); + tlb_fn(env, d3, i, addr + 2 * msize, mmu_idx, ra); + } + i += esize, pg >>= esize; + addr += 3 * msize; + } while (i & 15); } + set_helper_retaddr(0); } +static void sve_st4_r(CPUARMState *env, void *vg, target_ulong addr, + uint32_t desc, const uintptr_t ra, + const int esize, const int msize, + sve_st1_tlb_fn *tlb_fn) +{ + const int mmu_idx = cpu_mmu_index(env, false); + intptr_t i, oprsz = simd_oprsz(desc); + unsigned rd = simd_data(desc); + void *d1 = &env->vfp.zregs[rd]; + void *d2 = &env->vfp.zregs[(rd + 1) & 31]; + void *d3 = &env->vfp.zregs[(rd + 2) & 31]; + void *d4 = &env->vfp.zregs[(rd + 3) & 31]; + + set_helper_retaddr(ra); + for (i = 0; i < oprsz; ) { + uint16_t pg = *(uint16_t *)(vg + H1_2(i >> 3)); + do { + if (pg & 1) { + tlb_fn(env, d1, i, addr, mmu_idx, ra); + tlb_fn(env, d2, i, addr + msize, mmu_idx, ra); + tlb_fn(env, d3, i, addr + 2 * msize, mmu_idx, ra); + tlb_fn(env, d4, i, addr + 3 * msize, mmu_idx, ra); + } + i += esize, pg >>= esize; + addr += 4 * msize; + } while (i & 15); + } + set_helper_retaddr(0); +} + +#define DO_STN_1(N, NAME, ESIZE) \ +void __attribute__((flatten)) HELPER(sve_st##N##NAME##_r) \ + (CPUARMState *env, void *vg, target_ulong addr, uint32_t desc) \ +{ \ + sve_st##N##_r(env, vg, addr, desc, GETPC(), ESIZE, 1, \ + sve_st1##NAME##_tlb); \ +} + +#define DO_STN_2(N, NAME, ESIZE, MSIZE) \ +void __attribute__((flatten)) HELPER(sve_st##N##NAME##_r) \ + (CPUARMState *env, void *vg, target_ulong addr, uint32_t desc) \ +{ \ + sve_st##N##_r(env, vg, addr, desc, GETPC(), ESIZE, MSIZE, \ + arm_cpu_data_is_big_endian(env) \ + ? sve_st1##NAME##_be_tlb : sve_st1##NAME##_le_tlb); \ +} + +DO_STN_1(1, bb, 1) +DO_STN_1(1, bh, 2) +DO_STN_1(1, bs, 4) +DO_STN_1(1, bd, 8) +DO_STN_1(2, bb, 1) +DO_STN_1(3, bb, 1) +DO_STN_1(4, bb, 1) + +DO_STN_2(1, hh, 2, 2) +DO_STN_2(1, hs, 4, 2) +DO_STN_2(1, hd, 8, 2) +DO_STN_2(2, hh, 2, 2) +DO_STN_2(3, hh, 2, 2) +DO_STN_2(4, hh, 2, 2) + +DO_STN_2(1, ss, 4, 4) +DO_STN_2(1, sd, 8, 4) +DO_STN_2(2, ss, 4, 4) +DO_STN_2(3, ss, 4, 4) +DO_STN_2(4, ss, 4, 4) + +DO_STN_2(1, dd, 8, 8) +DO_STN_2(2, dd, 8, 8) +DO_STN_2(3, dd, 8, 8) +DO_STN_2(4, dd, 8, 8) + +#undef DO_STN_1 +#undef DO_STN_2 + /* Loads with a vector index. */ #define DO_LD1_ZPZ_S(NAME, TYPEI, TYPEM, FN) \