From patchwork Tue Mar 19 17:21:21 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Richard Henderson X-Patchwork-Id: 160609 Delivered-To: patch@linaro.org Received: by 2002:a02:5cc1:0:0:0:0:0 with SMTP id w62csp4147052jad; Tue, 19 Mar 2019 10:35:12 -0700 (PDT) X-Google-Smtp-Source: APXvYqyeGR5DzIqazE5rGcvpYUyIFx02H0keSqFNU4d34ZJeaVct09WQVqcLLQwNexjYHiD+S4Cr X-Received: by 2002:a1c:c6ce:: with SMTP id w197mr3818396wmf.95.1553016912184; Tue, 19 Mar 2019 10:35:12 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1553016912; cv=none; d=google.com; s=arc-20160816; b=DGweV5n36mTVeYFqp1MjqjlEpWDwDnhPx/8aR+lWo8T/p/8stjfrAqJhbvyfLFe/ju G6329LD11qWEQTlYzuD3kfBlp9Z/nhe2f9mFWdjhlg1sufFH+WTi5eTQj7FEqXggxBI0 mwf+fdkWTO7xyN2iJ4AduQvvpxmBdxbLxuHQ+FhW9jTfTSKHeUdOorXTJYclCVyopLIi /y9O52M3p/YPwprnVDzypTF211Fm17OTy5tn9b9CEu7NTewdzn/NRghRbU9sfPRlFPc/ jjHrn1NZ7COKB3VhGooD+xusq09aMWwwjykzvvy/LmXaHgmF/pzogVfTyKwUkjPMyBfX QjmA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:cc:list-subscribe:list-help:list-post:list-archive :list-unsubscribe:list-id:precedence:subject:references:in-reply-to :message-id:date:to:from:dkim-signature; bh=fjX1Uwow3OOAygBbdL2Oek0cPoeEy3JZN38iqGTZCrk=; b=meJI5WYrFKMmXgKXIZT83VZjMsYveF4VOCrn/ev62gdfXbdps9PRpa9Pv4kUgML5dE xfwI5INSUw4VdlqlEqHlVwmWVn2V/eZ/E3Dvfxr6RfjKVSZO1ahxF4n9Vvv7/RXQudZg TRpSj/0P75hNDNWBuQY2C68ac/YzWK6ncEBGZcVSG18JOuZ3gcTaE1YkvX8hSFR6EWyu eZPTPpRniEUOJXM+aUpDVofAPO/QuRU9rujeEtSDuW/ujxDME/Z9UBI0IQIbOeGxhdKq vnzclpxEL45fmNoApexMquaEzrKqKkl+q5SPyQkU0HxKLhhuTVbCamhpv6nobbBPkIAM zcFQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=fail header.i=@linaro.org header.s=google header.b=YMvBkpZ9; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom="qemu-devel-bounces+patch=linaro.org@nongnu.org"; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=linaro.org Return-Path: Received: from lists.gnu.org (lists.gnu.org. [209.51.188.17]) by mx.google.com with ESMTPS id j9si1991165wmh.164.2019.03.19.10.35.11 for (version=TLS1 cipher=AES128-SHA bits=128/128); Tue, 19 Mar 2019 10:35:12 -0700 (PDT) Received-SPF: pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 209.51.188.17 as permitted sender) client-ip=209.51.188.17; Authentication-Results: mx.google.com; dkim=fail header.i=@linaro.org header.s=google header.b=YMvBkpZ9; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom="qemu-devel-bounces+patch=linaro.org@nongnu.org"; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=linaro.org Received: from localhost ([127.0.0.1]:60830 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1h6Idr-0005HF-1S for patch@linaro.org; Tue, 19 Mar 2019 13:35:11 -0400 Received: from eggs.gnu.org ([209.51.188.92]:49261) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1h6IR2-0004oC-9u for qemu-devel@nongnu.org; Tue, 19 Mar 2019 13:22:01 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1h6IQv-0004eq-0d for qemu-devel@nongnu.org; Tue, 19 Mar 2019 13:21:56 -0400 Received: from mail-pf1-x442.google.com ([2607:f8b0:4864:20::442]:39310) by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16) (Exim 4.71) (envelope-from ) id 1h6IQu-0004df-79 for qemu-devel@nongnu.org; Tue, 19 Mar 2019 13:21:48 -0400 Received: by mail-pf1-x442.google.com with SMTP id i17so4400014pfo.6 for ; Tue, 19 Mar 2019 10:21:48 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=from:to:cc:subject:date:message-id:in-reply-to:references; bh=fjX1Uwow3OOAygBbdL2Oek0cPoeEy3JZN38iqGTZCrk=; b=YMvBkpZ9mOOyMhgyWaCvsFVEg3nPA5ttiS3dePa2Gtmrn6Tju3Y5KkxF5liHn09Dp/ kkIXHWb/TJTRe+DT1LXdWp+V/2AGak3FsZhqxDpYonF+4FHwFygUgmMhJwIqbHfmMe7k FIA2Hi5H2Nk4rFeIIQc+5rQUZ7y0xjMO00e26HCB7oNwkaCbU0Y7r6zkqL/qahyr8ZA2 2Z+KyBqm/66Ue6mmT21m7VbJkZ9Ty4v8jF+UHdh3+EUFVuMsnLaIPBZBgCeWQc23MqGX 4gOtjj5z/3nkP0+bw0qvNpVrWRPQy31vH1H+BRqvfSYPAkLM1CmdJs5GmzQzs9lMRKF7 0E1w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references; bh=fjX1Uwow3OOAygBbdL2Oek0cPoeEy3JZN38iqGTZCrk=; b=lv6A+QCGFW5AqygHFc+KyiQhV5T4f58CXwWMLJn+wTAQbRdROjkqnUCmdun85rOvK7 J3+U4FaOWmUoRIUq9nzHmlAmIxIZuh+97DO1GG0uDb3jptHvi0QJmqk8UO+/GT1/DQfb pmbja/nllWmPmcVqtE0rfI1ifQC0sRBOD4HRloW+vbb8CargvWBRoNaHx/qVlPOz2je4 RIdvm9EGYxt2fZmACWmBUYk043tPV+leE1qUgqSY/rux+NcjDX59h/cGtmwFr2Vgix9k Pq6qrDm5cQqN00Ug1OzH0aLgOPyNmItJwXryjTD9npJALbaniStcViMeDFugZ9JJi4zT PdYA== X-Gm-Message-State: APjAAAVjM0R9JvdNNGMu8X2vBKP70ThyCbkFEQEl+REwKXwiVP7nHOly lEhYbdj2E1U5ycAIwyNcXK2D47nLBA8= X-Received: by 2002:a62:3996:: with SMTP id u22mr20084259pfj.28.1553016106328; Tue, 19 Mar 2019 10:21:46 -0700 (PDT) Received: from cloudburst.twiddle.net (97-113-188-82.tukw.qwest.net. [97.113.188.82]) by smtp.gmail.com with ESMTPSA id w68sm5616666pfb.176.2019.03.19.10.21.44 (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Tue, 19 Mar 2019 10:21:45 -0700 (PDT) From: Richard Henderson To: qemu-devel@nongnu.org Date: Tue, 19 Mar 2019 10:21:21 -0700 Message-Id: <20190319172126.7502-13-richard.henderson@linaro.org> X-Mailer: git-send-email 2.17.2 In-Reply-To: <20190319172126.7502-1-richard.henderson@linaro.org> References: <20190319172126.7502-1-richard.henderson@linaro.org> X-detected-operating-system: by eggs.gnu.org: Genre and OS details not recognized. X-Received-From: 2607:f8b0:4864:20::442 Subject: [Qemu-devel] [PATCH for-4.1 v3 12/17] tcg/ppc: Initial backend support for Altivec X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: mark.cave-ayland@ilande.co.uk, david@gibson.dropbear.id.au Errors-To: qemu-devel-bounces+patch=linaro.org@nongnu.org Sender: "Qemu-devel" There are a few missing operations yet, like expansion of multiply and shifts. But this has move, load, store, and basic arithmetic. Signed-off-by: Richard Henderson --- tcg/ppc/tcg-target.h | 33 +- tcg/ppc/tcg-target.opc.h | 3 + tcg/ppc/tcg-target.inc.c | 721 +++++++++++++++++++++++++++++++++++---- 3 files changed, 685 insertions(+), 72 deletions(-) create mode 100644 tcg/ppc/tcg-target.opc.h -- 2.17.2 diff --git a/tcg/ppc/tcg-target.h b/tcg/ppc/tcg-target.h index 52c1bb04b1..3f669de7a7 100644 --- a/tcg/ppc/tcg-target.h +++ b/tcg/ppc/tcg-target.h @@ -31,7 +31,7 @@ # define TCG_TARGET_REG_BITS 32 #endif -#define TCG_TARGET_NB_REGS 32 +#define TCG_TARGET_NB_REGS 64 #define TCG_TARGET_INSN_UNIT_SIZE 4 #define TCG_TARGET_TLB_DISPLACEMENT_BITS 16 @@ -45,10 +45,20 @@ typedef enum { TCG_REG_R24, TCG_REG_R25, TCG_REG_R26, TCG_REG_R27, TCG_REG_R28, TCG_REG_R29, TCG_REG_R30, TCG_REG_R31, + TCG_REG_V0, TCG_REG_V1, TCG_REG_V2, TCG_REG_V3, + TCG_REG_V4, TCG_REG_V5, TCG_REG_V6, TCG_REG_V7, + TCG_REG_V8, TCG_REG_V9, TCG_REG_V10, TCG_REG_V11, + TCG_REG_V12, TCG_REG_V13, TCG_REG_V14, TCG_REG_V15, + TCG_REG_V16, TCG_REG_V17, TCG_REG_V18, TCG_REG_V19, + TCG_REG_V20, TCG_REG_V21, TCG_REG_V22, TCG_REG_V23, + TCG_REG_V24, TCG_REG_V25, TCG_REG_V26, TCG_REG_V27, + TCG_REG_V28, TCG_REG_V29, TCG_REG_V30, TCG_REG_V31, + TCG_REG_CALL_STACK = TCG_REG_R1, TCG_AREG0 = TCG_REG_R27 } TCGReg; +extern bool have_isa_altivec; extern bool have_isa_2_06; extern bool have_isa_3_00; @@ -124,6 +134,27 @@ extern bool have_isa_3_00; #define TCG_TARGET_HAS_mulsh_i64 1 #endif +/* + * While technically Altivec could support V64, it has no 64-bit store + * instruction, which makes the generate code quite large. + */ +#define TCG_TARGET_HAS_v64 0 +#define TCG_TARGET_HAS_v128 have_isa_altivec +#define TCG_TARGET_HAS_v256 0 + +#define TCG_TARGET_HAS_andc_vec 1 +#define TCG_TARGET_HAS_orc_vec 0 +#define TCG_TARGET_HAS_not_vec 1 +#define TCG_TARGET_HAS_neg_vec 0 +#define TCG_TARGET_HAS_shi_vec 0 +#define TCG_TARGET_HAS_shs_vec 0 +#define TCG_TARGET_HAS_shv_vec 0 +#define TCG_TARGET_HAS_cmp_vec 1 +#define TCG_TARGET_HAS_mul_vec 0 +#define TCG_TARGET_HAS_sat_vec 1 +#define TCG_TARGET_HAS_minmax_vec 1 +#define TCG_TARGET_HAS_dupm_vec 1 + void flush_icache_range(uintptr_t start, uintptr_t stop); void tb_target_set_jmp_target(uintptr_t, uintptr_t, uintptr_t); diff --git a/tcg/ppc/tcg-target.opc.h b/tcg/ppc/tcg-target.opc.h new file mode 100644 index 0000000000..4816a6c3d4 --- /dev/null +++ b/tcg/ppc/tcg-target.opc.h @@ -0,0 +1,3 @@ +/* Target-specific opcodes for host vector expansion. These will be + emitted by tcg_expand_vec_op. For those familiar with GCC internals, + consider these to be UNSPEC with names. */ diff --git a/tcg/ppc/tcg-target.inc.c b/tcg/ppc/tcg-target.inc.c index ec8e336be8..70a64dd214 100644 --- a/tcg/ppc/tcg-target.inc.c +++ b/tcg/ppc/tcg-target.inc.c @@ -42,6 +42,9 @@ # define TCG_REG_TMP1 TCG_REG_R12 #endif +#define TCG_VEC_TMP1 TCG_REG_V0 +#define TCG_VEC_TMP2 TCG_REG_V1 + #define TCG_REG_TB TCG_REG_R31 #define USE_REG_TB (TCG_TARGET_REG_BITS == 64) @@ -61,6 +64,7 @@ static tcg_insn_unit *tb_ret_addr; +bool have_isa_altivec; bool have_isa_2_06; bool have_isa_3_00; @@ -72,39 +76,15 @@ bool have_isa_3_00; #endif #ifdef CONFIG_DEBUG_TCG -static const char * const tcg_target_reg_names[TCG_TARGET_NB_REGS] = { - "r0", - "r1", - "r2", - "r3", - "r4", - "r5", - "r6", - "r7", - "r8", - "r9", - "r10", - "r11", - "r12", - "r13", - "r14", - "r15", - "r16", - "r17", - "r18", - "r19", - "r20", - "r21", - "r22", - "r23", - "r24", - "r25", - "r26", - "r27", - "r28", - "r29", - "r30", - "r31" +static const char tcg_target_reg_names[TCG_TARGET_NB_REGS][4] = { + "r0", "r1", "r2", "r3", "r4", "r5", "r6", "r7", + "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15", + "r16", "r17", "r18", "r19", "r20", "r21", "r22", "r23", + "r24", "r25", "r26", "r27", "r28", "r29", "r30", "r31", + "v0", "v1", "v2", "v3", "v4", "v5", "v6", "v7", + "v8", "v9", "v10", "v11", "v12", "v13", "v14", "v15", + "v16", "v17", "v18", "v19", "v20", "v21", "v22", "v23", + "v24", "v25", "v26", "v27", "v28", "v29", "v30", "v31", }; #endif @@ -139,6 +119,26 @@ static const int tcg_target_reg_alloc_order[] = { TCG_REG_R5, TCG_REG_R4, TCG_REG_R3, + + /* V0 and V1 reserved as temporaries; V20 - V31 are call-saved */ + TCG_REG_V2, /* call clobbered, vectors */ + TCG_REG_V3, + TCG_REG_V4, + TCG_REG_V5, + TCG_REG_V6, + TCG_REG_V7, + TCG_REG_V8, + TCG_REG_V9, + TCG_REG_V10, + TCG_REG_V11, + TCG_REG_V12, + TCG_REG_V13, + TCG_REG_V14, + TCG_REG_V15, + TCG_REG_V16, + TCG_REG_V17, + TCG_REG_V18, + TCG_REG_V19, }; static const int tcg_target_call_iarg_regs[] = { @@ -233,6 +233,10 @@ static const char *target_parse_constraint(TCGArgConstraint *ct, ct->ct |= TCG_CT_REG; ct->u.regs = 0xffffffff; break; + case 'v': + ct->ct |= TCG_CT_REG; + ct->u.regs = 0xffffffff00000000ull; + break; case 'L': /* qemu_ld constraint */ ct->ct |= TCG_CT_REG; ct->u.regs = 0xffffffff; @@ -320,6 +324,7 @@ static int tcg_target_const_match(tcg_target_long val, TCGType type, #define XO31(opc) (OPCD(31)|((opc)<<1)) #define XO58(opc) (OPCD(58)|(opc)) #define XO62(opc) (OPCD(62)|(opc)) +#define VX4(opc) (OPCD(4)|(opc)) #define B OPCD( 18) #define BC OPCD( 16) @@ -461,6 +466,72 @@ static int tcg_target_const_match(tcg_target_long val, TCGType type, #define NOP ORI /* ori 0,0,0 */ +#define LVX XO31(103) +#define LVEBX XO31(7) +#define LVEHX XO31(39) +#define LVEWX XO31(71) + +#define STVX XO31(231) +#define STVEWX XO31(199) + +#define VADDSBS VX4(768) +#define VADDUBS VX4(512) +#define VADDUBM VX4(0) +#define VADDSHS VX4(832) +#define VADDUHS VX4(576) +#define VADDUHM VX4(64) +#define VADDSWS VX4(896) +#define VADDUWS VX4(640) +#define VADDUWM VX4(128) + +#define VSUBSBS VX4(1792) +#define VSUBUBS VX4(1536) +#define VSUBUBM VX4(1024) +#define VSUBSHS VX4(1856) +#define VSUBUHS VX4(1600) +#define VSUBUHM VX4(1088) +#define VSUBSWS VX4(1920) +#define VSUBUWS VX4(1664) +#define VSUBUWM VX4(1152) + +#define VMAXSB VX4(258) +#define VMAXSH VX4(322) +#define VMAXSW VX4(386) +#define VMAXUB VX4(2) +#define VMAXUH VX4(66) +#define VMAXUW VX4(130) +#define VMINSB VX4(770) +#define VMINSH VX4(834) +#define VMINSW VX4(898) +#define VMINUB VX4(514) +#define VMINUH VX4(578) +#define VMINUW VX4(642) + +#define VCMPEQUB VX4(6) +#define VCMPEQUH VX4(70) +#define VCMPEQUW VX4(134) +#define VCMPGTSB VX4(774) +#define VCMPGTSH VX4(838) +#define VCMPGTSW VX4(902) +#define VCMPGTUB VX4(518) +#define VCMPGTUH VX4(582) +#define VCMPGTUW VX4(646) + +#define VAND VX4(1028) +#define VANDC VX4(1092) +#define VNOR VX4(1284) +#define VOR VX4(1156) +#define VXOR VX4(1220) + +#define VSPLTB VX4(524) +#define VSPLTH VX4(588) +#define VSPLTW VX4(652) +#define VSPLTISB VX4(780) +#define VSPLTISH VX4(844) +#define VSPLTISW VX4(908) + +#define VSLDOI VX4(44) + #define RT(r) ((r)<<21) #define RS(r) ((r)<<21) #define RA(r) ((r)<<16) @@ -473,6 +544,11 @@ static int tcg_target_const_match(tcg_target_long val, TCGType type, #define MB64(b) ((b)<<5) #define FXM(b) (1 << (19 - (b))) +#define VRT(r) (((r) & 31) << 21) +#define VRA(r) (((r) & 31) << 16) +#define VRB(r) (((r) & 31) << 11) +#define VRC(r) (((r) & 31) << 6) + #define LK 1 #define TAB(t, a, b) (RT(t) | RA(a) | RB(b)) @@ -530,6 +606,8 @@ static bool patch_reloc(tcg_insn_unit *code_ptr, int type, { tcg_insn_unit *target; tcg_insn_unit old; + int16_t lo; + int32_t hi; value += addend; target = (tcg_insn_unit *)value; @@ -540,22 +618,31 @@ static bool patch_reloc(tcg_insn_unit *code_ptr, int type, case R_PPC_REL24: return reloc_pc24(code_ptr, target); case R_PPC_ADDR16: - /* We are abusing this relocation type. This points to a pair - of insns, addis + load. If the displacement is small, we - can nop out the addis. */ + /* + * We are abusing this relocation type. This points to a pair + * of insns, addis + load. If the displacement is small, we + * can nop out the addis. + */ if (value == (int16_t)value) { code_ptr[0] = NOP; old = deposit32(code_ptr[1], 0, 16, value); code_ptr[1] = deposit32(old, 16, 5, TCG_REG_TB); - } else { - int16_t lo = value; - int hi = value - lo; - if (hi + lo != value) { - return false; - } - code_ptr[0] = deposit32(code_ptr[0], 0, 16, hi >> 16); - code_ptr[1] = deposit32(code_ptr[1], 0, 16, lo); + break; } + /* fall through */ + case R_PPC_ADDR32: + /* + * We are abusing this relocation type. Again, this points to + * a pair of insns, lis + load. This is an absolute address + * relocation for PPC32 so the lis cannot be removed. + */ + lo = value; + hi = value - lo; + if (hi + lo != value) { + return false; + } + code_ptr[0] = deposit32(code_ptr[0], 0, 16, hi >> 16); + code_ptr[1] = deposit32(code_ptr[1], 0, 16, lo); break; default: g_assert_not_reached(); @@ -568,9 +655,29 @@ static void tcg_out_mem_long(TCGContext *s, int opi, int opx, TCGReg rt, static bool tcg_out_mov(TCGContext *s, TCGType type, TCGReg ret, TCGReg arg) { - tcg_debug_assert(TCG_TARGET_REG_BITS == 64 || type == TCG_TYPE_I32); - if (ret != arg) { - tcg_out32(s, OR | SAB(arg, ret, arg)); + if (ret == arg) { + return true; + } + switch (type) { + case TCG_TYPE_I64: + tcg_debug_assert(TCG_TARGET_REG_BITS == 64); + /* fallthru */ + case TCG_TYPE_I32: + if (ret < 32 && arg < 32) { + tcg_out32(s, OR | SAB(arg, ret, arg)); + break; + } else if (ret < 32 || arg < 32) { + /* Altivec does not support vector/integer moves. */ + return false; + } + /* fallthru */ + case TCG_TYPE_V64: + case TCG_TYPE_V128: + tcg_debug_assert(ret >= 32 && arg >= 32); + tcg_out32(s, VOR | VRT(ret) | VRA(arg) | VRB(arg)); + break; + default: + g_assert_not_reached(); } return true; } @@ -720,10 +827,73 @@ static void tcg_out_movi_int(TCGContext *s, TCGType type, TCGReg ret, } } -static inline void tcg_out_movi(TCGContext *s, TCGType type, TCGReg ret, - tcg_target_long arg) +static void tcg_out_dupi_vec(TCGContext *s, TCGType type, TCGReg ret, + tcg_target_long val) { - tcg_out_movi_int(s, type, ret, arg, false); + uint32_t load_insn; + int rel, base, low; + intptr_t add; + + low = (int8_t)val; + if (low >= -16 && low < 16) { + if (val == (tcg_target_long)dup_const(MO_8, low)) { + tcg_out32(s, VSPLTISB | VRT(ret) | ((val & 31) << 16)); + return; + } + if (val == (tcg_target_long)dup_const(MO_16, low)) { + tcg_out32(s, VSPLTISH | VRT(ret) | ((val & 31) << 16)); + return; + } + if (val == (tcg_target_long)dup_const(MO_32, low)) { + tcg_out32(s, VSPLTISW | VRT(ret) | ((val & 31) << 16)); + return; + } + } + + /* + * Otherwise we must load the value from the constant pool. + */ + if (USE_REG_TB) { + base = TCG_REG_TB; + rel = R_PPC_ADDR16; + add = -(intptr_t)s->code_gen_ptr; + } else { + base = 0; + rel = R_PPC_ADDR32; + add = 0; + } + + load_insn = LVX | VRT(ret) | RB(TCG_REG_TMP1); + if (TCG_TARGET_REG_BITS == 64) { + new_pool_l2(s, rel, s->code_ptr, add, val, val); + } else { + new_pool_l4(s, rel, s->code_ptr, add, val, val, val, val); + } + + tcg_out32(s, ADDIS | TAI(TCG_REG_TMP1, base, 0)); + tcg_out32(s, ADDI | TAI(TCG_REG_TMP1, TCG_REG_TMP1, 0)); + tcg_out32(s, load_insn); +} + +static void tcg_out_movi(TCGContext *s, TCGType type, TCGReg ret, + tcg_target_long arg) +{ + switch (type) { + case TCG_TYPE_I32: + case TCG_TYPE_I64: + tcg_debug_assert(ret < 32); + tcg_out_movi_int(s, type, ret, arg, false); + break; + + case TCG_TYPE_V64: + case TCG_TYPE_V128: + tcg_debug_assert(ret >= 32); + tcg_out_dupi_vec(s, type, ret, arg); + break; + + default: + g_assert_not_reached(); + } } static bool mask_operand(uint32_t c, int *mb, int *me) @@ -876,7 +1046,7 @@ static void tcg_out_mem_long(TCGContext *s, int opi, int opx, TCGReg rt, } /* For unaligned, or very large offsets, use the indexed form. */ - if (offset & align || offset != (int32_t)offset) { + if (offset & align || offset != (int32_t)offset || opi == 0) { if (rs == base) { rs = TCG_REG_R0; } @@ -907,32 +1077,96 @@ static void tcg_out_mem_long(TCGContext *s, int opi, int opx, TCGReg rt, } } -static inline void tcg_out_ld(TCGContext *s, TCGType type, TCGReg ret, - TCGReg arg1, intptr_t arg2) +static void tcg_out_vsldoi(TCGContext *s, TCGReg ret, + TCGReg va, TCGReg vb, int shb) { - int opi, opx; - - tcg_debug_assert(TCG_TARGET_REG_BITS == 64 || type == TCG_TYPE_I32); - if (type == TCG_TYPE_I32) { - opi = LWZ, opx = LWZX; - } else { - opi = LD, opx = LDX; - } - tcg_out_mem_long(s, opi, opx, ret, arg1, arg2); + tcg_out32(s, VSLDOI | VRT(ret) | VRA(va) | VRB(vb) | (shb << 6)); } -static inline void tcg_out_st(TCGContext *s, TCGType type, TCGReg arg, - TCGReg arg1, intptr_t arg2) +static void tcg_out_ld(TCGContext *s, TCGType type, TCGReg ret, + TCGReg base, intptr_t offset) { - int opi, opx; + int shift; - tcg_debug_assert(TCG_TARGET_REG_BITS == 64 || type == TCG_TYPE_I32); - if (type == TCG_TYPE_I32) { - opi = STW, opx = STWX; - } else { - opi = STD, opx = STDX; + switch (type) { + case TCG_TYPE_I32: + if (ret < 32) { + tcg_out_mem_long(s, LWZ, LWZX, ret, base, offset); + break; + } + assert((offset & 3) == 0); + tcg_out_mem_long(s, 0, LVEWX, ret & 31, base, offset); + shift = (offset - 4) & 0xc; + if (shift) { + tcg_out_vsldoi(s, ret, ret, ret, shift); + } + break; + case TCG_TYPE_I64: + if (ret < 32) { + tcg_out_mem_long(s, LD, LDX, ret, base, offset); + break; + } + /* fallthru */ + case TCG_TYPE_V64: + tcg_debug_assert(ret >= 32); + assert((offset & 7) == 0); + tcg_out_mem_long(s, 0, LVX, ret & 31, base, offset & -16); + if (offset & 8) { + tcg_out_vsldoi(s, ret, ret, ret, 8); + } + break; + case TCG_TYPE_V128: + tcg_debug_assert(ret >= 32); + assert((offset & 15) == 0); + tcg_out_mem_long(s, 0, LVX, ret & 31, base, offset); + break; + default: + g_assert_not_reached(); + } +} + +static void tcg_out_st(TCGContext *s, TCGType type, TCGReg arg, + TCGReg base, intptr_t offset) +{ + int shift; + + switch (type) { + case TCG_TYPE_I32: + if (arg < 32) { + tcg_out_mem_long(s, STW, STWX, arg, base, offset); + break; + } + assert((offset & 3) == 0); + shift = (offset - 4) & 0xc; + if (shift) { + tcg_out_vsldoi(s, TCG_VEC_TMP1, arg, arg, shift); + arg = TCG_VEC_TMP1; + } + tcg_out_mem_long(s, 0, STVEWX, arg & 31, base, offset); + break; + case TCG_TYPE_I64: + if (arg < 32) { + tcg_out_mem_long(s, STD, STDX, arg, base, offset); + break; + } + /* fallthru */ + case TCG_TYPE_V64: + tcg_debug_assert(arg >= 32); + assert((offset & 7) == 0); + if (offset & 8) { + tcg_out_vsldoi(s, TCG_VEC_TMP1, arg, arg, 8); + arg = TCG_VEC_TMP1; + } + tcg_out_mem_long(s, 0, STVEWX, arg & 31, base, offset); + tcg_out_mem_long(s, 0, STVEWX, arg & 31, base, offset + 4); + break; + case TCG_TYPE_V128: + tcg_debug_assert(arg >= 32); + tcg_out_mem_long(s, 0, STVX, arg & 31, base, offset); + break; + default: + g_assert_not_reached(); } - tcg_out_mem_long(s, opi, opx, arg, arg1, arg2); } static inline bool tcg_out_sti(TCGContext *s, TCGType type, TCGArg val, @@ -2618,6 +2852,292 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args, } } +int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece) +{ + switch (opc) { + case INDEX_op_and_vec: + case INDEX_op_or_vec: + case INDEX_op_xor_vec: + case INDEX_op_andc_vec: + case INDEX_op_not_vec: + return 1; + case INDEX_op_add_vec: + case INDEX_op_sub_vec: + case INDEX_op_smax_vec: + case INDEX_op_smin_vec: + case INDEX_op_umax_vec: + case INDEX_op_umin_vec: + case INDEX_op_ssadd_vec: + case INDEX_op_sssub_vec: + case INDEX_op_usadd_vec: + case INDEX_op_ussub_vec: + return vece <= MO_32; + case INDEX_op_cmp_vec: + return vece <= MO_32 ? -1 : 0; + default: + return 0; + } +} + +static bool tcg_out_dup_vec(TCGContext *s, TCGType type, unsigned vece, + TCGReg dst, TCGReg src) +{ + tcg_debug_assert(dst >= 32); + tcg_debug_assert(src >= 32); + + /* + * Recall we use (or emulate) VSX integer loads, so the integer is + * right justified within the left (zero-index) double-word. + */ + switch (vece) { + case MO_8: + tcg_out32(s, VSPLTB | VRT(dst) | VRB(src) | (7 << 16)); + break; + case MO_16: + tcg_out32(s, VSPLTH | VRT(dst) | VRB(src) | (3 << 16)); + break; + case MO_32: + tcg_out32(s, VSPLTW | VRT(dst) | VRB(src) | (1 << 16)); + break; + case MO_64: + tcg_out_vsldoi(s, TCG_VEC_TMP1, src, src, 8); + tcg_out_vsldoi(s, dst, TCG_VEC_TMP1, src, 8); + break; + default: + g_assert_not_reached(); + } + return true; +} + +static bool tcg_out_dupm_vec(TCGContext *s, TCGType type, unsigned vece, + TCGReg out, TCGReg base, intptr_t offset) +{ + int elt; + + tcg_debug_assert(out >= 32); + out &= 31; + switch (vece) { + case MO_8: + tcg_out_mem_long(s, 0, LVEBX, out, base, offset); + elt = extract32(offset, 0, 4); +#ifndef HOST_WORDS_BIGENDIAN + elt ^= 15; +#endif + tcg_out32(s, VSPLTB | VRT(out) | VRB(out) | (elt << 16)); + break; + case MO_16: + assert((offset & 1) == 0); + tcg_out_mem_long(s, 0, LVEHX, out, base, offset); + elt = extract32(offset, 1, 3); +#ifndef HOST_WORDS_BIGENDIAN + elt ^= 7; +#endif + tcg_out32(s, VSPLTH | VRT(out) | VRB(out) | (elt << 16)); + break; + case MO_32: + assert((offset & 3) == 0); + tcg_out_mem_long(s, 0, LVEWX, out, base, offset); + elt = extract32(offset, 2, 2); +#ifndef HOST_WORDS_BIGENDIAN + elt ^= 3; +#endif + tcg_out32(s, VSPLTW | VRT(out) | VRB(out) | (elt << 16)); + break; + case MO_64: + assert((offset & 7) == 0); + tcg_out_mem_long(s, 0, LVX, out, base, offset & -16); + tcg_out_vsldoi(s, TCG_VEC_TMP1, out, out, 8); + elt = extract32(offset, 3, 1); +#ifndef HOST_WORDS_BIGENDIAN + elt = !elt; +#endif + if (elt) { + tcg_out_vsldoi(s, out, out, TCG_VEC_TMP1, 8); + } else { + tcg_out_vsldoi(s, out, TCG_VEC_TMP1, out, 8); + } + break; + default: + g_assert_not_reached(); + } + return true; +} + +static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc, + unsigned vecl, unsigned vece, + const TCGArg *args, const int *const_args) +{ + static const uint32_t + add_op[4] = { VADDUBM, VADDUHM, VADDUWM, 0 }, + sub_op[4] = { VSUBUBM, VSUBUHM, VSUBUWM, 0 }, + eq_op[4] = { VCMPEQUB, VCMPEQUH, VCMPEQUW, 0 }, + gts_op[4] = { VCMPGTSB, VCMPGTSH, VCMPGTSW, 0 }, + gtu_op[4] = { VCMPGTUB, VCMPGTUH, VCMPGTUW, 0 }, + ssadd_op[4] = { VADDSBS, VADDSHS, VADDSWS, 0 }, + usadd_op[4] = { VADDUBS, VADDUHS, VADDUWS, 0 }, + sssub_op[4] = { VSUBSBS, VSUBSHS, VSUBSWS, 0 }, + ussub_op[4] = { VSUBUBS, VSUBUHS, VSUBUWS, 0 }, + umin_op[4] = { VMINUB, VMINUH, VMINUW, 0 }, + smin_op[4] = { VMINSB, VMINSH, VMINSW, 0 }, + umax_op[4] = { VMAXUB, VMAXUH, VMAXUW, 0 }, + smax_op[4] = { VMAXSB, VMAXSH, VMAXSW, 0 }; + + TCGType type = vecl + TCG_TYPE_V64; + TCGArg a0 = args[0], a1 = args[1], a2 = args[2]; + uint32_t insn; + + switch (opc) { + case INDEX_op_ld_vec: + tcg_out_ld(s, type, a0, a1, a2); + return; + case INDEX_op_st_vec: + tcg_out_st(s, type, a0, a1, a2); + return; + case INDEX_op_dupm_vec: + tcg_out_dupm_vec(s, type, vece, a0, a1, a2); + return; + + case INDEX_op_add_vec: + insn = add_op[vece]; + break; + case INDEX_op_sub_vec: + insn = sub_op[vece]; + break; + case INDEX_op_ssadd_vec: + insn = ssadd_op[vece]; + break; + case INDEX_op_sssub_vec: + insn = sssub_op[vece]; + break; + case INDEX_op_usadd_vec: + insn = usadd_op[vece]; + break; + case INDEX_op_ussub_vec: + insn = ussub_op[vece]; + break; + case INDEX_op_smin_vec: + insn = smin_op[vece]; + break; + case INDEX_op_umin_vec: + insn = umin_op[vece]; + break; + case INDEX_op_smax_vec: + insn = smax_op[vece]; + break; + case INDEX_op_umax_vec: + insn = umax_op[vece]; + break; + case INDEX_op_and_vec: + insn = VAND; + break; + case INDEX_op_or_vec: + insn = VOR; + break; + case INDEX_op_xor_vec: + insn = VXOR; + break; + case INDEX_op_andc_vec: + insn = VANDC; + break; + case INDEX_op_not_vec: + insn = VNOR; + a2 = a1; + break; + + case INDEX_op_cmp_vec: + switch (args[3]) { + case TCG_COND_EQ: + insn = eq_op[vece]; + break; + case TCG_COND_GT: + insn = gts_op[vece]; + break; + case TCG_COND_GTU: + insn = gtu_op[vece]; + break; + default: + g_assert_not_reached(); + } + break; + + case INDEX_op_mov_vec: /* Always emitted via tcg_out_mov. */ + case INDEX_op_dupi_vec: /* Always emitted via tcg_out_movi. */ + case INDEX_op_dup_vec: /* Always emitted via tcg_out_dup_vec. */ + default: + g_assert_not_reached(); + } + + tcg_debug_assert(insn != 0); + tcg_out32(s, insn | VRT(a0) | VRA(a1) | VRB(a2)); +} + +static void expand_vec_cmp(TCGType type, unsigned vece, TCGv_vec v0, + TCGv_vec v1, TCGv_vec v2, TCGCond cond) +{ + bool need_swap = false, need_inv = false; + + tcg_debug_assert(vece <= MO_32); + + switch (cond) { + case TCG_COND_EQ: + case TCG_COND_GT: + case TCG_COND_GTU: + break; + case TCG_COND_NE: + case TCG_COND_LE: + case TCG_COND_LEU: + need_inv = true; + break; + case TCG_COND_LT: + case TCG_COND_LTU: + need_swap = true; + break; + case TCG_COND_GE: + case TCG_COND_GEU: + need_swap = need_inv = true; + break; + default: + g_assert_not_reached(); + } + + if (need_inv) { + cond = tcg_invert_cond(cond); + } + if (need_swap) { + TCGv_vec t1; + t1 = v1, v1 = v2, v2 = t1; + cond = tcg_swap_cond(cond); + } + + vec_gen_4(INDEX_op_cmp_vec, type, vece, tcgv_vec_arg(v0), + tcgv_vec_arg(v1), tcgv_vec_arg(v2), cond); + + if (need_inv) { + tcg_gen_not_vec(vece, v0, v0); + } +} + +void tcg_expand_vec_op(TCGOpcode opc, TCGType type, unsigned vece, + TCGArg a0, ...) +{ + va_list va; + TCGv_vec v0, v1, v2; + + va_start(va, a0); + v0 = temp_tcgv_vec(arg_temp(a0)); + v1 = temp_tcgv_vec(arg_temp(va_arg(va, TCGArg))); + v2 = temp_tcgv_vec(arg_temp(va_arg(va, TCGArg))); + + switch (opc) { + case INDEX_op_cmp_vec: + expand_vec_cmp(type, vece, v0, v1, v2, va_arg(va, TCGArg)); + break; + default: + g_assert_not_reached(); + } + va_end(va); +} + static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op) { static const TCGTargetOpDef r = { .args_ct_str = { "r" } }; @@ -2655,6 +3175,9 @@ static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op) = { .args_ct_str = { "r", "r", "r", "r", "rI", "rZM" } }; static const TCGTargetOpDef sub2 = { .args_ct_str = { "r", "r", "rI", "rZM", "r", "r" } }; + static const TCGTargetOpDef v_r = { .args_ct_str = { "v", "r" } }; + static const TCGTargetOpDef v_v = { .args_ct_str = { "v", "v" } }; + static const TCGTargetOpDef v_v_v = { .args_ct_str = { "v", "v", "v" } }; switch (op) { case INDEX_op_goto_ptr: @@ -2790,6 +3313,32 @@ static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op) return (TCG_TARGET_REG_BITS == 64 ? &S_S : TARGET_LONG_BITS == 32 ? &S_S_S : &S_S_S_S); + case INDEX_op_add_vec: + case INDEX_op_sub_vec: + case INDEX_op_mul_vec: + case INDEX_op_and_vec: + case INDEX_op_or_vec: + case INDEX_op_xor_vec: + case INDEX_op_andc_vec: + case INDEX_op_orc_vec: + case INDEX_op_cmp_vec: + case INDEX_op_ssadd_vec: + case INDEX_op_sssub_vec: + case INDEX_op_usadd_vec: + case INDEX_op_ussub_vec: + case INDEX_op_smax_vec: + case INDEX_op_smin_vec: + case INDEX_op_umax_vec: + case INDEX_op_umin_vec: + return &v_v_v; + case INDEX_op_not_vec: + case INDEX_op_dup_vec: + return &v_v; + case INDEX_op_ld_vec: + case INDEX_op_st_vec: + case INDEX_op_dupm_vec: + return &v_r; + default: return NULL; } @@ -2800,6 +3349,9 @@ static void tcg_target_init(TCGContext *s) unsigned long hwcap = qemu_getauxval(AT_HWCAP); unsigned long hwcap2 = qemu_getauxval(AT_HWCAP2); + if (hwcap & PPC_FEATURE_HAS_ALTIVEC) { + have_isa_altivec = true; + } if (hwcap & PPC_FEATURE_ARCH_2_06) { have_isa_2_06 = true; } @@ -2811,6 +3363,10 @@ static void tcg_target_init(TCGContext *s) tcg_target_available_regs[TCG_TYPE_I32] = 0xffffffff; tcg_target_available_regs[TCG_TYPE_I64] = 0xffffffff; + if (have_isa_altivec) { + tcg_target_available_regs[TCG_TYPE_V64] = 0xffffffff00000000ull; + tcg_target_available_regs[TCG_TYPE_V128] = 0xffffffff00000000ull; + } tcg_target_call_clobber_regs = 0; tcg_regset_set_reg(tcg_target_call_clobber_regs, TCG_REG_R0); @@ -2826,6 +3382,27 @@ static void tcg_target_init(TCGContext *s) tcg_regset_set_reg(tcg_target_call_clobber_regs, TCG_REG_R11); tcg_regset_set_reg(tcg_target_call_clobber_regs, TCG_REG_R12); + tcg_regset_set_reg(tcg_target_call_clobber_regs, TCG_REG_V0); + tcg_regset_set_reg(tcg_target_call_clobber_regs, TCG_REG_V1); + tcg_regset_set_reg(tcg_target_call_clobber_regs, TCG_REG_V2); + tcg_regset_set_reg(tcg_target_call_clobber_regs, TCG_REG_V3); + tcg_regset_set_reg(tcg_target_call_clobber_regs, TCG_REG_V4); + tcg_regset_set_reg(tcg_target_call_clobber_regs, TCG_REG_V5); + tcg_regset_set_reg(tcg_target_call_clobber_regs, TCG_REG_V6); + tcg_regset_set_reg(tcg_target_call_clobber_regs, TCG_REG_V7); + tcg_regset_set_reg(tcg_target_call_clobber_regs, TCG_REG_V8); + tcg_regset_set_reg(tcg_target_call_clobber_regs, TCG_REG_V9); + tcg_regset_set_reg(tcg_target_call_clobber_regs, TCG_REG_V10); + tcg_regset_set_reg(tcg_target_call_clobber_regs, TCG_REG_V11); + tcg_regset_set_reg(tcg_target_call_clobber_regs, TCG_REG_V12); + tcg_regset_set_reg(tcg_target_call_clobber_regs, TCG_REG_V13); + tcg_regset_set_reg(tcg_target_call_clobber_regs, TCG_REG_V14); + tcg_regset_set_reg(tcg_target_call_clobber_regs, TCG_REG_V15); + tcg_regset_set_reg(tcg_target_call_clobber_regs, TCG_REG_V16); + tcg_regset_set_reg(tcg_target_call_clobber_regs, TCG_REG_V17); + tcg_regset_set_reg(tcg_target_call_clobber_regs, TCG_REG_V18); + tcg_regset_set_reg(tcg_target_call_clobber_regs, TCG_REG_V19); + s->reserved_regs = 0; tcg_regset_set_reg(s->reserved_regs, TCG_REG_R0); /* tcg temp */ tcg_regset_set_reg(s->reserved_regs, TCG_REG_R1); /* stack pointer */ @@ -2836,6 +3413,8 @@ static void tcg_target_init(TCGContext *s) tcg_regset_set_reg(s->reserved_regs, TCG_REG_R13); /* thread pointer */ #endif tcg_regset_set_reg(s->reserved_regs, TCG_REG_TMP1); /* mem temp */ + tcg_regset_set_reg(s->reserved_regs, TCG_VEC_TMP1); + tcg_regset_set_reg(s->reserved_regs, TCG_VEC_TMP2); if (USE_REG_TB) { tcg_regset_set_reg(s->reserved_regs, TCG_REG_TB); /* tb->tc_ptr */ }