From patchwork Fri Nov 11 07:40:41 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Richard Henderson X-Patchwork-Id: 623683 Delivered-To: patch@linaro.org Received: by 2002:a17:522:c983:b0:460:3032:e3c4 with SMTP id kr3csp616427pvb; Thu, 10 Nov 2022 23:44:16 -0800 (PST) X-Google-Smtp-Source: AA0mqf5vSiVHHdxdaaSRA8G1cmEE0a9ujsyhWjLWWgfSch+GN+bxCICl8jl1FMX5PU4LcYPFgANW X-Received: by 2002:a0c:90f1:0:b0:4bb:75b4:3ccf with SMTP id p104-20020a0c90f1000000b004bb75b43ccfmr940716qvp.104.1668152656016; Thu, 10 Nov 2022 23:44:16 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1668152656; cv=none; d=google.com; s=arc-20160816; b=WzrF76AH3In0iYsoVDfS1tZz/mBNJzEvMEoH2lrhbkz6ImrGn8SsSCZ8BNI07n1E0P 670XShW+PRDcxeydi6CYb9BMHdjKObV8CSAS4G0cs6lcoZwGhAv9fzboS35xM+6weCua j91imv0RG8EGKx8uR97pHRM9RwBDFYJ5HIRPvQy7VPpeIYlBf+1PeT211dA/SjfEDrSA l+VFpFdTRVcD2Sf0MMdEpH3gsKsgy8cUpHQTIjMZ3dn+rGaD9C3cszXivK7wfVv0ZHsV OlNIjWnFDlMPm7gRIQ9fViMIQilWKiDwzBPdsizPCq/fPVRVY3ukV5MLI8C3Bv9ZOB3l XovA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:list-subscribe:list-help:list-post:list-archive :list-unsubscribe:list-id:precedence:content-transfer-encoding :mime-version:references:in-reply-to:message-id:date:subject:to:from :dkim-signature; bh=RfMEHKg0KKPxSd2iaR+EBnsQ5GqO2EMMWetPguBEX2g=; b=vKOfkv+1121V3mqBDJBa2gEFH1Zlt8qbnBOMAsjZvY4f2Un/gEbgsNEalIwkMdmHHp Nn8ALl2yc3R9Y3VHRBSg0NP4IWn5gfs6V49Fe9Qhc8tDdJsYlYmVsrOf7wUZUfkKb8cA NGfK/gjXvGK6Rnqs4dzUxW/9OV5gd8ete1E2hDvhiwwawNDeao5ss8t4GkgOL/EcfaJd d3EP/hwmW4i/DY0RbFEHUMvJuvjPZWXDcoOUJeHDJ5cWdSnS87KQ2tAvg+npsg1YzuIo pXFpE2AqkozXF0jJPSMEfnbdq+FK7aL+XvBfz/aPtEmBr/4Zajpp8qTCH8kGKR/f4ocq Zfqw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=ZsyxjRBT; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom="qemu-devel-bounces+patch=linaro.org@nongnu.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Return-Path: Received: from lists.gnu.org (lists.gnu.org. [209.51.188.17]) by mx.google.com with ESMTPS id f29-20020ac86edd000000b003a573b691a4si657651qtv.494.2022.11.10.23.44.15 for (version=TLS1_2 cipher=ECDHE-ECDSA-CHACHA20-POLY1305 bits=256/256); Thu, 10 Nov 2022 23:44:16 -0800 (PST) Received-SPF: pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 209.51.188.17 as permitted sender) client-ip=209.51.188.17; Authentication-Results: mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=ZsyxjRBT; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom="qemu-devel-bounces+patch=linaro.org@nongnu.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1otOg8-0006ng-NS; Fri, 11 Nov 2022 02:42:20 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1otOfz-0006gT-7z for qemu-devel@nongnu.org; Fri, 11 Nov 2022 02:42:14 -0500 Received: from mail-pj1-x1034.google.com ([2607:f8b0:4864:20::1034]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1otOfv-0000Eh-9w for qemu-devel@nongnu.org; Fri, 11 Nov 2022 02:42:10 -0500 Received: by mail-pj1-x1034.google.com with SMTP id l6so3813790pjj.0 for ; Thu, 10 Nov 2022 23:42:06 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:to:from:from:to:cc:subject:date:message-id :reply-to; bh=RfMEHKg0KKPxSd2iaR+EBnsQ5GqO2EMMWetPguBEX2g=; b=ZsyxjRBT8iXrQSXin48aM4HJ5lMgHyqRAdFAdKiRXup0Yw0J67q3d86cjrf/9hellC 0hq7AC1o8/5Mv0jCrk1R7WAVZc7qvV/rNlxU1+vf1LFRkxa+C6plMjTPErxzS8NOYQvh 3xEhPEf7QBQLdAjjWbnBMnNDSVY1sAd0M+SVdG/Iwws6GcRvdqF5X4iIMnQNrd+jyrbH ZqfF/lReuXqrgGmB4FgZIfWqc/W3NnASLLuW4hX76nKHaad3cCBQPrh/b5lMjisREhnt 99YjjormcGeOY2o2/oAYtD5cGtYTbYhjI9Qfg9rh37Z1U3GwHzXEnXX2rZ+FeUEQrJJZ XH2w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=RfMEHKg0KKPxSd2iaR+EBnsQ5GqO2EMMWetPguBEX2g=; b=JqSPo2tlgI7g4Yu/t8VxlFkIvWIzrZ+RQassN3GHgR6i5M3B/MN1G+uBeroNh2ht4Y 31I2KBT0j7udG6fTRdXhirlvn8Quq+V9pt/A266eZvH0DXGbFuOmnugUt5e9n7pOBQNv w0SZ479h3B5ECUD76/gnAY6nXHVvF4KPCwJjV9TLVHKnIR7Z4UNrU6rrg+U9nbi0jHsc UiCXIf33823/sH8DA2GqobPD9EbWt5FoHOLvaHgpTP7x5P0R74kTqlB5MZLW+YAi8uSy WHgin6XL732U1D5kEDYQICNLW5UYcyFa/6RWVHogM6vh6b73Pf3mZTQcX2nirh4h/8jm 1xPg== X-Gm-Message-State: ANoB5pnlMSMdkWALVB719RaEZRS8x72UGe9PGw+5lMHx1WI7arUXxJeL dKL6O5tLpqFTvWWFct7+dE2sHW3mPYxwJbsS X-Received: by 2002:a17:90a:804c:b0:20d:d531:97cc with SMTP id e12-20020a17090a804c00b0020dd53197ccmr596500pjw.164.1668152525746; Thu, 10 Nov 2022 23:42:05 -0800 (PST) Received: from stoup.lan (2001-44b8-2176-c800-0aef-59aa-1faf-7e61.static.ipv6.internode.on.net. [2001:44b8:2176:c800:aef:59aa:1faf:7e61]) by smtp.gmail.com with ESMTPSA id o15-20020a170902d4cf00b001754fa42065sm976069plg.143.2022.11.10.23.42.03 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 10 Nov 2022 23:42:05 -0800 (PST) From: Richard Henderson To: qemu-devel@nongnu.org Subject: [PATCH for-8.0 v3 25/45] tcg: Reorg function calls Date: Fri, 11 Nov 2022 17:40:41 +1000 Message-Id: <20221111074101.2069454-26-richard.henderson@linaro.org> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20221111074101.2069454-1-richard.henderson@linaro.org> References: <20221111074101.2069454-1-richard.henderson@linaro.org> MIME-Version: 1.0 Received-SPF: pass client-ip=2607:f8b0:4864:20::1034; envelope-from=richard.henderson@linaro.org; helo=mail-pj1-x1034.google.com X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+patch=linaro.org@nongnu.org Sender: qemu-devel-bounces+patch=linaro.org@nongnu.org Pre-compute the function call layout for each helper at startup. Drop TCG_CALL_DUMMY_ARG, as we no longer need to leave gaps in the op->args[] array. This allows several places to stop checking for NULL TCGTemp, to which TCG_CALL_DUMMY_ARG mapped. For tcg_gen_callN, loop over the arguments once. Allocate the TCGOp for the call early but delay emitting it, collecting arguments first. This allows the argument processing loop to emit code for extensions and have them sequenced before the call. For tcg_reg_alloc_call, loop over the arguments in reverse order, which allows stack slots to be filled first naturally. Signed-off-by: Richard Henderson --- include/exec/helper-head.h | 2 + include/tcg/tcg.h | 5 +- tcg/tcg-internal.h | 22 +- tcg/optimize.c | 6 +- tcg/tcg.c | 601 ++++++++++++++++++++++--------------- 5 files changed, 387 insertions(+), 249 deletions(-) diff --git a/include/exec/helper-head.h b/include/exec/helper-head.h index 8bdf0f6ea2..bc6698b19f 100644 --- a/include/exec/helper-head.h +++ b/include/exec/helper-head.h @@ -133,4 +133,6 @@ #define DEF_HELPER_7(name, ret, t1, t2, t3, t4, t5, t6, t7) \ DEF_HELPER_FLAGS_7(name, 0, ret, t1, t2, t3, t4, t5, t6, t7) +/* MAX_CALL_IARGS must be set to n if last entry is DEF_HELPER_FLAGS_n. */ + #endif /* EXEC_HELPER_HEAD_H */ diff --git a/include/tcg/tcg.h b/include/tcg/tcg.h index a6310b898f..b949d75fdd 100644 --- a/include/tcg/tcg.h +++ b/include/tcg/tcg.h @@ -38,6 +38,8 @@ /* XXX: make safe guess about sizes */ #define MAX_OP_PER_INSTR 266 +#define MAX_CALL_IARGS 7 + #define CPU_TEMP_BUF_NLONGS 128 #define TCG_STATIC_FRAME_SIZE (CPU_TEMP_BUF_NLONGS * sizeof(long)) @@ -411,9 +413,6 @@ typedef TCGv_ptr TCGv_env; #define TCG_CALL_NO_RWG_SE (TCG_CALL_NO_RWG | TCG_CALL_NO_SE) #define TCG_CALL_NO_WG_SE (TCG_CALL_NO_WG | TCG_CALL_NO_SE) -/* Used to align parameters. See the comment before tcgv_i32_temp. */ -#define TCG_CALL_DUMMY_ARG ((TCGArg)0) - /* * Flags for the bswap opcodes. * If IZ, the input is zero-extended, otherwise unknown. diff --git a/tcg/tcg-internal.h b/tcg/tcg-internal.h index f574743ff8..c7e87e193d 100644 --- a/tcg/tcg-internal.h +++ b/tcg/tcg-internal.h @@ -42,11 +42,29 @@ typedef enum { TCG_CALL_ARG_EXTEND_S, /* ... as a sign-extended i64 */ } TCGCallArgumentKind; +typedef struct TCGCallArgumentLoc { + TCGCallArgumentKind kind : 8; + unsigned arg_slot : 8; + unsigned ref_slot : 8; + unsigned arg_idx : 4; + unsigned tmp_subindex : 2; +} TCGCallArgumentLoc; + +/* Avoid "unsigned < 0 is always false" Werror, when iarg_regs is empty. */ +#define REG_P(L) \ + ((int)(L)->arg_slot < (int)ARRAY_SIZE(tcg_target_call_iarg_regs)) + typedef struct TCGHelperInfo { void *func; const char *name; - unsigned flags; - unsigned typemask; + unsigned typemask : 32; + unsigned flags : 8; + unsigned nr_in : 8; + unsigned nr_out : 8; + TCGCallReturnKind out_kind : 8; + + /* Maximum physical arguments are constrained by TCG_TYPE_I128. */ + TCGCallArgumentLoc in[MAX_CALL_IARGS * (128 / TCG_TARGET_REG_BITS)]; } TCGHelperInfo; extern TCGContext tcg_init_ctx; diff --git a/tcg/optimize.c b/tcg/optimize.c index 1afd50175b..763bca9ea6 100644 --- a/tcg/optimize.c +++ b/tcg/optimize.c @@ -667,9 +667,7 @@ static void init_arguments(OptContext *ctx, TCGOp *op, int nb_args) { for (int i = 0; i < nb_args; i++) { TCGTemp *ts = arg_temp(op->args[i]); - if (ts) { - init_ts_info(ctx, ts); - } + init_ts_info(ctx, ts); } } @@ -680,7 +678,7 @@ static void copy_propagate(OptContext *ctx, TCGOp *op, for (int i = nb_oargs; i < nb_oargs + nb_iargs; i++) { TCGTemp *ts = arg_temp(op->args[i]); - if (ts && ts_is_copy(ts)) { + if (ts_is_copy(ts)) { op->args[i] = temp_arg(find_better_copy(s, ts)); } } diff --git a/tcg/tcg.c b/tcg/tcg.c index 65be472fef..cabc397a38 100644 --- a/tcg/tcg.c +++ b/tcg/tcg.c @@ -546,7 +546,7 @@ void tcg_pool_reset(TCGContext *s) #include "exec/helper-proto.h" -static const TCGHelperInfo all_helpers[] = { +static TCGHelperInfo all_helpers[] = { #include "exec/helper-tcg.h" }; static GHashTable *helper_table; @@ -564,6 +564,140 @@ static ffi_type * const typecode_to_ffi[8] = { }; #endif +typedef struct TCGCumulativeArgs { + int arg_idx; /* tcg_gen_callN args[] */ + int info_in_idx; /* TCGHelperInfo in[] */ + int arg_slot; /* regs+stack slot */ + int ref_slot; /* stack slots for references */ +} TCGCumulativeArgs; + +static void layout_arg_even(TCGCumulativeArgs *cum) +{ + cum->arg_slot += cum->arg_slot & 1; +} + +static void layout_arg_1(TCGCumulativeArgs *cum, TCGHelperInfo *info, + TCGCallArgumentKind kind) +{ + TCGCallArgumentLoc *loc = &info->in[cum->info_in_idx]; + + *loc = (TCGCallArgumentLoc){ + .kind = kind, + .arg_idx = cum->arg_idx, + .arg_slot = cum->arg_slot, + }; + cum->info_in_idx++; + cum->arg_slot++; +} + +static void layout_arg_normal_n(TCGCumulativeArgs *cum, + TCGHelperInfo *info, int n) +{ + TCGCallArgumentLoc *loc = &info->in[cum->info_in_idx]; + + for (int i = 0; i < n; ++i) { + /* Layout all using the same arg_idx, adjusting the subindex. */ + loc[i] = (TCGCallArgumentLoc){ + .kind = TCG_CALL_ARG_NORMAL, + .arg_idx = cum->arg_idx, + .tmp_subindex = i, + .arg_slot = cum->arg_slot + i, + }; + } + cum->info_in_idx += n; + cum->arg_slot += n; +} + +static void init_call_layout(TCGHelperInfo *info) +{ + int max_reg_slots = ARRAY_SIZE(tcg_target_call_iarg_regs); + int max_stk_slots = TCG_STATIC_CALL_ARGS_SIZE / sizeof(tcg_target_long); + unsigned typemask = info->typemask; + unsigned typecode; + TCGCumulativeArgs cum = { }; + + /* + * Parse and place any function return value. + */ + typecode = typemask & 7; + switch (typecode) { + case dh_typecode_void: + info->nr_out = 0; + break; + case dh_typecode_i32: + case dh_typecode_s32: + case dh_typecode_ptr: + info->nr_out = 1; + info->out_kind = TCG_CALL_RET_NORMAL; + break; + case dh_typecode_i64: + case dh_typecode_s64: + info->nr_out = 64 / TCG_TARGET_REG_BITS; + info->out_kind = TCG_CALL_RET_NORMAL; + break; + default: + g_assert_not_reached(); + } + assert(info->nr_out <= ARRAY_SIZE(tcg_target_call_oarg_regs)); + + /* + * Parse and place function arguments. + */ + for (typemask >>= 3; typemask; typemask >>= 3, cum.arg_idx++) { + TCGCallArgumentKind kind; + + typecode = typemask & 7; + switch (typecode) { + case dh_typecode_i32: + case dh_typecode_s32: + switch (TCG_TARGET_CALL_ARG_I32) { + case TCG_CALL_ARG_EVEN: + layout_arg_even(&cum); + /* fall through */ + case TCG_CALL_ARG_NORMAL: + layout_arg_1(&cum, info, TCG_CALL_ARG_NORMAL); + break; + case TCG_CALL_ARG_EXTEND: + kind = TCG_CALL_ARG_EXTEND_U + (typecode & 1); + layout_arg_1(&cum, info, kind); + break; + default: + qemu_build_not_reached(); + } + break; + case dh_typecode_i64: + case dh_typecode_s64: + switch (TCG_TARGET_CALL_ARG_I64) { + case TCG_CALL_ARG_EVEN: + layout_arg_even(&cum); + /* fall through */ + case TCG_CALL_ARG_NORMAL: + if (TCG_TARGET_REG_BITS == 32) { + layout_arg_normal_n(&cum, info, 2); + } else { + layout_arg_1(&cum, info, TCG_CALL_ARG_NORMAL); + } + break; + default: + qemu_build_not_reached(); + } + break; + case dh_typecode_ptr: + layout_arg_1(&cum, info, TCG_CALL_ARG_NORMAL); + break; + default: + g_assert_not_reached(); + } + } + info->nr_in = cum.info_in_idx; + + /* Validate that we didn't overrun the input array. */ + assert(cum.info_in_idx <= ARRAY_SIZE(info->in)); + /* Validate the backend has enough argument space. */ + assert(cum.arg_slot <= max_reg_slots + max_stk_slots); + assert(cum.ref_slot <= max_stk_slots); +} + static int indirect_reg_alloc_order[ARRAY_SIZE(tcg_target_reg_alloc_order)]; static void process_op_defs(TCGContext *s); static TCGTemp *tcg_global_reg_new_internal(TCGContext *s, TCGType type, @@ -603,6 +737,7 @@ static void tcg_context_init(unsigned max_cpus) helper_table = g_hash_table_new(NULL, NULL); for (i = 0; i < ARRAY_SIZE(all_helpers); ++i) { + init_call_layout(&all_helpers[i]); g_hash_table_insert(helper_table, (gpointer)all_helpers[i].func, (gpointer)&all_helpers[i]); } @@ -1473,18 +1608,19 @@ bool tcg_op_supported(TCGOpcode op) } } -/* Note: we convert the 64 bit args to 32 bit and do some alignment - and endian swap. Maybe it would be better to do the alignment - and endian swap in tcg_reg_alloc_call(). */ +static TCGOp *tcg_op_alloc(TCGOpcode opc, unsigned nargs); + void tcg_gen_callN(void *func, TCGTemp *ret, int nargs, TCGTemp **args) { - int i, real_args, nb_rets, pi, max_args; - unsigned typemask; const TCGHelperInfo *info; + TCGv_i64 extend_free[MAX_CALL_IARGS]; + int n_extend = 0; TCGOp *op; + int i, n, pi = 0, total_args; info = g_hash_table_lookup(helper_table, (gpointer)func); - typemask = info->typemask; + total_args = info->nr_out + info->nr_in + 2; + op = tcg_op_alloc(INDEX_op_call, total_args); #ifdef CONFIG_PLUGIN /* detect non-plugin helpers */ @@ -1493,111 +1629,65 @@ void tcg_gen_callN(void *func, TCGTemp *ret, int nargs, TCGTemp **args) } #endif - if (TCG_TARGET_CALL_ARG_I32 == TCG_CALL_ARG_EXTEND) { - for (i = 0; i < nargs; ++i) { - int argtype = extract32(typemask, (i + 1) * 3, 3); - bool is_32bit = (argtype & ~1) == dh_typecode_i32; - bool is_signed = argtype & 1; + TCGOP_CALLO(op) = n = info->nr_out; + switch (n) { + case 0: + tcg_debug_assert(ret == NULL); + break; + case 1: + tcg_debug_assert(ret != NULL); + op->args[pi++] = temp_arg(ret); + break; + case 2: + tcg_debug_assert(ret != NULL); + tcg_debug_assert(ret->base_type == ret->type + 1); + tcg_debug_assert(ret->temp_subindex == 0); + op->args[pi++] = temp_arg(ret); + op->args[pi++] = temp_arg(ret + 1); + break; + default: + g_assert_not_reached(); + } - if (is_32bit) { + TCGOP_CALLI(op) = n = info->nr_in; + for (i = 0; i < n; i++) { + const TCGCallArgumentLoc *loc = &info->in[i]; + TCGTemp *ts = args[loc->arg_idx] + loc->tmp_subindex; + + switch (loc->kind) { + case TCG_CALL_ARG_NORMAL: + op->args[pi++] = temp_arg(ts); + break; + + case TCG_CALL_ARG_EXTEND_U: + case TCG_CALL_ARG_EXTEND_S: + { TCGv_i64 temp = tcg_temp_new_i64(); - TCGv_i32 orig = temp_tcgv_i32(args[i]); - if (is_signed) { + TCGv_i32 orig = temp_tcgv_i32(ts); + + if (loc->kind == TCG_CALL_ARG_EXTEND_S) { tcg_gen_ext_i32_i64(temp, orig); } else { tcg_gen_extu_i32_i64(temp, orig); } - args[i] = tcgv_i64_temp(temp); + op->args[pi++] = tcgv_i64_arg(temp); + extend_free[n_extend++] = temp; } - } - } - - /* - * A Call op needs up to 4 + 2N parameters on 32-bit archs, - * and up to 4 + N parameters on 64-bit archs - * (N = number of input arguments + output arguments). - */ - max_args = (64 / TCG_TARGET_REG_BITS) * nargs + 4; - op = tcg_emit_op(INDEX_op_call, max_args); - - pi = 0; - if (ret != NULL) { - if (TCG_TARGET_REG_BITS < 64 && (typemask & 6) == dh_typecode_i64) { - op->args[pi++] = temp_arg(ret); - op->args[pi++] = temp_arg(ret + 1); - nb_rets = 2; - } else { - op->args[pi++] = temp_arg(ret); - nb_rets = 1; - } - } else { - nb_rets = 0; - } - TCGOP_CALLO(op) = nb_rets; - - real_args = 0; - for (i = 0; i < nargs; i++) { - int argtype = extract32(typemask, (i + 1) * 3, 3); - TCGCallArgumentKind kind; - TCGType type; - - switch (argtype) { - case dh_typecode_i32: - case dh_typecode_s32: - type = TCG_TYPE_I32; - kind = TCG_TARGET_CALL_ARG_I32; break; - case dh_typecode_i64: - case dh_typecode_s64: - type = TCG_TYPE_I64; - kind = TCG_TARGET_CALL_ARG_I64; - break; - case dh_typecode_ptr: - type = TCG_TYPE_PTR; - kind = TCG_CALL_ARG_NORMAL; - break; - default: - g_assert_not_reached(); - } - switch (kind) { - case TCG_CALL_ARG_EVEN: - if (real_args & 1) { - op->args[pi++] = TCG_CALL_DUMMY_ARG; - real_args++; - } - /* fall through */ - case TCG_CALL_ARG_NORMAL: - if (TCG_TARGET_REG_BITS == 32 && type == TCG_TYPE_I64) { - op->args[pi++] = temp_arg(args[i]); - op->args[pi++] = temp_arg(args[i] + 1); - real_args += 2; - break; - } - op->args[pi++] = temp_arg(args[i]); - real_args++; - break; default: g_assert_not_reached(); } } op->args[pi++] = (uintptr_t)func; op->args[pi++] = (uintptr_t)info; - TCGOP_CALLI(op) = real_args; + tcg_debug_assert(pi == total_args); - /* Make sure the fields didn't overflow. */ - tcg_debug_assert(TCGOP_CALLI(op) == real_args); - tcg_debug_assert(pi <= max_args); + QTAILQ_INSERT_TAIL(&tcg_ctx->ops, op, link); - if (TCG_TARGET_CALL_ARG_I32 == TCG_CALL_ARG_EXTEND) { - for (i = 0; i < nargs; ++i) { - int argtype = extract32(typemask, (i + 1) * 3, 3); - bool is_32bit = (argtype & ~1) == dh_typecode_i32; - - if (is_32bit) { - tcg_temp_free_internal(args[i]); - } - } + tcg_debug_assert(n_extend < ARRAY_SIZE(extend_free)); + for (i = 0; i < n_extend; ++i) { + tcg_temp_free_i64(extend_free[i]); } } @@ -1813,10 +1903,7 @@ static void tcg_dump_ops(TCGContext *s, FILE *f, bool have_prefs) } for (i = 0; i < nb_iargs; i++) { TCGArg arg = op->args[nb_oargs + i]; - const char *t = ""; - if (arg != TCG_CALL_DUMMY_ARG) { - t = tcg_get_arg_str(s, buf, sizeof(buf), arg); - } + const char *t = tcg_get_arg_str(s, buf, sizeof(buf), arg); col += ne_fprintf(f, ",%s", t); } } else { @@ -2597,12 +2684,11 @@ static void liveness_pass_1(TCGContext *s) switch (opc) { case INDEX_op_call: { - int call_flags; - int nb_call_regs; + const TCGHelperInfo *info = tcg_call_info(op); + int call_flags = tcg_call_flags(op); nb_oargs = TCGOP_CALLO(op); nb_iargs = TCGOP_CALLI(op); - call_flags = tcg_call_flags(op); /* pure functions can be removed if their result is unused */ if (call_flags & TCG_CALL_NO_SIDE_EFFECTS) { @@ -2642,7 +2728,7 @@ static void liveness_pass_1(TCGContext *s) /* Record arguments that die in this helper. */ for (i = nb_oargs; i < nb_iargs + nb_oargs; i++) { ts = arg_temp(op->args[i]); - if (ts && ts->state & TS_DEAD) { + if (ts->state & TS_DEAD) { arg_life |= DEAD_ARG << i; } } @@ -2650,31 +2736,59 @@ static void liveness_pass_1(TCGContext *s) /* For all live registers, remove call-clobbered prefs. */ la_cross_call(s, nb_temps); - nb_call_regs = ARRAY_SIZE(tcg_target_call_iarg_regs); + /* + * Input arguments are live for preceding opcodes. + * + * For those arguments that die, and will be allocated in + * registers, clear the register set for that arg, to be + * filled in below. For args that will be on the stack, + * reset to any available reg. Process arguments in reverse + * order so that if a temp is used more than once, the stack + * reset to max happens before the register reset to 0. + */ + for (i = nb_iargs - 1; i >= 0; i--) { + const TCGCallArgumentLoc *loc = &info->in[i]; + ts = arg_temp(op->args[nb_oargs + i]); - /* Input arguments are live for preceding opcodes. */ - for (i = 0; i < nb_iargs; i++) { - ts = arg_temp(op->args[i + nb_oargs]); - if (ts && ts->state & TS_DEAD) { - /* For those arguments that die, and will be allocated - * in registers, clear the register set for that arg, - * to be filled in below. For args that will be on - * the stack, reset to any available reg. - */ - *la_temp_pref(ts) - = (i < nb_call_regs ? 0 : - tcg_target_available_regs[ts->type]); + if (ts->state & TS_DEAD) { + switch (loc->kind) { + case TCG_CALL_ARG_NORMAL: + case TCG_CALL_ARG_EXTEND_U: + case TCG_CALL_ARG_EXTEND_S: + if (REG_P(loc)) { + *la_temp_pref(ts) = 0; + break; + } + /* fall through */ + default: + *la_temp_pref(ts) = + tcg_target_available_regs[ts->type]; + break; + } ts->state &= ~TS_DEAD; } } - /* For each input argument, add its input register to prefs. - If a temp is used once, this produces a single set bit. */ - for (i = 0; i < MIN(nb_call_regs, nb_iargs); i++) { - ts = arg_temp(op->args[i + nb_oargs]); - if (ts) { - tcg_regset_set_reg(*la_temp_pref(ts), - tcg_target_call_iarg_regs[i]); + /* + * For each input argument, add its input register to prefs. + * If a temp is used once, this produces a single set bit; + * if a temp is used multiple times, this produces a set. + */ + for (i = 0; i < nb_iargs; i++) { + const TCGCallArgumentLoc *loc = &info->in[i]; + ts = arg_temp(op->args[nb_oargs + i]); + + switch (loc->kind) { + case TCG_CALL_ARG_NORMAL: + case TCG_CALL_ARG_EXTEND_U: + case TCG_CALL_ARG_EXTEND_S: + if (REG_P(loc)) { + tcg_regset_set_reg(*la_temp_pref(ts), + tcg_target_call_iarg_regs[loc->arg_slot]); + } + break; + default: + break; } } } @@ -2945,21 +3059,19 @@ static bool liveness_pass_2(TCGContext *s) /* Make sure that input arguments are available. */ for (i = nb_oargs; i < nb_iargs + nb_oargs; i++) { arg_ts = arg_temp(op->args[i]); - if (arg_ts) { - dir_ts = arg_ts->state_ptr; - if (dir_ts && arg_ts->state == TS_DEAD) { - TCGOpcode lopc = (arg_ts->type == TCG_TYPE_I32 - ? INDEX_op_ld_i32 - : INDEX_op_ld_i64); - TCGOp *lop = tcg_op_insert_before(s, op, lopc, 3); + dir_ts = arg_ts->state_ptr; + if (dir_ts && arg_ts->state == TS_DEAD) { + TCGOpcode lopc = (arg_ts->type == TCG_TYPE_I32 + ? INDEX_op_ld_i32 + : INDEX_op_ld_i64); + TCGOp *lop = tcg_op_insert_before(s, op, lopc, 3); - lop->args[0] = temp_arg(dir_ts); - lop->args[1] = temp_arg(arg_ts->mem_base); - lop->args[2] = arg_ts->mem_offset; + lop->args[0] = temp_arg(dir_ts); + lop->args[1] = temp_arg(arg_ts->mem_base); + lop->args[2] = arg_ts->mem_offset; - /* Loaded, but synced with memory. */ - arg_ts->state = TS_MEM; - } + /* Loaded, but synced with memory. */ + arg_ts->state = TS_MEM; } } @@ -2968,14 +3080,12 @@ static bool liveness_pass_2(TCGContext *s) so that we reload when needed. */ for (i = nb_oargs; i < nb_iargs + nb_oargs; i++) { arg_ts = arg_temp(op->args[i]); - if (arg_ts) { - dir_ts = arg_ts->state_ptr; - if (dir_ts) { - op->args[i] = temp_arg(dir_ts); - changes = true; - if (IS_DEAD_ARG(i)) { - arg_ts->state = TS_DEAD; - } + dir_ts = arg_ts->state_ptr; + if (dir_ts) { + op->args[i] = temp_arg(dir_ts); + changes = true; + if (IS_DEAD_ARG(i)) { + arg_ts->state = TS_DEAD; } } } @@ -4181,106 +4291,107 @@ static bool tcg_reg_alloc_dup2(TCGContext *s, const TCGOp *op) return true; } +static void load_arg_reg(TCGContext *s, TCGReg reg, TCGTemp *ts, + TCGRegSet allocated_regs) +{ + if (ts->val_type == TEMP_VAL_REG) { + if (ts->reg != reg) { + tcg_reg_free(s, reg, allocated_regs); + if (!tcg_out_mov(s, ts->type, reg, ts->reg)) { + /* + * Cross register class move not supported. Sync the + * temp back to its slot and load from there. + */ + temp_sync(s, ts, allocated_regs, 0, 0); + tcg_out_ld(s, ts->type, reg, + ts->mem_base->reg, ts->mem_offset); + } + } + } else { + TCGRegSet arg_set = 0; + + tcg_reg_free(s, reg, allocated_regs); + tcg_regset_set_reg(arg_set, reg); + temp_load(s, ts, arg_set, allocated_regs, 0); + } +} + +static void load_arg_stk(TCGContext *s, int stk_slot, TCGTemp *ts, + TCGRegSet allocated_regs) +{ + /* + * When the destination is on the stack, load up the temp and store. + * If there are many call-saved registers, the temp might live to + * see another use; otherwise it'll be discarded. + */ + temp_load(s, ts, tcg_target_available_regs[ts->type], allocated_regs, 0); + tcg_out_st(s, ts->type, ts->reg, TCG_REG_CALL_STACK, + TCG_TARGET_CALL_STACK_OFFSET + + stk_slot * sizeof(tcg_target_long)); +} + +static void load_arg_normal(TCGContext *s, const TCGCallArgumentLoc *l, + TCGTemp *ts, TCGRegSet *allocated_regs) +{ + if (REG_P(l)) { + TCGReg reg = tcg_target_call_iarg_regs[l->arg_slot]; + load_arg_reg(s, reg, ts, *allocated_regs); + tcg_regset_set_reg(*allocated_regs, reg); + } else { + load_arg_stk(s, l->arg_slot - ARRAY_SIZE(tcg_target_call_iarg_regs), + ts, *allocated_regs); + } +} + static void tcg_reg_alloc_call(TCGContext *s, TCGOp *op) { const int nb_oargs = TCGOP_CALLO(op); const int nb_iargs = TCGOP_CALLI(op); const TCGLifeData arg_life = op->life; - const TCGHelperInfo *info; - int flags, nb_regs, i; - TCGReg reg; - TCGArg arg; - TCGTemp *ts; - intptr_t stack_offset; - size_t call_stack_size; - tcg_insn_unit *func_addr; - int allocate_args; - TCGRegSet allocated_regs; + const TCGHelperInfo *info = tcg_call_info(op); + TCGRegSet allocated_regs = s->reserved_regs; + int i; - func_addr = tcg_call_func(op); - info = tcg_call_info(op); - flags = info->flags; + /* + * Move inputs into place in reverse order, + * so that we place stacked arguments first. + */ + for (i = nb_iargs - 1; i >= 0; --i) { + const TCGCallArgumentLoc *loc = &info->in[i]; + TCGTemp *ts = arg_temp(op->args[nb_oargs + i]); - nb_regs = ARRAY_SIZE(tcg_target_call_iarg_regs); - if (nb_regs > nb_iargs) { - nb_regs = nb_iargs; - } - - /* assign stack slots first */ - call_stack_size = (nb_iargs - nb_regs) * sizeof(tcg_target_long); - call_stack_size = (call_stack_size + TCG_TARGET_STACK_ALIGN - 1) & - ~(TCG_TARGET_STACK_ALIGN - 1); - allocate_args = (call_stack_size > TCG_STATIC_CALL_ARGS_SIZE); - if (allocate_args) { - /* XXX: if more than TCG_STATIC_CALL_ARGS_SIZE is needed, - preallocate call stack */ - tcg_abort(); - } - - stack_offset = TCG_TARGET_CALL_STACK_OFFSET; - for (i = nb_regs; i < nb_iargs; i++) { - arg = op->args[nb_oargs + i]; - if (arg != TCG_CALL_DUMMY_ARG) { - ts = arg_temp(arg); - temp_load(s, ts, tcg_target_available_regs[ts->type], - s->reserved_regs, 0); - tcg_out_st(s, ts->type, ts->reg, TCG_REG_CALL_STACK, stack_offset); - } - stack_offset += sizeof(tcg_target_long); - } - - /* assign input registers */ - allocated_regs = s->reserved_regs; - for (i = 0; i < nb_regs; i++) { - arg = op->args[nb_oargs + i]; - if (arg != TCG_CALL_DUMMY_ARG) { - ts = arg_temp(arg); - reg = tcg_target_call_iarg_regs[i]; - - if (ts->val_type == TEMP_VAL_REG) { - if (ts->reg != reg) { - tcg_reg_free(s, reg, allocated_regs); - if (!tcg_out_mov(s, ts->type, reg, ts->reg)) { - /* - * Cross register class move not supported. Sync the - * temp back to its slot and load from there. - */ - temp_sync(s, ts, allocated_regs, 0, 0); - tcg_out_ld(s, ts->type, reg, - ts->mem_base->reg, ts->mem_offset); - } - } - } else { - TCGRegSet arg_set = 0; - - tcg_reg_free(s, reg, allocated_regs); - tcg_regset_set_reg(arg_set, reg); - temp_load(s, ts, arg_set, allocated_regs, 0); - } - - tcg_regset_set_reg(allocated_regs, reg); + switch (loc->kind) { + case TCG_CALL_ARG_NORMAL: + case TCG_CALL_ARG_EXTEND_U: + case TCG_CALL_ARG_EXTEND_S: + load_arg_normal(s, loc, ts, &allocated_regs); + break; + default: + g_assert_not_reached(); } } - - /* mark dead temporaries and free the associated registers */ + + /* Mark dead temporaries and free the associated registers. */ for (i = nb_oargs; i < nb_iargs + nb_oargs; i++) { if (IS_DEAD_ARG(i)) { temp_dead(s, arg_temp(op->args[i])); } } - - /* clobber call registers */ + + /* Clobber call registers. */ for (i = 0; i < TCG_TARGET_NB_REGS; i++) { if (tcg_regset_test_reg(tcg_target_call_clobber_regs, i)) { tcg_reg_free(s, i, allocated_regs); } } - /* Save globals if they might be written by the helper, sync them if - they might be read. */ - if (flags & TCG_CALL_NO_READ_GLOBALS) { + /* + * Save globals if they might be written by the helper, + * sync them if they might be read. + */ + if (info->flags & TCG_CALL_NO_READ_GLOBALS) { /* Nothing to do */ - } else if (flags & TCG_CALL_NO_WRITE_GLOBALS) { + } else if (info->flags & TCG_CALL_NO_WRITE_GLOBALS) { sync_globals(s, allocated_regs); } else { save_globals(s, allocated_regs); @@ -4291,31 +4402,41 @@ static void tcg_reg_alloc_call(TCGContext *s, TCGOp *op) gpointer hash = (gpointer)(uintptr_t)info->typemask; ffi_cif *cif = g_hash_table_lookup(ffi_table, hash); assert(cif != NULL); - tcg_out_call(s, func_addr, cif); + tcg_out_call(s, tcg_call_func(op), cif); } #else - tcg_out_call(s, func_addr); + tcg_out_call(s, tcg_call_func(op)); #endif - /* assign output registers and emit moves if needed */ - for(i = 0; i < nb_oargs; i++) { - arg = op->args[i]; - ts = arg_temp(arg); + /* Assign output registers and emit moves if needed. */ + switch (info->out_kind) { + case TCG_CALL_RET_NORMAL: + for (i = 0; i < nb_oargs; i++) { + TCGTemp *ts = arg_temp(op->args[i]); + TCGReg reg = tcg_target_call_oarg_regs[i]; - /* ENV should not be modified. */ - tcg_debug_assert(!temp_readonly(ts)); + /* ENV should not be modified. */ + tcg_debug_assert(!temp_readonly(ts)); - reg = tcg_target_call_oarg_regs[i]; - tcg_debug_assert(s->reg_to_temp[reg] == NULL); - if (ts->val_type == TEMP_VAL_REG) { - s->reg_to_temp[ts->reg] = NULL; + tcg_debug_assert(s->reg_to_temp[reg] == NULL); + if (ts->val_type == TEMP_VAL_REG) { + s->reg_to_temp[ts->reg] = NULL; + } + ts->val_type = TEMP_VAL_REG; + ts->reg = reg; + ts->mem_coherent = 0; + s->reg_to_temp[reg] = ts; } - ts->val_type = TEMP_VAL_REG; - ts->reg = reg; - ts->mem_coherent = 0; - s->reg_to_temp[reg] = ts; + break; + default: + g_assert_not_reached(); + } + + /* Flush or discard output registers as needed. */ + for (i = 0; i < nb_oargs; i++) { + TCGTemp *ts = arg_temp(op->args[i]); if (NEED_SYNC_ARG(i)) { - temp_sync(s, ts, allocated_regs, 0, IS_DEAD_ARG(i)); + temp_sync(s, ts, s->reserved_regs, 0, IS_DEAD_ARG(i)); } else if (IS_DEAD_ARG(i)) { temp_dead(s, ts); }