From patchwork Tue Oct 10 00:55:55 2017
X-Patchwork-Submitter: Richard Henderson
X-Patchwork-Id: 115320
Delivered-To: patch@linaro.org
From: Richard Henderson
To: qemu-devel@nongnu.org
Date: Mon, 9 Oct 2017 17:55:55 -0700
Message-Id: <20171010005600.28735-19-richard.henderson@linaro.org>
X-Mailer: git-send-email 2.13.6
In-Reply-To: <20171010005600.28735-1-richard.henderson@linaro.org>
References: <20171010005600.28735-1-richard.henderson@linaro.org>
Subject: [Qemu-devel] [PULL 18/23] tcg: allocate optimizer temps with tcg_malloc
List-Id: <qemu-devel.nongnu.org>
Cc: peter.maydell@linaro.org, "Emilio G. Cota"
Sender: "Qemu-devel"

From: "Emilio G.
Cota"

Groundwork for supporting multiple TCG contexts.

While at it, also allocate temps_used directly as a bitmap of the
required size, instead of using a bitmap of TCG_MAX_TEMPS via
TCGTempSet.

Performance-wise we lose about 1.12% in a translation-heavy workload
such as booting+shutting down debian-arm:

Performance counter stats for 'taskset -c 0 arm-softmmu/qemu-system-arm \
    -machine type=virt -nographic -smp 1 -m 4096 \
    -netdev user,id=unet,hostfwd=tcp::2222-:22 \
    -device virtio-net-device,netdev=unet \
    -drive file=die-on-boot.qcow2,id=myblock,index=0,if=none \
    -device virtio-blk-device,drive=myblock \
    -kernel kernel.img -append console=ttyAMA0 root=/dev/vda1 \
    -name arm,debug-threads=on -smp 1' (10 runs):

              exec time (s)  Relative slowdown wrt original (%)
---------------------------------------------------------------
 original     20.213321616                  0.
 tcg_malloc   20.441130078                  1.1270214
 TCGContext   20.477846517                  1.3086662
 g_malloc     20.780527895                  2.8061013

The other two alternatives shown in the table are:

- TCGContext: embed temps[TCG_MAX_TEMPS] and TCGTempSet used_temps
  in TCGContext. This is simple enough but it isn't faster than using
  tcg_malloc; moreover, it wastes memory.

- g_malloc: allocate/deallocate both temps and used_temps every time
  tcg_optimize is executed.

Suggested-by: Richard Henderson
Signed-off-by: Emilio G.
Cota Signed-off-by: Richard Henderson Signed-off-by: Richard Henderson --- tcg/optimize.c | 306 ++++++++++++++++++++++++++++++--------------------------- 1 file changed, 161 insertions(+), 145 deletions(-) -- 2.13.6 diff --git a/tcg/optimize.c b/tcg/optimize.c index adfc56ce62..ce422e9ff0 100644 --- a/tcg/optimize.c +++ b/tcg/optimize.c @@ -40,21 +40,18 @@ struct tcg_temp_info { tcg_target_ulong mask; }; -static struct tcg_temp_info temps[TCG_MAX_TEMPS]; -static TCGTempSet temps_used; - -static inline bool temp_is_const(TCGArg arg) +static inline bool temp_is_const(const struct tcg_temp_info *temps, TCGArg arg) { return temps[arg].is_const; } -static inline bool temp_is_copy(TCGArg arg) +static inline bool temp_is_copy(const struct tcg_temp_info *temps, TCGArg arg) { return temps[arg].next_copy != arg; } /* Reset TEMP's state, possibly removing the temp for the list of copies. */ -static void reset_temp(TCGArg temp) +static void reset_temp(struct tcg_temp_info *temps, TCGArg temp) { temps[temps[temp].next_copy].prev_copy = temps[temp].prev_copy; temps[temps[temp].prev_copy].next_copy = temps[temp].next_copy; @@ -64,21 +61,16 @@ static void reset_temp(TCGArg temp) temps[temp].mask = -1; } -/* Reset all temporaries, given that there are NB_TEMPS of them. */ -static void reset_all_temps(int nb_temps) -{ - bitmap_zero(temps_used.l, nb_temps); -} - /* Initialize and activate a temporary. 
*/ -static void init_temp_info(TCGArg temp) +static void init_temp_info(struct tcg_temp_info *temps, + unsigned long *temps_used, TCGArg temp) { - if (!test_bit(temp, temps_used.l)) { + if (!test_bit(temp, temps_used)) { temps[temp].next_copy = temp; temps[temp].prev_copy = temp; temps[temp].is_const = false; temps[temp].mask = -1; - set_bit(temp, temps_used.l); + set_bit(temp, temps_used); } } @@ -116,7 +108,8 @@ static TCGOpcode op_to_movi(TCGOpcode op) } } -static TCGArg find_better_copy(TCGContext *s, TCGArg temp) +static TCGArg find_better_copy(TCGContext *s, const struct tcg_temp_info *temps, + TCGArg temp) { TCGArg i; @@ -145,7 +138,8 @@ static TCGArg find_better_copy(TCGContext *s, TCGArg temp) return temp; } -static bool temps_are_copies(TCGArg arg1, TCGArg arg2) +static bool temps_are_copies(const struct tcg_temp_info *temps, TCGArg arg1, + TCGArg arg2) { TCGArg i; @@ -153,7 +147,7 @@ static bool temps_are_copies(TCGArg arg1, TCGArg arg2) return true; } - if (!temp_is_copy(arg1) || !temp_is_copy(arg2)) { + if (!temp_is_copy(temps, arg1) || !temp_is_copy(temps, arg2)) { return false; } @@ -166,15 +160,15 @@ static bool temps_are_copies(TCGArg arg1, TCGArg arg2) return false; } -static void tcg_opt_gen_movi(TCGContext *s, TCGOp *op, TCGArg *args, - TCGArg dst, TCGArg val) +static void tcg_opt_gen_movi(TCGContext *s, struct tcg_temp_info *temps, + TCGOp *op, TCGArg *args, TCGArg dst, TCGArg val) { TCGOpcode new_op = op_to_movi(op->opc); tcg_target_ulong mask; op->opc = new_op; - reset_temp(dst); + reset_temp(temps, dst); temps[dst].is_const = true; temps[dst].val = val; mask = val; @@ -188,10 +182,10 @@ static void tcg_opt_gen_movi(TCGContext *s, TCGOp *op, TCGArg *args, args[1] = val; } -static void tcg_opt_gen_mov(TCGContext *s, TCGOp *op, TCGArg *args, - TCGArg dst, TCGArg src) +static void tcg_opt_gen_mov(TCGContext *s, struct tcg_temp_info *temps, + TCGOp *op, TCGArg *args, TCGArg dst, TCGArg src) { - if (temps_are_copies(dst, src)) { + if 
(temps_are_copies(temps, dst, src)) { tcg_op_remove(s, op); return; } @@ -201,7 +195,7 @@ static void tcg_opt_gen_mov(TCGContext *s, TCGOp *op, TCGArg *args, op->opc = new_op; - reset_temp(dst); + reset_temp(temps, dst); mask = temps[src].mask; if (TCG_TARGET_REG_BITS > 32 && new_op == INDEX_op_mov_i32) { /* High bits of the destination are now garbage. */ @@ -463,10 +457,11 @@ static bool do_constant_folding_cond_eq(TCGCond c) /* Return 2 if the condition can't be simplified, and the result of the condition (0 or 1) if it can */ -static TCGArg do_constant_folding_cond(TCGOpcode op, TCGArg x, - TCGArg y, TCGCond c) +static TCGArg +do_constant_folding_cond(const struct tcg_temp_info *temps, TCGOpcode op, + TCGArg x, TCGArg y, TCGCond c) { - if (temp_is_const(x) && temp_is_const(y)) { + if (temp_is_const(temps, x) && temp_is_const(temps, y)) { switch (op_bits(op)) { case 32: return do_constant_folding_cond_32(temps[x].val, temps[y].val, c); @@ -475,9 +470,9 @@ static TCGArg do_constant_folding_cond(TCGOpcode op, TCGArg x, default: tcg_abort(); } - } else if (temps_are_copies(x, y)) { + } else if (temps_are_copies(temps, x, y)) { return do_constant_folding_cond_eq(c); - } else if (temp_is_const(y) && temps[y].val == 0) { + } else if (temp_is_const(temps, y) && temps[y].val == 0) { switch (c) { case TCG_COND_LTU: return 0; @@ -492,15 +487,17 @@ static TCGArg do_constant_folding_cond(TCGOpcode op, TCGArg x, /* Return 2 if the condition can't be simplified, and the result of the condition (0 or 1) if it can */ -static TCGArg do_constant_folding_cond2(TCGArg *p1, TCGArg *p2, TCGCond c) +static TCGArg +do_constant_folding_cond2(const struct tcg_temp_info *temps, TCGArg *p1, + TCGArg *p2, TCGCond c) { TCGArg al = p1[0], ah = p1[1]; TCGArg bl = p2[0], bh = p2[1]; - if (temp_is_const(bl) && temp_is_const(bh)) { + if (temp_is_const(temps, bl) && temp_is_const(temps, bh)) { uint64_t b = ((uint64_t)temps[bh].val << 32) | (uint32_t)temps[bl].val; - if (temp_is_const(al) && 
temp_is_const(ah)) { + if (temp_is_const(temps, al) && temp_is_const(temps, ah)) { uint64_t a; a = ((uint64_t)temps[ah].val << 32) | (uint32_t)temps[al].val; return do_constant_folding_cond_64(a, b, c); @@ -516,18 +513,19 @@ static TCGArg do_constant_folding_cond2(TCGArg *p1, TCGArg *p2, TCGCond c) } } } - if (temps_are_copies(al, bl) && temps_are_copies(ah, bh)) { + if (temps_are_copies(temps, al, bl) && temps_are_copies(temps, ah, bh)) { return do_constant_folding_cond_eq(c); } return 2; } -static bool swap_commutative(TCGArg dest, TCGArg *p1, TCGArg *p2) +static bool swap_commutative(const struct tcg_temp_info *temps, TCGArg dest, + TCGArg *p1, TCGArg *p2) { TCGArg a1 = *p1, a2 = *p2; int sum = 0; - sum += temp_is_const(a1); - sum -= temp_is_const(a2); + sum += temp_is_const(temps, a1); + sum -= temp_is_const(temps, a2); /* Prefer the constant in second argument, and then the form op a, a, b, which is better handled on non-RISC hosts. */ @@ -539,13 +537,14 @@ static bool swap_commutative(TCGArg dest, TCGArg *p1, TCGArg *p2) return false; } -static bool swap_commutative2(TCGArg *p1, TCGArg *p2) +static bool swap_commutative2(const struct tcg_temp_info *temps, TCGArg *p1, + TCGArg *p2) { int sum = 0; - sum += temp_is_const(p1[0]); - sum += temp_is_const(p1[1]); - sum -= temp_is_const(p2[0]); - sum -= temp_is_const(p2[1]); + sum += temp_is_const(temps, p1[0]); + sum += temp_is_const(temps, p1[1]); + sum -= temp_is_const(temps, p2[0]); + sum -= temp_is_const(temps, p2[1]); if (sum > 0) { TCGArg t; t = p1[0], p1[0] = p2[0], p2[0] = t; @@ -558,6 +557,8 @@ static bool swap_commutative2(TCGArg *p1, TCGArg *p2) /* Propagate constants and copies, fold constant expressions. 
*/ void tcg_optimize(TCGContext *s) { + struct tcg_temp_info *temps; + unsigned long *temps_used; int oi, oi_next, nb_temps, nb_globals; TCGArg *prev_mb_args = NULL; @@ -568,7 +569,9 @@ void tcg_optimize(TCGContext *s) nb_temps = s->nb_temps; nb_globals = s->nb_globals; - reset_all_temps(nb_temps); + temps = tcg_malloc(nb_temps * sizeof(*temps)); + temps_used = tcg_malloc(BITS_TO_LONGS(nb_temps) * sizeof(*temps_used)); + bitmap_zero(temps_used, nb_temps); for (oi = s->gen_op_buf[0].next; oi != 0; oi = oi_next) { tcg_target_ulong mask, partmask, affected; @@ -590,21 +593,21 @@ void tcg_optimize(TCGContext *s) for (i = 0; i < nb_oargs + nb_iargs; i++) { tmp = args[i]; if (tmp != TCG_CALL_DUMMY_ARG) { - init_temp_info(tmp); + init_temp_info(temps, temps_used, tmp); } } } else { nb_oargs = def->nb_oargs; nb_iargs = def->nb_iargs; for (i = 0; i < nb_oargs + nb_iargs; i++) { - init_temp_info(args[i]); + init_temp_info(temps, temps_used, args[i]); } } /* Do copy propagation */ for (i = nb_oargs; i < nb_oargs + nb_iargs; i++) { - if (temp_is_copy(args[i])) { - args[i] = find_better_copy(s, args[i]); + if (temp_is_copy(temps, args[i])) { + args[i] = find_better_copy(s, temps, args[i]); } } @@ -620,44 +623,44 @@ void tcg_optimize(TCGContext *s) CASE_OP_32_64(nor): CASE_OP_32_64(muluh): CASE_OP_32_64(mulsh): - swap_commutative(args[0], &args[1], &args[2]); + swap_commutative(temps, args[0], &args[1], &args[2]); break; CASE_OP_32_64(brcond): - if (swap_commutative(-1, &args[0], &args[1])) { + if (swap_commutative(temps, -1, &args[0], &args[1])) { args[2] = tcg_swap_cond(args[2]); } break; CASE_OP_32_64(setcond): - if (swap_commutative(args[0], &args[1], &args[2])) { + if (swap_commutative(temps, args[0], &args[1], &args[2])) { args[3] = tcg_swap_cond(args[3]); } break; CASE_OP_32_64(movcond): - if (swap_commutative(-1, &args[1], &args[2])) { + if (swap_commutative(temps, -1, &args[1], &args[2])) { args[5] = tcg_swap_cond(args[5]); } /* For movcond, we canonicalize the "false" 
input reg to match the destination reg so that the tcg backend can implement a "move if true" operation. */ - if (swap_commutative(args[0], &args[4], &args[3])) { + if (swap_commutative(temps, args[0], &args[4], &args[3])) { args[5] = tcg_invert_cond(args[5]); } break; CASE_OP_32_64(add2): - swap_commutative(args[0], &args[2], &args[4]); - swap_commutative(args[1], &args[3], &args[5]); + swap_commutative(temps, args[0], &args[2], &args[4]); + swap_commutative(temps, args[1], &args[3], &args[5]); break; CASE_OP_32_64(mulu2): CASE_OP_32_64(muls2): - swap_commutative(args[0], &args[2], &args[3]); + swap_commutative(temps, args[0], &args[2], &args[3]); break; case INDEX_op_brcond2_i32: - if (swap_commutative2(&args[0], &args[2])) { + if (swap_commutative2(temps, &args[0], &args[2])) { args[4] = tcg_swap_cond(args[4]); } break; case INDEX_op_setcond2_i32: - if (swap_commutative2(&args[1], &args[3])) { + if (swap_commutative2(temps, &args[1], &args[3])) { args[5] = tcg_swap_cond(args[5]); } break; @@ -673,8 +676,8 @@ void tcg_optimize(TCGContext *s) CASE_OP_32_64(sar): CASE_OP_32_64(rotl): CASE_OP_32_64(rotr): - if (temp_is_const(args[1]) && temps[args[1]].val == 0) { - tcg_opt_gen_movi(s, op, args, args[0], 0); + if (temp_is_const(temps, args[1]) && temps[args[1]].val == 0) { + tcg_opt_gen_movi(s, temps, op, args, args[0], 0); continue; } break; @@ -683,7 +686,7 @@ void tcg_optimize(TCGContext *s) TCGOpcode neg_op; bool have_neg; - if (temp_is_const(args[2])) { + if (temp_is_const(temps, args[2])) { /* Proceed with possible constant folding. 
*/ break; } @@ -697,9 +700,9 @@ void tcg_optimize(TCGContext *s) if (!have_neg) { break; } - if (temp_is_const(args[1]) && temps[args[1]].val == 0) { + if (temp_is_const(temps, args[1]) && temps[args[1]].val == 0) { op->opc = neg_op; - reset_temp(args[0]); + reset_temp(temps, args[0]); args[1] = args[2]; continue; } @@ -707,30 +710,30 @@ void tcg_optimize(TCGContext *s) break; CASE_OP_32_64(xor): CASE_OP_32_64(nand): - if (!temp_is_const(args[1]) - && temp_is_const(args[2]) && temps[args[2]].val == -1) { + if (!temp_is_const(temps, args[1]) + && temp_is_const(temps, args[2]) && temps[args[2]].val == -1) { i = 1; goto try_not; } break; CASE_OP_32_64(nor): - if (!temp_is_const(args[1]) - && temp_is_const(args[2]) && temps[args[2]].val == 0) { + if (!temp_is_const(temps, args[1]) + && temp_is_const(temps, args[2]) && temps[args[2]].val == 0) { i = 1; goto try_not; } break; CASE_OP_32_64(andc): - if (!temp_is_const(args[2]) - && temp_is_const(args[1]) && temps[args[1]].val == -1) { + if (!temp_is_const(temps, args[2]) + && temp_is_const(temps, args[1]) && temps[args[1]].val == -1) { i = 2; goto try_not; } break; CASE_OP_32_64(orc): CASE_OP_32_64(eqv): - if (!temp_is_const(args[2]) - && temp_is_const(args[1]) && temps[args[1]].val == 0) { + if (!temp_is_const(temps, args[2]) + && temp_is_const(temps, args[1]) && temps[args[1]].val == 0) { i = 2; goto try_not; } @@ -751,7 +754,7 @@ void tcg_optimize(TCGContext *s) break; } op->opc = not_op; - reset_temp(args[0]); + reset_temp(temps, args[0]); args[1] = args[i]; continue; } @@ -771,18 +774,18 @@ void tcg_optimize(TCGContext *s) CASE_OP_32_64(or): CASE_OP_32_64(xor): CASE_OP_32_64(andc): - if (!temp_is_const(args[1]) - && temp_is_const(args[2]) && temps[args[2]].val == 0) { - tcg_opt_gen_mov(s, op, args, args[0], args[1]); + if (!temp_is_const(temps, args[1]) + && temp_is_const(temps, args[2]) && temps[args[2]].val == 0) { + tcg_opt_gen_mov(s, temps, op, args, args[0], args[1]); continue; } break; CASE_OP_32_64(and): 
CASE_OP_32_64(orc): CASE_OP_32_64(eqv): - if (!temp_is_const(args[1]) - && temp_is_const(args[2]) && temps[args[2]].val == -1) { - tcg_opt_gen_mov(s, op, args, args[0], args[1]); + if (!temp_is_const(temps, args[1]) + && temp_is_const(temps, args[2]) && temps[args[2]].val == -1) { + tcg_opt_gen_mov(s, temps, op, args, args[0], args[1]); continue; } break; @@ -819,7 +822,7 @@ void tcg_optimize(TCGContext *s) CASE_OP_32_64(and): mask = temps[args[2]].mask; - if (temp_is_const(args[2])) { + if (temp_is_const(temps, args[2])) { and_const: affected = temps[args[1]].mask & ~mask; } @@ -838,7 +841,7 @@ void tcg_optimize(TCGContext *s) CASE_OP_32_64(andc): /* Known-zeros does not imply known-ones. Therefore unless args[2] is constant, we can't infer anything from it. */ - if (temp_is_const(args[2])) { + if (temp_is_const(temps, args[2])) { mask = ~temps[args[2]].mask; goto and_const; } @@ -847,26 +850,26 @@ void tcg_optimize(TCGContext *s) break; case INDEX_op_sar_i32: - if (temp_is_const(args[2])) { + if (temp_is_const(temps, args[2])) { tmp = temps[args[2]].val & 31; mask = (int32_t)temps[args[1]].mask >> tmp; } break; case INDEX_op_sar_i64: - if (temp_is_const(args[2])) { + if (temp_is_const(temps, args[2])) { tmp = temps[args[2]].val & 63; mask = (int64_t)temps[args[1]].mask >> tmp; } break; case INDEX_op_shr_i32: - if (temp_is_const(args[2])) { + if (temp_is_const(temps, args[2])) { tmp = temps[args[2]].val & 31; mask = (uint32_t)temps[args[1]].mask >> tmp; } break; case INDEX_op_shr_i64: - if (temp_is_const(args[2])) { + if (temp_is_const(temps, args[2])) { tmp = temps[args[2]].val & 63; mask = (uint64_t)temps[args[1]].mask >> tmp; } @@ -880,7 +883,7 @@ void tcg_optimize(TCGContext *s) break; CASE_OP_32_64(shl): - if (temp_is_const(args[2])) { + if (temp_is_const(temps, args[2])) { tmp = temps[args[2]].val & (TCG_TARGET_REG_BITS - 1); mask = temps[args[1]].mask << tmp; } @@ -976,12 +979,12 @@ void tcg_optimize(TCGContext *s) if (partmask == 0) { 
tcg_debug_assert(nb_oargs == 1); - tcg_opt_gen_movi(s, op, args, args[0], 0); + tcg_opt_gen_movi(s, temps, op, args, args[0], 0); continue; } if (affected == 0) { tcg_debug_assert(nb_oargs == 1); - tcg_opt_gen_mov(s, op, args, args[0], args[1]); + tcg_opt_gen_mov(s, temps, op, args, args[0], args[1]); continue; } @@ -991,8 +994,8 @@ void tcg_optimize(TCGContext *s) CASE_OP_32_64(mul): CASE_OP_32_64(muluh): CASE_OP_32_64(mulsh): - if ((temp_is_const(args[2]) && temps[args[2]].val == 0)) { - tcg_opt_gen_movi(s, op, args, args[0], 0); + if ((temp_is_const(temps, args[2]) && temps[args[2]].val == 0)) { + tcg_opt_gen_movi(s, temps, op, args, args[0], 0); continue; } break; @@ -1004,8 +1007,8 @@ void tcg_optimize(TCGContext *s) switch (opc) { CASE_OP_32_64(or): CASE_OP_32_64(and): - if (temps_are_copies(args[1], args[2])) { - tcg_opt_gen_mov(s, op, args, args[0], args[1]); + if (temps_are_copies(temps, args[1], args[2])) { + tcg_opt_gen_mov(s, temps, op, args, args[0], args[1]); continue; } break; @@ -1018,8 +1021,8 @@ void tcg_optimize(TCGContext *s) CASE_OP_32_64(andc): CASE_OP_32_64(sub): CASE_OP_32_64(xor): - if (temps_are_copies(args[1], args[2])) { - tcg_opt_gen_movi(s, op, args, args[0], 0); + if (temps_are_copies(temps, args[1], args[2])) { + tcg_opt_gen_movi(s, temps, op, args, args[0], 0); continue; } break; @@ -1032,10 +1035,10 @@ void tcg_optimize(TCGContext *s) allocator where needed and possible. Also detect copies. 
*/ switch (opc) { CASE_OP_32_64(mov): - tcg_opt_gen_mov(s, op, args, args[0], args[1]); + tcg_opt_gen_mov(s, temps, op, args, args[0], args[1]); break; CASE_OP_32_64(movi): - tcg_opt_gen_movi(s, op, args, args[0], args[1]); + tcg_opt_gen_movi(s, temps, op, args, args[0], args[1]); break; CASE_OP_32_64(not): @@ -1051,9 +1054,9 @@ void tcg_optimize(TCGContext *s) case INDEX_op_extu_i32_i64: case INDEX_op_extrl_i64_i32: case INDEX_op_extrh_i64_i32: - if (temp_is_const(args[1])) { + if (temp_is_const(temps, args[1])) { tmp = do_constant_folding(opc, temps[args[1]].val, 0); - tcg_opt_gen_movi(s, op, args, args[0], tmp); + tcg_opt_gen_movi(s, temps, op, args, args[0], tmp); break; } goto do_default; @@ -1080,66 +1083,70 @@ void tcg_optimize(TCGContext *s) CASE_OP_32_64(divu): CASE_OP_32_64(rem): CASE_OP_32_64(remu): - if (temp_is_const(args[1]) && temp_is_const(args[2])) { + if (temp_is_const(temps, args[1]) && + temp_is_const(temps, args[2])) { tmp = do_constant_folding(opc, temps[args[1]].val, temps[args[2]].val); - tcg_opt_gen_movi(s, op, args, args[0], tmp); + tcg_opt_gen_movi(s, temps, op, args, args[0], tmp); break; } goto do_default; CASE_OP_32_64(clz): CASE_OP_32_64(ctz): - if (temp_is_const(args[1])) { + if (temp_is_const(temps, args[1])) { TCGArg v = temps[args[1]].val; if (v != 0) { tmp = do_constant_folding(opc, v, 0); - tcg_opt_gen_movi(s, op, args, args[0], tmp); + tcg_opt_gen_movi(s, temps, op, args, args[0], tmp); } else { - tcg_opt_gen_mov(s, op, args, args[0], args[2]); + tcg_opt_gen_mov(s, temps, op, args, args[0], args[2]); } break; } goto do_default; CASE_OP_32_64(deposit): - if (temp_is_const(args[1]) && temp_is_const(args[2])) { + if (temp_is_const(temps, args[1]) && + temp_is_const(temps, args[2])) { tmp = deposit64(temps[args[1]].val, args[3], args[4], temps[args[2]].val); - tcg_opt_gen_movi(s, op, args, args[0], tmp); + tcg_opt_gen_movi(s, temps, op, args, args[0], tmp); break; } goto do_default; CASE_OP_32_64(extract): - if 
(temp_is_const(args[1])) { + if (temp_is_const(temps, args[1])) { tmp = extract64(temps[args[1]].val, args[2], args[3]); - tcg_opt_gen_movi(s, op, args, args[0], tmp); + tcg_opt_gen_movi(s, temps, op, args, args[0], tmp); break; } goto do_default; CASE_OP_32_64(sextract): - if (temp_is_const(args[1])) { + if (temp_is_const(temps, args[1])) { tmp = sextract64(temps[args[1]].val, args[2], args[3]); - tcg_opt_gen_movi(s, op, args, args[0], tmp); + tcg_opt_gen_movi(s, temps, op, args, args[0], tmp); break; } goto do_default; CASE_OP_32_64(setcond): - tmp = do_constant_folding_cond(opc, args[1], args[2], args[3]); + tmp = do_constant_folding_cond(temps, opc, args[1], args[2], + args[3]); if (tmp != 2) { - tcg_opt_gen_movi(s, op, args, args[0], tmp); + tcg_opt_gen_movi(s, temps, op, args, args[0], tmp); break; } goto do_default; CASE_OP_32_64(brcond): - tmp = do_constant_folding_cond(opc, args[0], args[1], args[2]); + tmp = do_constant_folding_cond(temps, opc, args[0], args[1], + args[2]); if (tmp != 2) { if (tmp) { - reset_all_temps(nb_temps); + bitmap_zero(temps_used, nb_temps); op->opc = INDEX_op_br; args[0] = args[3]; } else { @@ -1150,12 +1157,14 @@ void tcg_optimize(TCGContext *s) goto do_default; CASE_OP_32_64(movcond): - tmp = do_constant_folding_cond(opc, args[1], args[2], args[5]); + tmp = do_constant_folding_cond(temps, opc, args[1], args[2], + args[5]); if (tmp != 2) { - tcg_opt_gen_mov(s, op, args, args[0], args[4-tmp]); + tcg_opt_gen_mov(s, temps, op, args, args[0], args[4 - tmp]); break; } - if (temp_is_const(args[3]) && temp_is_const(args[4])) { + if (temp_is_const(temps, args[3]) && + temp_is_const(temps, args[4])) { tcg_target_ulong tv = temps[args[3]].val; tcg_target_ulong fv = temps[args[4]].val; TCGCond cond = args[5]; @@ -1174,8 +1183,10 @@ void tcg_optimize(TCGContext *s) case INDEX_op_add2_i32: case INDEX_op_sub2_i32: - if (temp_is_const(args[2]) && temp_is_const(args[3]) - && temp_is_const(args[4]) && temp_is_const(args[5])) { + if 
(temp_is_const(temps, args[2]) && + temp_is_const(temps, args[3]) && + temp_is_const(temps, args[4]) && + temp_is_const(temps, args[5])) { uint32_t al = temps[args[2]].val; uint32_t ah = temps[args[3]].val; uint32_t bl = temps[args[4]].val; @@ -1194,8 +1205,8 @@ void tcg_optimize(TCGContext *s) rl = args[0]; rh = args[1]; - tcg_opt_gen_movi(s, op, args, rl, (int32_t)a); - tcg_opt_gen_movi(s, op2, args2, rh, (int32_t)(a >> 32)); + tcg_opt_gen_movi(s, temps, op, args, rl, (int32_t)a); + tcg_opt_gen_movi(s, temps, op2, args2, rh, (int32_t)(a >> 32)); /* We've done all we need to do with the movi. Skip it. */ oi_next = op2->next; @@ -1204,7 +1215,8 @@ void tcg_optimize(TCGContext *s) goto do_default; case INDEX_op_mulu2_i32: - if (temp_is_const(args[2]) && temp_is_const(args[3])) { + if (temp_is_const(temps, args[2]) && + temp_is_const(temps, args[3])) { uint32_t a = temps[args[2]].val; uint32_t b = temps[args[3]].val; uint64_t r = (uint64_t)a * b; @@ -1214,8 +1226,8 @@ void tcg_optimize(TCGContext *s) rl = args[0]; rh = args[1]; - tcg_opt_gen_movi(s, op, args, rl, (int32_t)r); - tcg_opt_gen_movi(s, op2, args2, rh, (int32_t)(r >> 32)); + tcg_opt_gen_movi(s, temps, op, args, rl, (int32_t)r); + tcg_opt_gen_movi(s, temps, op2, args2, rh, (int32_t)(r >> 32)); /* We've done all we need to do with the movi. Skip it. 
*/ oi_next = op2->next; @@ -1224,11 +1236,11 @@ void tcg_optimize(TCGContext *s) goto do_default; case INDEX_op_brcond2_i32: - tmp = do_constant_folding_cond2(&args[0], &args[2], args[4]); + tmp = do_constant_folding_cond2(temps, &args[0], &args[2], args[4]); if (tmp != 2) { if (tmp) { do_brcond_true: - reset_all_temps(nb_temps); + bitmap_zero(temps_used, nb_temps); op->opc = INDEX_op_br; args[0] = args[5]; } else { @@ -1236,12 +1248,14 @@ void tcg_optimize(TCGContext *s) tcg_op_remove(s, op); } } else if ((args[4] == TCG_COND_LT || args[4] == TCG_COND_GE) - && temp_is_const(args[2]) && temps[args[2]].val == 0 - && temp_is_const(args[3]) && temps[args[3]].val == 0) { + && temp_is_const(temps, args[2]) + && temps[args[2]].val == 0 + && temp_is_const(temps, args[3]) + && temps[args[3]].val == 0) { /* Simplify LT/GE comparisons vs zero to a single compare vs the high word of the input. */ do_brcond_high: - reset_all_temps(nb_temps); + bitmap_zero(temps_used, nb_temps); op->opc = INDEX_op_brcond_i32; args[0] = args[1]; args[1] = args[3]; @@ -1250,14 +1264,14 @@ void tcg_optimize(TCGContext *s) } else if (args[4] == TCG_COND_EQ) { /* Simplify EQ comparisons where one of the pairs can be simplified. 
*/ - tmp = do_constant_folding_cond(INDEX_op_brcond_i32, + tmp = do_constant_folding_cond(temps, INDEX_op_brcond_i32, args[0], args[2], TCG_COND_EQ); if (tmp == 0) { goto do_brcond_false; } else if (tmp == 1) { goto do_brcond_high; } - tmp = do_constant_folding_cond(INDEX_op_brcond_i32, + tmp = do_constant_folding_cond(temps, INDEX_op_brcond_i32, args[1], args[3], TCG_COND_EQ); if (tmp == 0) { goto do_brcond_false; @@ -1265,7 +1279,7 @@ void tcg_optimize(TCGContext *s) goto do_default; } do_brcond_low: - reset_all_temps(nb_temps); + bitmap_zero(temps_used, nb_temps); op->opc = INDEX_op_brcond_i32; args[1] = args[2]; args[2] = args[4]; @@ -1273,14 +1287,14 @@ void tcg_optimize(TCGContext *s) } else if (args[4] == TCG_COND_NE) { /* Simplify NE comparisons where one of the pairs can be simplified. */ - tmp = do_constant_folding_cond(INDEX_op_brcond_i32, + tmp = do_constant_folding_cond(temps, INDEX_op_brcond_i32, args[0], args[2], TCG_COND_NE); if (tmp == 0) { goto do_brcond_high; } else if (tmp == 1) { goto do_brcond_true; } - tmp = do_constant_folding_cond(INDEX_op_brcond_i32, + tmp = do_constant_folding_cond(temps, INDEX_op_brcond_i32, args[1], args[3], TCG_COND_NE); if (tmp == 0) { goto do_brcond_low; @@ -1294,17 +1308,19 @@ void tcg_optimize(TCGContext *s) break; case INDEX_op_setcond2_i32: - tmp = do_constant_folding_cond2(&args[1], &args[3], args[5]); + tmp = do_constant_folding_cond2(temps, &args[1], &args[3], args[5]); if (tmp != 2) { do_setcond_const: - tcg_opt_gen_movi(s, op, args, args[0], tmp); + tcg_opt_gen_movi(s, temps, op, args, args[0], tmp); } else if ((args[5] == TCG_COND_LT || args[5] == TCG_COND_GE) - && temp_is_const(args[3]) && temps[args[3]].val == 0 - && temp_is_const(args[4]) && temps[args[4]].val == 0) { + && temp_is_const(temps, args[3]) + && temps[args[3]].val == 0 + && temp_is_const(temps, args[4]) + && temps[args[4]].val == 0) { /* Simplify LT/GE comparisons vs zero to a single compare vs the high word of the input. 
*/ do_setcond_high: - reset_temp(args[0]); + reset_temp(temps, args[0]); temps[args[0]].mask = 1; op->opc = INDEX_op_setcond_i32; args[1] = args[2]; @@ -1313,14 +1329,14 @@ void tcg_optimize(TCGContext *s) } else if (args[5] == TCG_COND_EQ) { /* Simplify EQ comparisons where one of the pairs can be simplified. */ - tmp = do_constant_folding_cond(INDEX_op_setcond_i32, + tmp = do_constant_folding_cond(temps, INDEX_op_setcond_i32, args[1], args[3], TCG_COND_EQ); if (tmp == 0) { goto do_setcond_const; } else if (tmp == 1) { goto do_setcond_high; } - tmp = do_constant_folding_cond(INDEX_op_setcond_i32, + tmp = do_constant_folding_cond(temps, INDEX_op_setcond_i32, args[2], args[4], TCG_COND_EQ); if (tmp == 0) { goto do_setcond_high; @@ -1328,7 +1344,7 @@ void tcg_optimize(TCGContext *s) goto do_default; } do_setcond_low: - reset_temp(args[0]); + reset_temp(temps, args[0]); temps[args[0]].mask = 1; op->opc = INDEX_op_setcond_i32; args[2] = args[3]; @@ -1336,14 +1352,14 @@ void tcg_optimize(TCGContext *s) } else if (args[5] == TCG_COND_NE) { /* Simplify NE comparisons where one of the pairs can be simplified. */ - tmp = do_constant_folding_cond(INDEX_op_setcond_i32, + tmp = do_constant_folding_cond(temps, INDEX_op_setcond_i32, args[1], args[3], TCG_COND_NE); if (tmp == 0) { goto do_setcond_high; } else if (tmp == 1) { goto do_setcond_const; } - tmp = do_constant_folding_cond(INDEX_op_setcond_i32, + tmp = do_constant_folding_cond(temps, INDEX_op_setcond_i32, args[2], args[4], TCG_COND_NE); if (tmp == 0) { goto do_setcond_low; @@ -1360,8 +1376,8 @@ void tcg_optimize(TCGContext *s) if (!(args[nb_oargs + nb_iargs + 1] & (TCG_CALL_NO_READ_GLOBALS | TCG_CALL_NO_WRITE_GLOBALS))) { for (i = 0; i < nb_globals; i++) { - if (test_bit(i, temps_used.l)) { - reset_temp(i); + if (test_bit(i, temps_used)) { + reset_temp(temps, i); } } } @@ -1375,11 +1391,11 @@ void tcg_optimize(TCGContext *s) block, otherwise we only trash the output args. 
"mask" is the non-zero bits mask for the first output arg. */ if (def->flags & TCG_OPF_BB_END) { - reset_all_temps(nb_temps); + bitmap_zero(temps_used, nb_temps); } else { do_reset_output: for (i = 0; i < nb_oargs; i++) { - reset_temp(args[i]); + reset_temp(temps, args[i]); /* Save the corresponding known-zero bits mask for the first output argument (only one supported so far). */ if (i == 0) {