From patchwork Mon Jan 28 15:58:57 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Richard Henderson X-Patchwork-Id: 156787 Delivered-To: patch@linaro.org Received: by 2002:a02:48:0:0:0:0:0 with SMTP id 69csp3617808jaa; Mon, 28 Jan 2019 08:25:41 -0800 (PST) X-Google-Smtp-Source: ALg8bN67ShcJxrPKO0PXRtzEdg7HI7JSYVu214r7GgrsujDxRHTtyZAyzi52u+8ShfXHRHNzZQ3R X-Received: by 2002:adf:cd0e:: with SMTP id w14mr23267765wrm.218.1548692741522; Mon, 28 Jan 2019 08:25:41 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1548692741; cv=none; d=google.com; s=arc-20160816; b=aSVV2nb4ZINo7BNdWuNmY6BKpoyBD4xU0ezm5Zs3DbawjFYxS62a7ZehrcJTvtYW0Y /Qn8LNroHyY8p48Perq5RkMVuBMtlBVTAhnv93VmP6n5rsHiOYptzJQAOTPATUr3byti vZmSBtB9yzRjNEKOAvRcJ4HsCzUybwGwTrsezuzwOCYcjnMqm2UVjX3nIHjFPASIVQvR zun2LeOGzYABFUuNSHT3vl/HciSKixuZsAWlk7l+mstk7G5fanauJ7KcmVIVJBdyMdfG /piXIqgicpGGRXnmUA5djgR9aJSWYVZ6gyJZHCcH9Y5sZR29xoiuynyoEXOwQYnk+ckm oDTw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:cc:list-subscribe:list-help:list-post:list-archive :list-unsubscribe:list-id:precedence:subject :content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:to:from:dkim-signature; bh=G3R6M8lo+egOdVDHfgVuN8I78ohbr7aGRLQnNFv5cCY=; b=VF2bddu/x3KgnpTgb1OAeEgJEV3qio66T9eh3XUlJCBDHsYmqCR3GDy+W7/0QDk8qY rH31+Y2rmlY9RdnkVn9GS+yAxu6XRJ+hZ3/ccf+OjbB8JNuoHYkSxk4JAowuV789A4rl VRw06RaQtovKwEEsljJz5pu5y5fGlJSo4krSKkHBnjt4atlohKPsqBqWVVVveMhwCYqi 58FRnD+BTLclISRL7KjzD0WidcO0M1hoP4n2EUoKs4XNgrv9turO2G1fSLSmUqmqJVU+ TO7n1gWjBaMA7yMPmj7int9URbfjPSihIODF8eor0wmsrJG4RDxBhrLgu5SyvDxJkU8A bfgQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=fail header.i=@linaro.org header.s=google header.b=BI8PyEot; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom="qemu-devel-bounces+patch=linaro.org@nongnu.org"; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=linaro.org Return-Path: Received: from lists.gnu.org (lists.gnu.org. [209.51.188.17]) by mx.google.com with ESMTPS id i67si34910582wmi.164.2019.01.28.08.25.41 for (version=TLS1 cipher=AES128-SHA bits=128/128); Mon, 28 Jan 2019 08:25:41 -0800 (PST) Received-SPF: pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 209.51.188.17 as permitted sender) client-ip=209.51.188.17; Authentication-Results: mx.google.com; dkim=fail header.i=@linaro.org header.s=google header.b=BI8PyEot; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom="qemu-devel-bounces+patch=linaro.org@nongnu.org"; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=linaro.org Received: from localhost ([127.0.0.1]:34517 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1go9jA-0007qz-9r for patch@linaro.org; Mon, 28 Jan 2019 11:25:40 -0500 Received: from eggs.gnu.org ([209.51.188.92]:33630) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1go9K6-0005Dq-UG for qemu-devel@nongnu.org; Mon, 28 Jan 2019 10:59:49 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1go9K4-0007vO-Ib for qemu-devel@nongnu.org; Mon, 28 Jan 2019 10:59:46 -0500 Received: from mail-pl1-x642.google.com ([2607:f8b0:4864:20::642]:41075) by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16) (Exim 4.71) (envelope-from ) id 1go9K4-0007om-7N for qemu-devel@nongnu.org; Mon, 28 Jan 2019 10:59:44 -0500 Received: by mail-pl1-x642.google.com with SMTP id u6so7919431plm.8 for ; Mon, 28 Jan 2019 07:59:30 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=G3R6M8lo+egOdVDHfgVuN8I78ohbr7aGRLQnNFv5cCY=; b=BI8PyEot8BbJCc0GinTMC85x5fEgOgU38qSj2XRMrC4iaX0hllM5w7Rqn2eX6GmD9t OSzJtYxRiHCgCGszRUKreVsWEZ+oQf356KVDesWsnvclvFnL6HpR1BSXVP1zA1M9iJTJ RbJ6JLOWjnBNZ9apMKCRx522jkoKg0FeTX758= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=G3R6M8lo+egOdVDHfgVuN8I78ohbr7aGRLQnNFv5cCY=; b=mlrt5F3thSmKr15t7SGa14AhplILEaNvLQQumCwaPRUUM+wyaW1LblgLwH513iF+g5 plk95nYFble12ZO4WYooCn5q8SlwRIy3v2TA3OMqcGRqRTrewvNFfstG96lf67RDLBSa tg/a3lV2a8icr4NiobpInUD+ExQBLH8+dOnZfZjCdGioLdfTDT6Kt6QIABPfhO3R8+cQ feqUjVOTNgz8T9hh2j7iLO4NW4giqiEZIiKnnNSvpD+P2PGu0m7lR3vnxVzMn9KjcKgR YtLCDgId8gkrGC3BbtPQ54Q2iUIJhqLXwj1kRf13dmtm8oRFttq5aEpfvD8P7sIwSUwu rmLw== X-Gm-Message-State: AJcUukfw+KtevSUaWO/fMQQYyeROo3Ok3g7eYX2ShnJSWlTV8CisgCtF qZbTQc4y9K45BRYrJivIDjAsQawDHRU= X-Received: by 2002:a17:902:b48b:: with SMTP id y11mr21442992plr.200.1548691169074; Mon, 28 Jan 2019 07:59:29 -0800 (PST) Received: from cloudburst.twiddle.net (50-233-235-3-static.hfc.comcastbusiness.net. [50.233.235.3]) by smtp.gmail.com with ESMTPSA id p2sm45518687pfp.125.2019.01.28.07.59.27 (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Mon, 28 Jan 2019 07:59:28 -0800 (PST) From: Richard Henderson To: qemu-devel@nongnu.org Date: Mon, 28 Jan 2019 07:58:57 -0800 Message-Id: <20190128155907.20607-14-richard.henderson@linaro.org> X-Mailer: git-send-email 2.17.2 In-Reply-To: <20190128155907.20607-1-richard.henderson@linaro.org> References: <20190128155907.20607-1-richard.henderson@linaro.org> MIME-Version: 1.0 X-detected-operating-system: by eggs.gnu.org: Genre and OS details not recognized. X-Received-From: 2607:f8b0:4864:20::642 Subject: [Qemu-devel] [PULL 13/23] tcg/i386: enable dynamic TLB sizing X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: peter.maydell@linaro.org, "Emilio G. Cota" Errors-To: qemu-devel-bounces+patch=linaro.org@nongnu.org Sender: "Qemu-devel" From: "Emilio G. Cota" As the following experiments show, this series is a net perf gain, particularly for memory-heavy workloads. Experiments are run on an Intel(R) Xeon(R) Gold 6142 CPU @ 2.60GHz. 1. System boot + shudown, debian aarch64: - Before (v3.1.0): Performance counter stats for './die.sh v3.1.0' (10 runs): 9019.797015 task-clock (msec) # 0.993 CPUs utilized ( +- 0.23% ) 29,910,312,379 cycles # 3.316 GHz ( +- 0.14% ) 54,699,252,014 instructions # 1.83 insn per cycle ( +- 0.08% ) 10,061,951,686 branches # 1115.541 M/sec ( +- 0.08% ) 172,966,530 branch-misses # 1.72% of all branches ( +- 0.07% ) 9.084039051 seconds time elapsed ( +- 0.23% ) - After: Performance counter stats for './die.sh tlb-dyn-v5' (10 runs): 8624.084842 task-clock (msec) # 0.993 CPUs utilized ( +- 0.23% ) 28,556,123,404 cycles # 3.311 GHz ( +- 0.13% ) 51,755,089,512 instructions # 1.81 insn per cycle ( +- 0.05% ) 9,526,513,946 branches # 1104.641 M/sec ( +- 0.05% ) 166,578,509 branch-misses # 1.75% of all branches ( +- 0.19% ) 8.680540350 seconds time elapsed ( +- 0.24% ) That is, a 4.4% perf increase. 2. System boot + shutdown, ubuntu 18.04 x86_64: - Before (v3.1.0): 56100.574751 task-clock (msec) # 1.016 CPUs utilized ( +- 4.81% ) 200,745,466,128 cycles # 3.578 GHz ( +- 5.24% ) 431,949,100,608 instructions # 2.15 insn per cycle ( +- 5.65% ) 77,502,383,330 branches # 1381.490 M/sec ( +- 6.18% ) 844,681,191 branch-misses # 1.09% of all branches ( +- 3.82% ) 55.221556378 seconds time elapsed ( +- 5.01% ) - After: 56603.419540 task-clock (msec) # 1.019 CPUs utilized ( +- 10.19% ) 202,217,930,479 cycles # 3.573 GHz ( +- 10.69% ) 439,336,291,626 instructions # 2.17 insn per cycle ( +- 14.14% ) 80,538,357,447 branches # 1422.853 M/sec ( +- 16.09% ) 776,321,622 branch-misses # 0.96% of all branches ( +- 3.77% ) 55.549661409 seconds time elapsed ( +- 10.44% ) No improvement (within noise range). Note that for this workload, increasing the time window too much can lead to perf degradation, since it flushes the TLB *very* frequently. 3. x86_64 SPEC06int: x86_64-softmmu speedup vs. v3.1.0 for SPEC06int (test set) Host: Intel(R) Xeon(R) Gold 6142 CPU @ 2.60GHz (Skylake) 5.5 +------------------------------------------------------------------------+ | +-+ | 5 |-+.................+-+...............................tlb-dyn-v5.......+-| | * * | 4.5 |-+.................*.*................................................+-| | * * | 4 |-+.................*.*................................................+-| | * * | 3.5 |-+.................*.*................................................+-| | * * | 3 |-+......+-+*.......*.*................................................+-| | * * * * | 2.5 |-+......*..*.......*.*.................................+-+*...........+-| | * * * * * * | 2 |-+......*..*.......*.*.................................*..*...........+-| | * * * * * * +-+ | 1.5 |-+......*..*.......*.*.................................*..*.*+-+.*+-+.+-| | * * *+-+ * * +-+ *+-+ +-+ +-+ * * * * * * | 1 |++++-+*+*++*+*++*++*+*++*+*+++-+*+*+-++*+-++++-++++-+++*++*+*++*+*++*+++| | * * * * * * * * * * * * * * * * * * * * * * * * * * | 0.5 +------------------------------------------------------------------------+ 400.perlb401.bzip403.g429445.g456.hm462.libq464.h471.omn47483.xalancbgeomean png: https://imgur.com/YRF90f7 That is, a 1.51x average speedup over the baseline, with a max speedup of 5.17x. Here's a different look at the SPEC06int results, using KVM as the baseline: x86_64-softmmu slowdown vs. KVM for SPEC06int (test set) Host: Intel(R) Xeon(R) Gold 6142 CPU @ 2.60GHz (Skylake) 25 +---------------------------------------------------------------------------+ | +-+ +-+ | | * * +-+ v3.1.0 | | * * +-+ tlb-dyn-v5 | | * * * * +-+ | 20 |-+.................*.*.............................*.+-+......*.*........+-| | * * * # # * * | | +-+ * * * # # * * | | * * * * * # # * * | 15 |-+......*.*........*.*.............................*.#.#......*.+-+......+-| | * * * * * # # * #|# | | * * * * +-+ * # # * +-+ | | * * +-+ * * ++-+ +-+ * # # * # # +-+ | | * * +-+ * * * ## *| +-+ * # # * # # +-+ | 10 |-+......*.*..*.+-+.*.*........*.##.......++-+.*.+-+*.#.#......*.#.#.*.*..+-| | * * * +-+ * * * ## +-+ *# # * # #* # # +-+ * # # * * | | * * * # # * * +-+ * ## * +-+ *# # * # #* # # * * * # # *+-+ | | * * * # # * * * +-+ * ## * # # *# # * # #* # # * * * # # * ## | 5 |-+......*.+-+*.#.#.*.*..*.#.#.*.##.*.#.#.*#.#.*.#.#*.#.#.*.*..*.#.#.*.##.+-| | * # #* # # * +-+* # # * ## * # # *# # * # #* # # * * * # # * ## | | * # #* # # * # #* # # * ## * # # *# # * # #* # # * +-+* # # * ## | | ++-+ * # #* # # * # #* # # * ## * # # *# # * # #* # # * # #* # # * ## | |+++*#+#+*+#+#*+#+#+*+#+#*+#+#+*+##+*+#+#+*#+#+*+#+#*+#+#+*+#+#*+#+#+*+##+++| 0 +---------------------------------------------------------------------------+ 400.perlbe401.bzi403.gc429445.go456.h462.libqu464.h471.omne4483.xalancbmgeomean png: https://imgur.com/YzAMNEV After this series, we bring down the average SPEC06int slowdown vs KVM from 11.47x to 7.58x. Tested-by: Alex Bennée Reviewed-by: Alex Bennée Signed-off-by: Emilio G. Cota Message-Id: <20190116170114.26802-4-cota@braap.org> Signed-off-by: Richard Henderson --- tcg/i386/tcg-target.h | 2 +- tcg/i386/tcg-target.inc.c | 28 ++++++++++++++-------------- 2 files changed, 15 insertions(+), 15 deletions(-) -- 2.17.2 diff --git a/tcg/i386/tcg-target.h b/tcg/i386/tcg-target.h index 94aa4cef7c..eb40312e67 100644 --- a/tcg/i386/tcg-target.h +++ b/tcg/i386/tcg-target.h @@ -27,7 +27,7 @@ #define TCG_TARGET_INSN_UNIT_SIZE 1 #define TCG_TARGET_TLB_DISPLACEMENT_BITS 31 -#define TCG_TARGET_IMPLEMENTS_DYN_TLB 0 +#define TCG_TARGET_IMPLEMENTS_DYN_TLB 1 #ifdef __x86_64__ # define TCG_TARGET_REG_BITS 64 diff --git a/tcg/i386/tcg-target.inc.c b/tcg/i386/tcg-target.inc.c index dbf8253718..4d84aea3a9 100644 --- a/tcg/i386/tcg-target.inc.c +++ b/tcg/i386/tcg-target.inc.c @@ -329,6 +329,7 @@ static inline int tcg_target_const_match(tcg_target_long val, TCGType type, #define OPC_ARITH_GvEv (0x03) /* ... plus (ARITH_FOO << 3) */ #define OPC_ANDN (0xf2 | P_EXT38) #define OPC_ADD_GvEv (OPC_ARITH_GvEv | (ARITH_ADD << 3)) +#define OPC_AND_GvEv (OPC_ARITH_GvEv | (ARITH_AND << 3)) #define OPC_BLENDPS (0x0c | P_EXT3A | P_DATA16) #define OPC_BSF (0xbc | P_EXT) #define OPC_BSR (0xbd | P_EXT) @@ -1641,7 +1642,7 @@ static inline void tcg_out_tlb_load(TCGContext *s, TCGReg addrlo, TCGReg addrhi, } if (TCG_TYPE_PTR == TCG_TYPE_I64) { hrexw = P_REXW; - if (TARGET_PAGE_BITS + CPU_TLB_BITS > 32) { + if (TARGET_PAGE_BITS + CPU_TLB_DYN_MAX_BITS > 32) { tlbtype = TCG_TYPE_I64; tlbrexw = P_REXW; } @@ -1649,6 +1650,15 @@ static inline void tcg_out_tlb_load(TCGContext *s, TCGReg addrlo, TCGReg addrhi, } tcg_out_mov(s, tlbtype, r0, addrlo); + tcg_out_shifti(s, SHIFT_SHR + tlbrexw, r0, + TARGET_PAGE_BITS - CPU_TLB_ENTRY_BITS); + + tcg_out_modrm_offset(s, OPC_AND_GvEv + trexw, r0, TCG_AREG0, + offsetof(CPUArchState, tlb_mask[mem_index])); + + tcg_out_modrm_offset(s, OPC_ADD_GvEv + hrexw, r0, TCG_AREG0, + offsetof(CPUArchState, tlb_table[mem_index])); + /* If the required alignment is at least as large as the access, simply copy the address and mask. For lesser alignments, check that we don't cross pages for the complete access. */ @@ -1658,20 +1668,10 @@ static inline void tcg_out_tlb_load(TCGContext *s, TCGReg addrlo, TCGReg addrhi, tcg_out_modrm_offset(s, OPC_LEA + trexw, r1, addrlo, s_mask - a_mask); } tlb_mask = (target_ulong)TARGET_PAGE_MASK | a_mask; - - tcg_out_shifti(s, SHIFT_SHR + tlbrexw, r0, - TARGET_PAGE_BITS - CPU_TLB_ENTRY_BITS); - tgen_arithi(s, ARITH_AND + trexw, r1, tlb_mask, 0); - tgen_arithi(s, ARITH_AND + tlbrexw, r0, - (CPU_TLB_SIZE - 1) << CPU_TLB_ENTRY_BITS, 0); - - tcg_out_modrm_sib_offset(s, OPC_LEA + hrexw, r0, TCG_AREG0, r0, 0, - offsetof(CPUArchState, tlb_table[mem_index][0]) - + which); /* cmp 0(r0), r1 */ - tcg_out_modrm_offset(s, OPC_CMP_GvEv + trexw, r1, r0, 0); + tcg_out_modrm_offset(s, OPC_CMP_GvEv + trexw, r1, r0, which); /* Prepare for both the fast path add of the tlb addend, and the slow path function argument setup. */ @@ -1684,7 +1684,7 @@ static inline void tcg_out_tlb_load(TCGContext *s, TCGReg addrlo, TCGReg addrhi, if (TARGET_LONG_BITS > TCG_TARGET_REG_BITS) { /* cmp 4(r0), addrhi */ - tcg_out_modrm_offset(s, OPC_CMP_GvEv, addrhi, r0, 4); + tcg_out_modrm_offset(s, OPC_CMP_GvEv, addrhi, r0, which + 4); /* jne slow_path */ tcg_out_opc(s, OPC_JCC_long + JCC_JNE, 0, 0, 0); @@ -1696,7 +1696,7 @@ static inline void tcg_out_tlb_load(TCGContext *s, TCGReg addrlo, TCGReg addrhi, /* add addend(r0), r1 */ tcg_out_modrm_offset(s, OPC_ADD_GvEv + hrexw, r1, r0, - offsetof(CPUTLBEntry, addend) - which); + offsetof(CPUTLBEntry, addend)); } /*