From patchwork Fri Oct 19 06:06:56 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Richard Henderson X-Patchwork-Id: 149229 Delivered-To: patch@linaro.org Received: by 2002:a2e:8595:0:0:0:0:0 with SMTP id b21-v6csp2910937lji; Thu, 18 Oct 2018 23:17:17 -0700 (PDT) X-Google-Smtp-Source: ACcGV63gM4F0gD85KUOEXKfcBchn45wm7nP7sIenxbe5j4JOcSfjNKoy7cmJ/3oEqSYRzBjhCOEm X-Received: by 2002:ac8:3a82:: with SMTP id x2-v6mr32384584qte.342.1539929837378; Thu, 18 Oct 2018 23:17:17 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1539929837; cv=none; d=google.com; s=arc-20160816; b=Ip6j/PvYkYM1sBYEUsRe/lB7/Adbsc7wSxY82Ry7jBFxl6qK1QEbI/L62KkOMpEUst vEhUI+AbAfFXRZz5cgkcA7MkGvDMYJTJz2q5ugkiv/tHM2wU+nhgGfwFBshuo3V4aA9A hojuy4+5kC0ArOEkswyow8j7MrceNHHr+rG8GBVob5IHDGWzLmTl1oRoOF1X96nSxKAr 8xlntTBma077jbIcOozYt8hGjj4FJGH+9ypdZK9oIrn2J4IoY0K/7nG7Hq2SIEKIlusc 0oVL9L8BGNtpK7a/Dph7U4Uglcdg+C6EzaGcx7N4bI7XFeiokJfbydpNEYcAbsHAL1/D Z1mQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:cc:list-subscribe:list-help:list-post:list-archive :list-unsubscribe:list-id:precedence:subject:references:in-reply-to :message-id:date:to:from:dkim-signature; bh=UaQoW0X3h57lcIqJE5IM63G9fD5eqtJkNjrcpKzJ3Eo=; b=gsWv/O2FQbwG361NtzlZ+5/ii6ynqvqMOtcFrkur+5FU29Vgy71u3pMV060Tj8156D A/2tPiTvdMYnSgMuAaBLabNhVT1pPuCt6gRw/nozG/70ElaBKeS4Awyv24ZzDt9GezQQ 9I6YXwtVjMbzYCQZiznrU+4UkNUm/jqIzd2MK5GNbxDLknH0WN8oizTLkMuRmAT8rPxM 1eydrygLqCEkOtEiiRd356HMCb96aDkCvv5PnrRP/SHUv3Xy+Vld+7/6eHAsViDqULu8 T9s6wiGXRfXjutn0hQH2IutY8wsrMJwG5dRozFYJHlt7ak1XMZinazfL0I5l0DKY34+v aong== ARC-Authentication-Results: i=1; mx.google.com; dkim=fail header.i=@linaro.org header.s=google header.b=izYDI8yw; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 2001:4830:134:3::11 as permitted sender) smtp.mailfrom="qemu-devel-bounces+patch=linaro.org@nongnu.org"; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=linaro.org Return-Path: Received: from lists.gnu.org (lists.gnu.org. [2001:4830:134:3::11]) by mx.google.com with ESMTPS id l10-v6si68926qtq.334.2018.10.18.23.17.17 for (version=TLS1 cipher=AES128-SHA bits=128/128); Thu, 18 Oct 2018 23:17:17 -0700 (PDT) Received-SPF: pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 2001:4830:134:3::11 as permitted sender) client-ip=2001:4830:134:3::11; Authentication-Results: mx.google.com; dkim=fail header.i=@linaro.org header.s=google header.b=izYDI8yw; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 2001:4830:134:3::11 as permitted sender) smtp.mailfrom="qemu-devel-bounces+patch=linaro.org@nongnu.org"; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=linaro.org Received: from localhost ([::1]:47270 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1gDO60-0004if-Qs for patch@linaro.org; Fri, 19 Oct 2018 02:17:16 -0400 Received: from eggs.gnu.org ([2001:4830:134:3::10]:39349) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1gDNwl-0004Uo-LI for qemu-devel@nongnu.org; Fri, 19 Oct 2018 02:07:45 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1gDNwg-0002nF-3K for qemu-devel@nongnu.org; Fri, 19 Oct 2018 02:07:43 -0400 Received: from mail-pg1-x533.google.com ([2607:f8b0:4864:20::533]:40766) by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16) (Exim 4.71) (envelope-from ) id 1gDNwf-0002O4-FE for qemu-devel@nongnu.org; Fri, 19 Oct 2018 02:07:37 -0400 Received: by mail-pg1-x533.google.com with SMTP id n31-v6so15308166pgm.7 for ; Thu, 18 Oct 2018 23:07:29 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=from:to:cc:subject:date:message-id:in-reply-to:references; bh=UaQoW0X3h57lcIqJE5IM63G9fD5eqtJkNjrcpKzJ3Eo=; b=izYDI8ywLvyeC8CCsKI+ck2N6pFKUT8pZlx429kf4ONlf7mXU5T4Z8otqzLIWXzgNv JY+WCfPgqhfEm2qNypToB5l8muHUhS7J2aog60C7uVgg1BDm9RLVENvQKDrZzLy/C1gB 6W/kKNr82zK6Dgp51DOSEl76/LoHknYp59T7g= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references; bh=UaQoW0X3h57lcIqJE5IM63G9fD5eqtJkNjrcpKzJ3Eo=; b=tuzyWGwOD6G38n0BYMpO1NiUfkIHpeY1m9nr4QWxyReQkhg1jMTyHx4A0w20YVgltY YIkhjvg9gq1Im/Pg/6HXtINyx3a0ayCbtf1zwO7CCoMPx+ZAZMuK19tVilLy5LmsAf0l f5U8KlhKOI1nxTxEDhKDlyicMMjRrF3iE9T7Il495hB1G57FqiBfOHrEErCltM3SkjE5 9nXj9Ifuwx8yajHSIUskqowlfVpMsdc7MXXCKfLG7qP5VqNm/uAnOFXwE+dVAvhhwUJF oyXLv7ZB1D9uquRpzGNSfpJ6bGMoZSkQcnLXUfU1ZHv9Dalza5jKuFgCJanBHI0nQzXH MjQA== X-Gm-Message-State: ABuFfojuyR8o2y+w03BNkBPAKt9yzZkmglAsUjSmPsTMuQxqHFjV7A1r VPIridZUxvFdLxUehp2kT6qQvdrqXWM= X-Received: by 2002:a62:7e81:: with SMTP id z123-v6mr33275389pfc.139.1539929247853; Thu, 18 Oct 2018 23:07:27 -0700 (PDT) Received: from cloudburst.twiddle.net (174-21-9-133.tukw.qwest.net. [174.21.9.133]) by smtp.gmail.com with ESMTPSA id q24-v6sm25609327pff.83.2018.10.18.23.07.26 (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Thu, 18 Oct 2018 23:07:26 -0700 (PDT) From: Richard Henderson To: qemu-devel@nongnu.org Date: Thu, 18 Oct 2018 23:06:56 -0700 Message-Id: <20181019060656.7968-22-richard.henderson@linaro.org> X-Mailer: git-send-email 2.17.2 In-Reply-To: <20181019060656.7968-1-richard.henderson@linaro.org> References: <20181019060656.7968-1-richard.henderson@linaro.org> X-detected-operating-system: by eggs.gnu.org: Genre and OS details not recognized. X-Received-From: 2607:f8b0:4864:20::533 Subject: [Qemu-devel] [PULL v2 21/21] cputlb: read CPUTLBEntry.addr_write atomically X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: peter.maydell@linaro.org, "Emilio G. Cota" Errors-To: qemu-devel-bounces+patch=linaro.org@nongnu.org Sender: "Qemu-devel" From: "Emilio G. Cota" Updates can come from other threads, so readers that do not take tlb_lock must use atomic_read to avoid undefined behaviour (UB). This completes the conversion to tlb_lock. This conversion results on average in no performance loss, as the following experiments (run on an Intel i7-6700K CPU @ 4.00GHz) show. 1. aarch64 bootup+shutdown test: - Before: Performance counter stats for 'taskset -c 0 ../img/aarch64/die.sh' (10 runs): 7487.087786 task-clock (msec) # 0.998 CPUs utilized ( +- 0.12% ) 31,574,905,303 cycles # 4.217 GHz ( +- 0.12% ) 57,097,908,812 instructions # 1.81 insns per cycle ( +- 0.08% ) 10,255,415,367 branches # 1369.747 M/sec ( +- 0.08% ) 173,278,962 branch-misses # 1.69% of all branches ( +- 0.18% ) 7.504481349 seconds time elapsed ( +- 0.14% ) - After: Performance counter stats for 'taskset -c 0 ../img/aarch64/die.sh' (10 runs): 7462.441328 task-clock (msec) # 0.998 CPUs utilized ( +- 0.07% ) 31,478,476,520 cycles # 4.218 GHz ( +- 0.07% ) 57,017,330,084 instructions # 1.81 insns per cycle ( +- 0.05% ) 10,251,929,667 branches # 1373.804 M/sec ( +- 0.05% ) 173,023,787 branch-misses # 1.69% of all branches ( +- 0.11% ) 7.474970463 seconds time elapsed ( +- 0.07% ) 2. SPEC06int: SPEC06int (test set) [Y axis: Speedup over master] 1.15 +-+----+------+------+------+------+------+-------+------+------+------+------+------+------+----+-+ | | 1.1 +-+.................................+++.............................+ tlb-lock-v2 (m+++x) +-+ | +++ | +++ tlb-lock-v3 (spinl|ck) | | +++ | | +++ +++ | | | 1.05 +-+....+++...........####.........|####.+++.|......|.....###....+++...........+++....###.........+-+ | ### ++#| # |# |# ***### +++### +++#+# | +++ | #|# ### | 1 +-+++***+#++++####+++#++#++++++++++#++#+*+*++#++++#+#+****+#++++###++++###++++###++++#+#++++#+#+++-+ | *+* # #++# *** # #### *** # * *++# ****+# *| * # ****|# |# # #|# #+# # # | 0.95 +-+..*.*.#....#..#.*|*..#...#..#.*|*..#.*.*..#.*|.*.#.*++*.#.*++*+#.****.#....#+#....#.#..++#.#..+-+ | * * # # # *|* # # # *|* # * * # *++* # * * # * * # * |* # ++# # # # *** # | | * * # ++# # *+* # # # *|* # * * # * * # * * # * * # *++* # **** # ++# # * * # | 0.9 +-+..*.*.#...|#..#.*.*..#.++#..#.*|*..#.*.*..#.*..*.#.*..*.#.*..*.#.*..*.#.*.|*.#...|#.#..*.*.#..+-+ | * * # *** # * * # |# # *+* # * * # * * # * * # * * # * * # *++* # |# # * * # | 0.85 +-+..*.*.#..*|*..#.*.*..#.***..#.*.*..#.*.*..#.*..*.#.*..*.#.*..*.#.*..*.#.*..*.#.****.#..*.*.#..+-+ | * * # *+* # * * # *|* # * * # * * # * * # * * # * * # * * # * * # * |* # * * # | | * * # * * # * * # *+* # * * # * * # * * # * * # * * # * * # * * # * |* # * * # | 0.8 +-+..*.*.#..*.*..#.*.*..#.*.*..#.*.*..#.*.*..#.*..*.#.*..*.#.*..*.#.*..*.#.*..*.#.*++*.#..*.*.#..+-+ | * * # * * # * * # * * # * * # * * # * * # * * # * * # * * # * * # * * # * * # | 0.75 +-+--***##--***###-***###-***###-***###-***###-****##-****##-****##-****##-****##-****##--***##--+-+ 400.perlben401.bzip2403.gcc429.m445.gob456.hmme45462.libqua464.h26471.omnet473483.xalancbmkgeomean png: https://imgur.com/a/BHzpPTW Notes: - tlb-lock-v2 corresponds to an implementation with a mutex. - tlb-lock-v3 corresponds to the current implementation, i.e. a spinlock and a single lock acquisition in tlb_set_page_with_attrs. Signed-off-by: Emilio G. Cota Message-Id: <20181016153840.25877-1-cota@braap.org> Signed-off-by: Richard Henderson --- accel/tcg/softmmu_template.h | 12 ++++++------ include/exec/cpu_ldst.h | 11 ++++++++++- include/exec/cpu_ldst_template.h | 2 +- accel/tcg/cputlb.c | 19 +++++++++++++------ 4 files changed, 30 insertions(+), 14 deletions(-) -- 2.17.2 diff --git a/accel/tcg/softmmu_template.h b/accel/tcg/softmmu_template.h index 09538b5349..b0adea045e 100644 --- a/accel/tcg/softmmu_template.h +++ b/accel/tcg/softmmu_template.h @@ -280,7 +280,7 @@ void helper_le_st_name(CPUArchState *env, target_ulong addr, DATA_TYPE val, uintptr_t mmu_idx = get_mmuidx(oi); uintptr_t index = tlb_index(env, mmu_idx, addr); CPUTLBEntry *entry = tlb_entry(env, mmu_idx, addr); - target_ulong tlb_addr = entry->addr_write; + target_ulong tlb_addr = tlb_addr_write(entry); unsigned a_bits = get_alignment_bits(get_memop(oi)); uintptr_t haddr; @@ -295,7 +295,7 @@ void helper_le_st_name(CPUArchState *env, target_ulong addr, DATA_TYPE val, tlb_fill(ENV_GET_CPU(env), addr, DATA_SIZE, MMU_DATA_STORE, mmu_idx, retaddr); } - tlb_addr = entry->addr_write & ~TLB_INVALID_MASK; + tlb_addr = tlb_addr_write(entry) & ~TLB_INVALID_MASK; } /* Handle an IO access. */ @@ -325,7 +325,7 @@ void helper_le_st_name(CPUArchState *env, target_ulong addr, DATA_TYPE val, cannot evict the first. */ page2 = (addr + DATA_SIZE) & TARGET_PAGE_MASK; entry2 = tlb_entry(env, mmu_idx, page2); - if (!tlb_hit_page(entry2->addr_write, page2) + if (!tlb_hit_page(tlb_addr_write(entry2), page2) && !VICTIM_TLB_HIT(addr_write, page2)) { tlb_fill(ENV_GET_CPU(env), page2, DATA_SIZE, MMU_DATA_STORE, mmu_idx, retaddr); @@ -358,7 +358,7 @@ void helper_be_st_name(CPUArchState *env, target_ulong addr, DATA_TYPE val, uintptr_t mmu_idx = get_mmuidx(oi); uintptr_t index = tlb_index(env, mmu_idx, addr); CPUTLBEntry *entry = tlb_entry(env, mmu_idx, addr); - target_ulong tlb_addr = entry->addr_write; + target_ulong tlb_addr = tlb_addr_write(entry); unsigned a_bits = get_alignment_bits(get_memop(oi)); uintptr_t haddr; @@ -373,7 +373,7 @@ void helper_be_st_name(CPUArchState *env, target_ulong addr, DATA_TYPE val, tlb_fill(ENV_GET_CPU(env), addr, DATA_SIZE, MMU_DATA_STORE, mmu_idx, retaddr); } - tlb_addr = entry->addr_write & ~TLB_INVALID_MASK; + tlb_addr = tlb_addr_write(entry) & ~TLB_INVALID_MASK; } /* Handle an IO access. */ @@ -403,7 +403,7 @@ void helper_be_st_name(CPUArchState *env, target_ulong addr, DATA_TYPE val, cannot evict the first. */ page2 = (addr + DATA_SIZE) & TARGET_PAGE_MASK; entry2 = tlb_entry(env, mmu_idx, page2); - if (!tlb_hit_page(entry2->addr_write, page2) + if (!tlb_hit_page(tlb_addr_write(entry2), page2) && !VICTIM_TLB_HIT(addr_write, page2)) { tlb_fill(ENV_GET_CPU(env), page2, DATA_SIZE, MMU_DATA_STORE, mmu_idx, retaddr); diff --git a/include/exec/cpu_ldst.h b/include/exec/cpu_ldst.h index f54d91ff68..959068495a 100644 --- a/include/exec/cpu_ldst.h +++ b/include/exec/cpu_ldst.h @@ -126,6 +126,15 @@ extern __thread uintptr_t helper_retaddr; /* The memory helpers for tcg-generated code need tcg_target_long etc. */ #include "tcg.h" +static inline target_ulong tlb_addr_write(const CPUTLBEntry *entry) +{ +#if TCG_OVERSIZED_GUEST + return entry->addr_write; +#else + return atomic_read(&entry->addr_write); +#endif +} + /* Find the TLB index corresponding to the mmu_idx + address pair. */ static inline uintptr_t tlb_index(CPUArchState *env, uintptr_t mmu_idx, target_ulong addr) @@ -439,7 +448,7 @@ static inline void *tlb_vaddr_to_host(CPUArchState *env, abi_ptr addr, tlb_addr = tlbentry->addr_read; break; case 1: - tlb_addr = tlbentry->addr_write; + tlb_addr = tlb_addr_write(tlbentry); break; case 2: tlb_addr = tlbentry->addr_code; diff --git a/include/exec/cpu_ldst_template.h b/include/exec/cpu_ldst_template.h index d21a0b59bf..0f061d47ef 100644 --- a/include/exec/cpu_ldst_template.h +++ b/include/exec/cpu_ldst_template.h @@ -177,7 +177,7 @@ glue(glue(glue(cpu_st, SUFFIX), MEMSUFFIX), _ra)(CPUArchState *env, addr = ptr; mmu_idx = CPU_MMU_INDEX; entry = tlb_entry(env, mmu_idx, addr); - if (unlikely(entry->addr_write != + if (unlikely(tlb_addr_write(entry) != (addr & (TARGET_PAGE_MASK | (DATA_SIZE - 1))))) { oi = make_memop_idx(SHIFT, mmu_idx); glue(glue(helper_ret_st, SUFFIX), MMUSUFFIX)(env, addr, v, oi, diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c index 28b770a404..af57aca5e4 100644 --- a/accel/tcg/cputlb.c +++ b/accel/tcg/cputlb.c @@ -258,7 +258,7 @@ static inline bool tlb_hit_page_anyprot(CPUTLBEntry *tlb_entry, target_ulong page) { return tlb_hit_page(tlb_entry->addr_read, page) || - tlb_hit_page(tlb_entry->addr_write, page) || + tlb_hit_page(tlb_addr_write(tlb_entry), page) || tlb_hit_page(tlb_entry->addr_code, page); } @@ -855,7 +855,7 @@ static void io_writex(CPUArchState *env, CPUIOTLBEntry *iotlbentry, tlb_fill(cpu, addr, size, MMU_DATA_STORE, mmu_idx, retaddr); entry = tlb_entry(env, mmu_idx, addr); - tlb_addr = entry->addr_write; + tlb_addr = tlb_addr_write(entry); if (!(tlb_addr & ~(TARGET_PAGE_MASK | TLB_RECHECK))) { /* RAM access */ uintptr_t haddr = addr + entry->addend; @@ -904,7 +904,14 @@ static bool victim_tlb_hit(CPUArchState *env, size_t mmu_idx, size_t index, assert_cpu_is_self(ENV_GET_CPU(env)); for (vidx = 0; vidx < CPU_VTLB_SIZE; ++vidx) { CPUTLBEntry *vtlb = &env->tlb_v_table[mmu_idx][vidx]; - target_ulong cmp = *(target_ulong *)((uintptr_t)vtlb + elt_ofs); + target_ulong cmp; + + /* elt_ofs might correspond to .addr_write, so use atomic_read */ +#if TCG_OVERSIZED_GUEST + cmp = *(target_ulong *)((uintptr_t)vtlb + elt_ofs); +#else + cmp = atomic_read((target_ulong *)((uintptr_t)vtlb + elt_ofs)); +#endif if (cmp == page) { /* Found entry in victim tlb, swap tlb and iotlb. */ @@ -977,7 +984,7 @@ void probe_write(CPUArchState *env, target_ulong addr, int size, int mmu_idx, uintptr_t index = tlb_index(env, mmu_idx, addr); CPUTLBEntry *entry = tlb_entry(env, mmu_idx, addr); - if (!tlb_hit(entry->addr_write, addr)) { + if (!tlb_hit(tlb_addr_write(entry), addr)) { /* TLB entry is for a different page */ if (!VICTIM_TLB_HIT(addr_write, addr)) { tlb_fill(ENV_GET_CPU(env), addr, size, MMU_DATA_STORE, @@ -995,7 +1002,7 @@ static void *atomic_mmu_lookup(CPUArchState *env, target_ulong addr, size_t mmu_idx = get_mmuidx(oi); uintptr_t index = tlb_index(env, mmu_idx, addr); CPUTLBEntry *tlbe = tlb_entry(env, mmu_idx, addr); - target_ulong tlb_addr = tlbe->addr_write; + target_ulong tlb_addr = tlb_addr_write(tlbe); TCGMemOp mop = get_memop(oi); int a_bits = get_alignment_bits(mop); int s_bits = mop & MO_SIZE; @@ -1026,7 +1033,7 @@ static void *atomic_mmu_lookup(CPUArchState *env, target_ulong addr, tlb_fill(ENV_GET_CPU(env), addr, 1 << s_bits, MMU_DATA_STORE, mmu_idx, retaddr); } - tlb_addr = tlbe->addr_write & ~TLB_INVALID_MASK; + tlb_addr = tlb_addr_write(tlbe) & ~TLB_INVALID_MASK; } /* Notice an IO access or a needs-MMU-lookup access */