From patchwork Wed Nov 9 14:57:40 2016 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: =?utf-8?q?Alex_Benn=C3=A9e?= X-Patchwork-Id: 81522 Delivered-To: patch@linaro.org Received: by 10.140.97.165 with SMTP id m34csp247868qge; Wed, 9 Nov 2016 07:15:01 -0800 (PST) X-Received: by 10.157.33.90 with SMTP id l26mr71494otd.220.1478704501296; Wed, 09 Nov 2016 07:15:01 -0800 (PST) Return-Path: Received: from lists.gnu.org (lists.gnu.org. [2001:4830:134:3::11]) by mx.google.com with ESMTPS id k39si31227otb.173.2016.11.09.07.15.01 for (version=TLS1 cipher=AES128-SHA bits=128/128); Wed, 09 Nov 2016 07:15:01 -0800 (PST) Received-SPF: pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 2001:4830:134:3::11 as permitted sender) client-ip=2001:4830:134:3::11; Authentication-Results: mx.google.com; dkim=fail header.i=@linaro.org; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 2001:4830:134:3::11 as permitted sender) smtp.mailfrom=qemu-devel-bounces+patch=linaro.org@nongnu.org; dmarc=fail (p=NONE dis=NONE) header.from=linaro.org Received: from localhost ([::1]:40504 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1c4Uaa-0005ma-H3 for patch@linaro.org; Wed, 09 Nov 2016 10:15:00 -0500 Received: from eggs.gnu.org ([2001:4830:134:3::10]:41202) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1c4UKK-0000hU-Qv for qemu-devel@nongnu.org; Wed, 09 Nov 2016 09:58:14 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1c4UKH-0002KR-Q1 for qemu-devel@nongnu.org; Wed, 09 Nov 2016 09:58:12 -0500 Received: from mail-wm0-x229.google.com ([2a00:1450:400c:c09::229]:37111) by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16) (Exim 4.71) (envelope-from ) id 1c4UKH-0002K3-Fq for qemu-devel@nongnu.org; Wed, 09 Nov 2016 09:58:09 -0500 Received: by mail-wm0-x229.google.com with SMTP id t79so305619755wmt.0 for ; Wed, 09 Nov 2016 06:58:09 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=kzVMnRoETrmea/3wMDuG9nlh6yKpa67QBetZZs6kuOk=; b=emLGwMaqLSJmxZJ7suJckyZOn2ZdPjV+WvDBA4hn33FANXDPj0/M/paGJCF/LnzE/6 loavUlSKMW6lC2+nEIPbJFINdnH4gLUk88v3jqApsmzwOepJZP05agLd913m83OYwRLw E7Jy7erW4C+YVFcFEvR5jfOwHXfU/OeKOFPpY= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=kzVMnRoETrmea/3wMDuG9nlh6yKpa67QBetZZs6kuOk=; b=EVXt+Uf7D2Y1gPvqYLBnfd1sh5D2ojDqXyLYa6LNBmOdbIhEa61Ipf40yNjTgy02BO x1dOauIk5IXyQi82bLrU/b2qvNRuxyaHaUni7+a0KYGtmciAhOGyTue2I+dzaMYteQ6a DevYqG+kEJNH/fdTfjOaISijpjcM9Y98wA8MIMK8i1J9NzyW7Iq/l9klddmhm3LPD7iT MUl9pYGtsJglYB7TCxxcUluCMpl+90w3Tb9jexfg3QakZlWXewgXA1pVj9vIJ6fWi8lG a2DOLy5F6nWVJ94SOMUnFbgu/CYO7q+NXT2EKpWYjiGRXIGCHkvHNYsSLinmD0t+PXSa 9IMw== X-Gm-Message-State: ABUngvdW1MuLt6U4GJH86XWmW7pXEtO8DlHR9Ru5TbM1dQD0+Lau/osp+yjFp67EEzXPTo03 X-Received: by 10.28.158.209 with SMTP id h200mr21891257wme.54.1478703488125; Wed, 09 Nov 2016 06:58:08 -0800 (PST) Received: from zen.linaro.local ([81.128.185.34]) by smtp.gmail.com with ESMTPSA id i132sm7423757wmf.14.2016.11.09.06.58.01 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Wed, 09 Nov 2016 06:58:03 -0800 (PST) Received: from zen.linaroharston (localhost [127.0.0.1]) by zen.linaro.local (Postfix) with ESMTP id A0F6D3E044B; Wed, 9 Nov 2016 14:57:57 +0000 (GMT) From: =?UTF-8?q?Alex=20Benn=C3=A9e?= To: pbonzini@redhat.com Date: Wed, 9 Nov 2016 14:57:40 +0000 Message-Id: <20161109145748.27282-12-alex.bennee@linaro.org> X-Mailer: git-send-email 2.10.1 In-Reply-To: <20161109145748.27282-1-alex.bennee@linaro.org> References: <20161109145748.27282-1-alex.bennee@linaro.org> MIME-Version: 1.0 X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] X-Received-From: 2a00:1450:400c:c09::229 Subject: [Qemu-devel] [PATCH v6 11/19] cputlb: introduce tlb_flush_* async work. X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: mttcg@listserver.greensocs.com, peter.maydell@linaro.org, claudio.fontana@huawei.com, nikunj@linux.vnet.ibm.com, Peter Crosthwaite , jan.kiszka@siemens.com, mark.burton@greensocs.com, a.rigo@virtualopensystems.com, qemu-devel@nongnu.org, cota@braap.org, serge.fdrv@gmail.com, bobby.prani@gmail.com, rth@twiddle.net, =?UTF-8?q?Alex=20Benn=C3=A9e?= , fred.konrad@greensocs.com Errors-To: qemu-devel-bounces+patch=linaro.org@nongnu.org Sender: "Qemu-devel" From: KONRAD Frederic Some architectures allow to flush the tlb of other VCPUs. This is not a problem when we have only one thread for all VCPUs but it definitely needs to be an asynchronous work when we are in true multithreaded work. We take the tb_lock() when doing this to avoid racing with other threads which may be invalidating TB's at the same time. The alternative would be to use proper atomic primitives to clear the tlb entries en-mass. This patch doesn't do anything to protect other cputlb function being called in MTTCG mode making cross vCPU changes. Signed-off-by: KONRAD Frederic [AJB: remove need for g_malloc on defer, make check fixes, tb_lock] Signed-off-by: Alex Bennée --- v6 (base patches) - don't use cmpxchg_bool (we drop it later anyway) - use RUN_ON_CPU macros instead of inlines - bug out of tlb_flush if !tcg_enabled() (MacOSX make check failure) v5 (base patches) - take tb_lock() for memset - ensure tb_flush_page properly asyncs work for other vCPUs - use run_on_cpu_data v4 (base_patches) - brought forward from arm enabling series - restore pending_tlb_flush flag v1 - Remove tlb_flush_all just do the check in tlb_flush. - remove the need to g_malloc - tlb_flush calls direct if !cpu->created fixup! cputlb: introduce tlb_flush_* async work. --- cputlb.c | 90 +++++++++++++++++++++++++++++++++++++++++-------- include/exec/exec-all.h | 1 + include/qom/cpu.h | 6 ++++ 3 files changed, 83 insertions(+), 14 deletions(-) -- 2.10.1 Reviewed-by: Richard Henderson diff --git a/cputlb.c b/cputlb.c index 30c7c37..d75bf8f 100644 --- a/cputlb.c +++ b/cputlb.c @@ -64,28 +64,29 @@ } \ } while (0) +/* run_on_cpu_data.target_ptr should always be big enough for a + * target_ulong even on 32 bit builds */ +QEMU_BUILD_BUG_ON(sizeof(target_ulong) > sizeof(run_on_cpu_data)); + /* statistics */ int tlb_flush_count; -/* NOTE: - * If flush_global is true (the usual case), flush all tlb entries. - * If flush_global is false, flush (at least) all tlb entries not - * marked global. - * - * Since QEMU doesn't currently implement a global/not-global flag - * for tlb entries, at the moment tlb_flush() will also flush all - * tlb entries in the flush_global == false case. This is OK because - * CPU architectures generally permit an implementation to drop - * entries from the TLB at any time, so flushing more entries than - * required is only an efficiency issue, not a correctness issue. - */ -void tlb_flush(CPUState *cpu, int flush_global) +static void tlb_flush_nocheck(CPUState *cpu, int flush_global) { CPUArchState *env = cpu->env_ptr; + /* The QOM tests will trigger tlb_flushes without setting up TCG + * so we bug out here in that case. + */ + if (!tcg_enabled()) { + return; + } + assert_cpu_is_self(cpu); tlb_debug("(%d)\n", flush_global); + tb_lock(); + memset(env->tlb_table, -1, sizeof(env->tlb_table)); memset(env->tlb_v_table, -1, sizeof(env->tlb_v_table)); memset(cpu->tb_jmp_cache, 0, sizeof(cpu->tb_jmp_cache)); @@ -94,6 +95,39 @@ void tlb_flush(CPUState *cpu, int flush_global) env->tlb_flush_addr = -1; env->tlb_flush_mask = 0; tlb_flush_count++; + + tb_unlock(); + + atomic_mb_set(&cpu->pending_tlb_flush, false); +} + +static void tlb_flush_global_async_work(CPUState *cpu, run_on_cpu_data data) +{ + tlb_flush_nocheck(cpu, data.host_int); +} + +/* NOTE: + * If flush_global is true (the usual case), flush all tlb entries. + * If flush_global is false, flush (at least) all tlb entries not + * marked global. + * + * Since QEMU doesn't currently implement a global/not-global flag + * for tlb entries, at the moment tlb_flush() will also flush all + * tlb entries in the flush_global == false case. This is OK because + * CPU architectures generally permit an implementation to drop + * entries from the TLB at any time, so flushing more entries than + * required is only an efficiency issue, not a correctness issue. + */ +void tlb_flush(CPUState *cpu, int flush_global) +{ + if (cpu->created && !qemu_cpu_is_self(cpu)) { + if (atomic_cmpxchg(&cpu->pending_tlb_flush, false, true) == true) { + async_run_on_cpu(cpu, tlb_flush_global_async_work, + RUN_ON_CPU_HOST_INT(flush_global)); + } + } else { + tlb_flush_nocheck(cpu, flush_global); + } } static inline void v_tlb_flush_by_mmuidx(CPUState *cpu, va_list argp) @@ -103,6 +137,8 @@ static inline void v_tlb_flush_by_mmuidx(CPUState *cpu, va_list argp) assert_cpu_is_self(cpu); tlb_debug("start\n"); + tb_lock(); + for (;;) { int mmu_idx = va_arg(argp, int); @@ -117,6 +153,8 @@ static inline void v_tlb_flush_by_mmuidx(CPUState *cpu, va_list argp) } memset(cpu->tb_jmp_cache, 0, sizeof(cpu->tb_jmp_cache)); + + tb_unlock(); } void tlb_flush_by_mmuidx(CPUState *cpu, ...) @@ -139,13 +177,15 @@ static inline void tlb_flush_entry(CPUTLBEntry *tlb_entry, target_ulong addr) } } -void tlb_flush_page(CPUState *cpu, target_ulong addr) +static void tlb_flush_page_async_work(CPUState *cpu, run_on_cpu_data data) { CPUArchState *env = cpu->env_ptr; + target_ulong addr = (target_ulong) data.target_ptr; int i; int mmu_idx; assert_cpu_is_self(cpu); + tlb_debug("page :" TARGET_FMT_lx "\n", addr); /* Check if we need to flush due to large pages. */ @@ -175,6 +215,18 @@ void tlb_flush_page(CPUState *cpu, target_ulong addr) tb_flush_jmp_cache(cpu, addr); } +void tlb_flush_page(CPUState *cpu, target_ulong addr) +{ + tlb_debug("page :" TARGET_FMT_lx "\n", addr); + + if (!qemu_cpu_is_self(cpu)) { + async_run_on_cpu(cpu, tlb_flush_page_async_work, + RUN_ON_CPU_TARGET_PTR(addr)); + } else { + tlb_flush_page_async_work(cpu, RUN_ON_CPU_TARGET_PTR(addr)); + } +} + void tlb_flush_page_by_mmuidx(CPUState *cpu, target_ulong addr, ...) { CPUArchState *env = cpu->env_ptr; @@ -221,6 +273,16 @@ void tlb_flush_page_by_mmuidx(CPUState *cpu, target_ulong addr, ...) tb_flush_jmp_cache(cpu, addr); } +void tlb_flush_page_all(target_ulong addr) +{ + CPUState *cpu; + + CPU_FOREACH(cpu) { + async_run_on_cpu(cpu, tlb_flush_page_async_work, + RUN_ON_CPU_TARGET_PTR(addr)); + } +} + /* update the TLBs so that writes to code in the virtual page 'addr' can be detected */ void tlb_protect_code(ram_addr_t ram_addr) diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h index 37781e0..e4f7839 100644 --- a/include/exec/exec-all.h +++ b/include/exec/exec-all.h @@ -160,6 +160,7 @@ void tlb_set_page(CPUState *cpu, target_ulong vaddr, void tb_invalidate_phys_addr(AddressSpace *as, hwaddr addr); void probe_write(CPUArchState *env, target_ulong addr, int mmu_idx, uintptr_t retaddr); +void tlb_flush_page_all(target_ulong addr); #else static inline void tlb_flush_page(CPUState *cpu, target_ulong addr) { diff --git a/include/qom/cpu.h b/include/qom/cpu.h index 1735374..880ba42 100644 --- a/include/qom/cpu.h +++ b/include/qom/cpu.h @@ -393,6 +393,12 @@ struct CPUState { (absolute value) offset as small as possible. This reduces code size, especially for hosts without large memory offsets. */ uint32_t tcg_exit_req; + + /* The pending_tlb_flush flag is set and cleared atomically to + * avoid potential races. The aim of the flag is to avoid + * unnecessary flushes. + */ + bool pending_tlb_flush; }; QTAILQ_HEAD(CPUTailQ, CPUState);