From patchwork Fri Oct 5 15:48:56 2018
X-Patchwork-Submitter: Alex Bennée
X-Patchwork-Id: 148202
From: Alex Bennée <alex.bennee@linaro.org>
To: qemu-devel@nongnu.org
Date: Fri, 5 Oct 2018 16:48:56 +0100
Message-Id: <20181005154910.3099-8-alex.bennee@linaro.org>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20181005154910.3099-1-alex.bennee@linaro.org>
References: <20181005154910.3099-1-alex.bennee@linaro.org>
Subject: [Qemu-devel] [RFC PATCH 07/21] accel/tcg/cputlb: convert tlb_flush debugging into trace events
Cc: Peter Crosthwaite, cota@braap.org, Pavel.Dovgaluk@ispras.ru,
 Paolo Bonzini, Alex Bennée, vilanova@ac.upc.edu, Richard Henderson

Given the range of costs of the various SoftMMU TLB operations, from
deferring work for the currently running vCPU to bringing the whole
emulated machine to a stop for synchronised updates, simple counters
are less useful. Instead we log events via the trace infrastructure
and can then post-process the data in a range of ways.

  tlb_flush_self - the vCPU flushed its own TLB
  tlb_flush_async_schedule - work was scheduled and the vCPU kicked
  tlb_flush_synced_schedule - exclusive work was scheduled on a vCPU
  tlb_flush_work - scheduled work started running on the vCPU
  tlb_flush_work_complete - scheduled work finished

We can use the difference between a schedule event and the matching
tlb_flush_work event to calculate the latency introduced; see the
notes below the changelog.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>

---
v2
  - add tlb_flush_work_complete
  - remove tlb_debugs at the same time
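The events can be enabled like any other trace point, e.g. by starting
QEMU with something like "-trace 'tlb_flush*'", assuming a suitable
trace backend ("log", "simple", ...) was configured in. With the "log"
backend the output follows the format strings in trace-events; the
lines below are purely illustrative (the cputlb.c line numbers and any
pid/timestamp prefix depend on the build and backend):

  tlb_flush_async_schedule cputlb.c:152 from_cpu=0 to_cpu=1
  tlb_flush_work cputlb.c:186 cpu 1, 0x3
  tlb_flush_work_complete cputlb.c:199 cpu 1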
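Below is a minimal post-processing sketch for the scheduling latency
mentioned above. It is not part of the patch: it assumes a trace
captured with the "simple" backend and QEMU's scripts/simpletrace.py
on PYTHONPATH, and the schedule->work pairing is approximate when
several flushes are in flight (work scheduled on all vCPUs, to=-1,
cannot be matched to a single target and is ignored):

  #!/usr/bin/env python
  # tlb-latency.py <trace-events> <trace-file> - hypothetical helper
  import simpletrace

  class TlbFlushAnalyzer(simpletrace.Analyzer):
      def __init__(self):
          self.pending = {}    # to_cpu -> timestamp of last schedule
          self.latencies = []  # ns between schedule and matching work

      # simpletrace passes the timestamp (ns) as the first argument
      # when a method takes one more argument than the event defines
      def tlb_flush_async_schedule(self, timestamp, line, from_cpu, to_cpu):
          self.pending[to_cpu] = timestamp

      def tlb_flush_synced_schedule(self, timestamp, line, from_cpu, to_cpu):
          self.pending[to_cpu] = timestamp

      def tlb_flush_work(self, timestamp, line, vcpu, data):
          if vcpu in self.pending:
              self.latencies.append(timestamp - self.pending.pop(vcpu))

      def end(self):
          if self.latencies:
              print("%d flushes, avg latency %d ns" %
                    (len(self.latencies),
                     sum(self.latencies) // len(self.latencies)))

  simpletrace.run(TlbFlushAnalyzer())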
---
 accel/tcg/cputlb.c     | 72 ++++++++++++++++++++++++------------------
 accel/tcg/trace-events | 16 ++++++++++
 2 files changed, 58 insertions(+), 30 deletions(-)

--
2.17.1

diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
index f4702ce91f..f1d4f7da44 100644
--- a/accel/tcg/cputlb.c
+++ b/accel/tcg/cputlb.c
@@ -20,6 +20,7 @@
 #include "qemu/osdep.h"
 #include "qemu/main-loop.h"
 #include "cpu.h"
+#include "trace.h"
 #include "exec/exec-all.h"
 #include "exec/memory.h"
 #include "exec/address-spaces.h"
@@ -138,7 +139,9 @@ static void tlb_flush_nocheck(CPUState *cpu)
 
 static void tlb_flush_global_async_work(CPUState *cpu, run_on_cpu_data data)
 {
+    trace_tlb_flush_all_work(cpu->cpu_index);
     tlb_flush_nocheck(cpu);
+    trace_tlb_flush_work_complete(__LINE__, cpu->cpu_index);
 }
 
 void tlb_flush(CPUState *cpu)
@@ -146,17 +149,22 @@ void tlb_flush(CPUState *cpu)
     if (cpu->created && !qemu_cpu_is_self(cpu)) {
         if (atomic_mb_read(&cpu->pending_tlb_flush) != ALL_MMUIDX_BITS) {
             atomic_mb_set(&cpu->pending_tlb_flush, ALL_MMUIDX_BITS);
+            trace_tlb_flush_async_schedule(__LINE__, current_cpu ?
+                                           current_cpu->cpu_index :
+                                           cpu->cpu_index, cpu->cpu_index);
             async_run_on_cpu(cpu, tlb_flush_global_async_work,
                              RUN_ON_CPU_NULL);
         }
     } else {
-        tlb_flush_nocheck(cpu);
+        trace_tlb_flush_self(__LINE__, cpu->cpu_index);
+        tlb_flush_global_async_work(cpu, RUN_ON_CPU_NULL);
     }
 }
 
 void tlb_flush_all_cpus(CPUState *src_cpu)
 {
     const run_on_cpu_func fn = tlb_flush_global_async_work;
+    trace_tlb_flush_async_schedule(__LINE__, src_cpu->cpu_index, -1);
     flush_all_helper(src_cpu, fn, RUN_ON_CPU_NULL);
     fn(src_cpu, RUN_ON_CPU_NULL);
 }
@@ -164,6 +172,7 @@ void tlb_flush_all_cpus(CPUState *src_cpu)
 void tlb_flush_all_cpus_synced(CPUState *src_cpu)
 {
     const run_on_cpu_func fn = tlb_flush_global_async_work;
+    trace_tlb_flush_synced_schedule(__LINE__, src_cpu->cpu_index, -1);
     flush_all_helper(src_cpu, fn, RUN_ON_CPU_NULL);
     async_safe_run_on_cpu(src_cpu, fn, RUN_ON_CPU_NULL);
 }
@@ -174,15 +183,12 @@ static void tlb_flush_by_mmuidx_async_work(CPUState *cpu, run_on_cpu_data data)
     unsigned long mmu_idx_bitmask = data.host_int;
     int mmu_idx;
 
-    assert_cpu_is_self(cpu);
+    trace_tlb_flush_work(__LINE__, cpu->cpu_index, mmu_idx_bitmask);
 
-    tlb_debug("start: mmu_idx:0x%04lx\n", mmu_idx_bitmask);
+    assert_cpu_is_self(cpu);
 
     for (mmu_idx = 0; mmu_idx < NB_MMU_MODES; mmu_idx++) {
-
         if (test_bit(mmu_idx, &mmu_idx_bitmask)) {
-            tlb_debug("%d\n", mmu_idx);
-
             memset(env->tlb_table[mmu_idx], -1, sizeof(env->tlb_table[0]));
             memset(env->tlb_v_table[mmu_idx], -1, sizeof(env->tlb_v_table[0]));
         }
@@ -190,25 +196,26 @@ static void tlb_flush_by_mmuidx_async_work(CPUState *cpu, run_on_cpu_data data)
 
     cpu_tb_jmp_cache_clear(cpu);
 
-    tlb_debug("done\n");
+    trace_tlb_flush_work_complete(__LINE__, cpu->cpu_index);
 }
 
 void tlb_flush_by_mmuidx(CPUState *cpu, uint16_t idxmap)
 {
-    tlb_debug("mmu_idx: 0x%" PRIx16 "\n", idxmap);
-
     if (!qemu_cpu_is_self(cpu)) {
         uint16_t pending_flushes = idxmap;
         pending_flushes &= ~atomic_mb_read(&cpu->pending_tlb_flush);
 
         if (pending_flushes) {
-            tlb_debug("reduced mmu_idx: 0x%" PRIx16 "\n", pending_flushes);
+            trace_tlb_flush_async_schedule(__LINE__,
+                                           current_cpu->cpu_index,
+                                           cpu->cpu_index);
 
             atomic_or(&cpu->pending_tlb_flush, pending_flushes);
             async_run_on_cpu(cpu, tlb_flush_by_mmuidx_async_work,
                              RUN_ON_CPU_HOST_INT(pending_flushes));
         }
     } else {
+        trace_tlb_flush_self(__LINE__, cpu->cpu_index);
         tlb_flush_by_mmuidx_async_work(cpu,
                                        RUN_ON_CPU_HOST_INT(idxmap));
     }
@@ -218,7 +225,7 @@ void tlb_flush_by_mmuidx_all_cpus(CPUState *src_cpu, uint16_t idxmap)
 {
     const run_on_cpu_func fn = tlb_flush_by_mmuidx_async_work;
 
-    tlb_debug("mmu_idx: 0x%"PRIx16"\n", idxmap);
+    trace_tlb_flush_async_schedule(__LINE__, src_cpu->cpu_index, -1);
 
     flush_all_helper(src_cpu, fn, RUN_ON_CPU_HOST_INT(idxmap));
     fn(src_cpu, RUN_ON_CPU_HOST_INT(idxmap));
@@ -229,7 +236,7 @@ void tlb_flush_by_mmuidx_all_cpus_synced(CPUState *src_cpu,
 {
     const run_on_cpu_func fn = tlb_flush_by_mmuidx_async_work;
 
-    tlb_debug("mmu_idx: 0x%"PRIx16"\n", idxmap);
+    trace_tlb_flush_synced_schedule(__LINE__, src_cpu->cpu_index, -1);
 
     flush_all_helper(src_cpu, fn, RUN_ON_CPU_HOST_INT(idxmap));
     async_safe_run_on_cpu(src_cpu, fn, RUN_ON_CPU_HOST_INT(idxmap));
@@ -268,14 +275,10 @@ static void tlb_flush_page_async_work(CPUState *cpu, run_on_cpu_data data)
 
     assert_cpu_is_self(cpu);
 
-    tlb_debug("page :" TARGET_FMT_lx "\n", addr);
+    trace_tlb_flush_work(__LINE__, cpu->cpu_index, addr);
 
     /* Check if we need to flush due to large pages.  */
     if ((addr & env->tlb_flush_mask) == env->tlb_flush_addr) {
-        tlb_debug("forcing full flush ("
-                  TARGET_FMT_lx "/" TARGET_FMT_lx ")\n",
-                  env->tlb_flush_addr, env->tlb_flush_mask);
-
         tlb_flush(cpu);
         return;
     }
@@ -288,16 +291,19 @@ static void tlb_flush_page_async_work(CPUState *cpu, run_on_cpu_data data)
     }
 
     tb_flush_jmp_cache(cpu, addr);
+
+    trace_tlb_flush_work_complete(__LINE__, cpu->cpu_index);
 }
 
 void tlb_flush_page(CPUState *cpu, target_ulong addr)
 {
-    tlb_debug("page :" TARGET_FMT_lx "\n", addr);
-
     if (!qemu_cpu_is_self(cpu)) {
+        trace_tlb_flush_async_schedule(__LINE__, current_cpu->cpu_index,
+                                       cpu->cpu_index);
         async_run_on_cpu(cpu, tlb_flush_page_async_work,
                          RUN_ON_CPU_TARGET_PTR(addr));
     } else {
+        trace_tlb_flush_self(__LINE__, cpu->cpu_index);
         tlb_flush_page_async_work(cpu, RUN_ON_CPU_TARGET_PTR(addr));
     }
 }
@@ -319,8 +325,7 @@ static void tlb_flush_page_by_mmuidx_async_work(CPUState *cpu,
 
     assert_cpu_is_self(cpu);
 
-    tlb_debug("page:%d addr:"TARGET_FMT_lx" mmu_idx:0x%lx\n",
-              page, addr, mmu_idx_bitmap);
+    trace_tlb_flush_work(__LINE__, cpu->cpu_index, addr_and_mmuidx);
 
     for (mmu_idx = 0; mmu_idx < NB_MMU_MODES; mmu_idx++) {
         if (test_bit(mmu_idx, &mmu_idx_bitmap)) {
@@ -330,6 +335,8 @@ static void tlb_flush_page_by_mmuidx_async_work(CPUState *cpu,
     }
 
     tb_flush_jmp_cache(cpu, addr);
+
+    trace_tlb_flush_work_complete(__LINE__, cpu->cpu_index);
 }
 
 static void tlb_check_page_and_flush_by_mmuidx_async_work(CPUState *cpu,
@@ -340,35 +347,36 @@ static void tlb_check_page_and_flush_by_mmuidx_async_work(CPUState *cpu,
     target_ulong addr = addr_and_mmuidx & TARGET_PAGE_MASK;
     unsigned long mmu_idx_bitmap = addr_and_mmuidx & ALL_MMUIDX_BITS;
 
-    tlb_debug("addr:"TARGET_FMT_lx" mmu_idx: %04lx\n", addr, mmu_idx_bitmap);
+    trace_tlb_flush_work(__LINE__, cpu->cpu_index, addr_and_mmuidx);
 
     /* Check if we need to flush due to large pages.  */
     if ((addr & env->tlb_flush_mask) == env->tlb_flush_addr) {
-        tlb_debug("forced full flush ("
-                  TARGET_FMT_lx "/" TARGET_FMT_lx ")\n",
-                  env->tlb_flush_addr, env->tlb_flush_mask);
-
+        trace_tlb_flush_self(__LINE__, cpu->cpu_index);
         tlb_flush_by_mmuidx_async_work(cpu,
                                        RUN_ON_CPU_HOST_INT(mmu_idx_bitmap));
     } else {
+        trace_tlb_flush_self(__LINE__, cpu->cpu_index);
        tlb_flush_page_by_mmuidx_async_work(cpu, data);
     }
+
+    trace_tlb_flush_work_complete(__LINE__, cpu->cpu_index);
 }
 
 void tlb_flush_page_by_mmuidx(CPUState *cpu, target_ulong addr, uint16_t idxmap)
 {
     target_ulong addr_and_mmu_idx;
 
-    tlb_debug("addr: "TARGET_FMT_lx" mmu_idx:%" PRIx16 "\n", addr, idxmap);
-
     /* This should already be page aligned */
     addr_and_mmu_idx = addr & TARGET_PAGE_MASK;
     addr_and_mmu_idx |= idxmap;
 
     if (!qemu_cpu_is_self(cpu)) {
+        trace_tlb_flush_async_schedule(__LINE__, current_cpu->cpu_index,
+                                       cpu->cpu_index);
         async_run_on_cpu(cpu, tlb_check_page_and_flush_by_mmuidx_async_work,
                          RUN_ON_CPU_TARGET_PTR(addr_and_mmu_idx));
     } else {
+        trace_tlb_flush_self(__LINE__, cpu->cpu_index);
         tlb_check_page_and_flush_by_mmuidx_async_work(
             cpu, RUN_ON_CPU_TARGET_PTR(addr_and_mmu_idx));
     }
@@ -380,7 +388,7 @@ void tlb_flush_page_by_mmuidx_all_cpus(CPUState *src_cpu, target_ulong addr,
     const run_on_cpu_func fn = tlb_check_page_and_flush_by_mmuidx_async_work;
     target_ulong addr_and_mmu_idx;
 
-    tlb_debug("addr: "TARGET_FMT_lx" mmu_idx:%"PRIx16"\n", addr, idxmap);
+    trace_tlb_flush_async_schedule(__LINE__, src_cpu->cpu_index, -1);
 
     /* This should already be page aligned */
     addr_and_mmu_idx = addr & TARGET_PAGE_MASK;
@@ -397,7 +405,7 @@ void tlb_flush_page_by_mmuidx_all_cpus_synced(CPUState *src_cpu,
     const run_on_cpu_func fn = tlb_check_page_and_flush_by_mmuidx_async_work;
     target_ulong addr_and_mmu_idx;
 
-    tlb_debug("addr: "TARGET_FMT_lx" mmu_idx:%"PRIx16"\n", addr, idxmap);
+    trace_tlb_flush_synced_schedule(__LINE__, src_cpu->cpu_index, -1);
 
     /* This should already be page aligned */
     addr_and_mmu_idx = addr & TARGET_PAGE_MASK;
@@ -411,6 +419,8 @@ void tlb_flush_page_all_cpus(CPUState *src, target_ulong addr)
 {
     const run_on_cpu_func fn = tlb_flush_page_async_work;
 
+    trace_tlb_flush_async_schedule(__LINE__, src->cpu_index, -1);
+
     flush_all_helper(src, fn, RUN_ON_CPU_TARGET_PTR(addr));
     fn(src, RUN_ON_CPU_TARGET_PTR(addr));
 }
@@ -420,6 +430,8 @@ void tlb_flush_page_all_cpus_synced(CPUState *src,
 {
     const run_on_cpu_func fn = tlb_flush_page_async_work;
 
+    trace_tlb_flush_synced_schedule(__LINE__, src->cpu_index, -1);
+
     flush_all_helper(src, fn, RUN_ON_CPU_TARGET_PTR(addr));
     async_safe_run_on_cpu(src, fn, RUN_ON_CPU_TARGET_PTR(addr));
 }
diff --git a/accel/tcg/trace-events b/accel/tcg/trace-events
index 618cc07738..0d2b1c47ac 100644
--- a/accel/tcg/trace-events
+++ b/accel/tcg/trace-events
@@ -1,5 +1,21 @@
 # Trace events for debugging and performance instrumentation
 
+# cputlb.c - SoftMMU TLB flushes
+#
+# There are numerous different functions for variations on flushing by
+# page/mmuidx and other criteria. However we use common events and tag
+# them with the appropriate line number in cputlb.c
+#
+# - setting to=-1 indicates work scheduled on all vCPUs
+#
+tlb_flush_self(int line, int vcpu) "cputlb.c:%d cpu %d"
+tlb_flush_async_schedule(int line, int from, int to) "cputlb.c:%d from_cpu=%d to_cpu=%d"
+tlb_flush_synced_schedule(int line, int from, int to) "cputlb.c:%d from_cpu=%d to_cpu=%d"
+# The following are the workers, usually running from an async work queue
+tlb_flush_all_work(int vcpu) "cpu %d"
+tlb_flush_work(int line, int vcpu, unsigned long data) "cputlb.c:%d cpu %d, 0x%lx"
+tlb_flush_work_complete(int line, int vcpu) "cputlb.c:%d cpu %d"
+
 # TCG related tracing (you still need -d nochain to get a full picture
 # as otherwise you'll only see the first TB executed in a chain)
 # cpu-exec.c