From patchwork Thu May 18 13:57:49 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Fei Wu
X-Patchwork-Id: 683382
From: Fei Wu
To: qemu-devel@nongnu.org, richard.henderson@linaro.org,
 alex.bennee@linaro.org, fei2.wu@intel.com
Cc: Paolo Bonzini
Subject: [PATCH v12 07/15] accel/tcg: convert profiling of code generation
 to TBStats
Date: Thu, 18 May 2023 21:57:49 +0800
Message-Id: <20230518135757.1442654-8-fei2.wu@intel.com>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <20230518135757.1442654-1-fei2.wu@intel.com>
References: <20230518135757.1442654-1-fei2.wu@intel.com>
Sender:
 qemu-devel-bounces+patch=linaro.org@nongnu.org

From: Alex Bennée

We continue the conversion of CONFIG_PROFILER data to TBStats by
measuring the time it takes to generate code. Instead of calculating
elapsed time as we do we simply store key timestamps in the profiler
structure and then calculate the totals and add them to TBStats under
lock.

Signed-off-by: Alex Bennée
Signed-off-by: Fei Wu
---
 accel/tcg/tb-stats.c      | 33 -------------------
 accel/tcg/translate-all.c | 69 ++++++++++++++++++++++-----------------
 include/exec/tb-stats.h   |  7 ++++
 include/tcg/tcg.h         | 14 ++++----
 tcg/tcg.c                 | 17 ++++------
 5 files changed, 59 insertions(+), 81 deletions(-)

diff --git a/accel/tcg/tb-stats.c b/accel/tcg/tb-stats.c
index 7deb617446..c40c9a748e 100644
--- a/accel/tcg/tb-stats.c
+++ b/accel/tcg/tb-stats.c
@@ -88,39 +88,6 @@ void dump_jit_profile_info(TCGProfile *s, GString *buf)
                 jpi->host / (double) jpi->translations);
         g_string_append_printf(buf, "avg search data/TB %0.1f\n",
                 jpi->search_data / (double) jpi->translations);
-
-        if (s) {
-            int64_t tot = s->interm_time + s->code_time;
-            g_string_append_printf(buf, "JIT cycles %" PRId64
-                    " (%0.3f s at 2.4 GHz)\n",
-                    tot, tot / 2.4e9);
-            g_string_append_printf(buf, "cycles/op %0.1f\n",
-                    jpi->ops ? (double)tot / jpi->ops : 0);
-            g_string_append_printf(buf, "cycles/in byte %0.1f\n",
-                    jpi->guest ? (double)tot / jpi->guest : 0);
-            g_string_append_printf(buf, "cycles/out byte %0.1f\n",
-                    jpi->host ? (double)tot / jpi->host : 0);
-            g_string_append_printf(buf, "cycles/search byte %0.1f\n",
-                    jpi->search_data ? (double)tot / jpi->search_data : 0);
-            if (tot == 0) {
-                tot = 1;
-            }
-            g_string_append_printf(buf, " gen_interm time %0.1f%%\n",
-                    (double)s->interm_time / tot * 100.0);
-            g_string_append_printf(buf, " gen_code time %0.1f%%\n",
-                    (double)s->code_time / tot * 100.0);
-            g_string_append_printf(buf, "optim./code time %0.1f%%\n",
-                    (double)s->opt_time / (s->code_time ? s->code_time : 1)
-                    * 100.0);
-            g_string_append_printf(buf, "liveness/code time %0.1f%%\n",
-                    (double)s->la_time / (s->code_time ? s->code_time : 1)
-                    * 100.0);
-            g_string_append_printf(buf, "cpu_restore count %" PRId64 "\n",
-                    s->restore_count);
-            g_string_append_printf(buf, " avg cycles %0.1f\n",
-                    s->restore_count ?
-                    (double)s->restore_time / s->restore_count : 0);
-        }
     }
     g_free(jpi);
 }
diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index beaef03902..ea2b648ffd 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -282,8 +282,9 @@ void page_init(void)
  */
 static int setjmp_gen_code(CPUArchState *env, TranslationBlock *tb,
                            target_ulong pc, void *host_pc,
-                           int *max_insns, int64_t *ti)
+                           int *max_insns)
 {
+    TCGProfile *prof = &tcg_ctx->prof;
     int ret = sigsetjmp(tcg_ctx->jmp_trans, 0);
     if (unlikely(ret != 0)) {
         return ret;
@@ -297,11 +298,9 @@ static int setjmp_gen_code(CPUArchState *env, TranslationBlock *tb,
     tcg_ctx->cpu = NULL;
     *max_insns = tb->icount;
 
-#ifdef CONFIG_PROFILER
-    qatomic_set(&tcg_ctx->prof.interm_time,
-                tcg_ctx->prof.interm_time + profile_getclock() - *ti);
-    *ti = profile_getclock();
-#endif
+    if (tb_stats_enabled(tb, TB_JIT_TIME)) {
+        prof->gen_ir_done_time = profile_getclock();
+    }
 
     return tcg_gen_code(tcg_ctx, tb, pc);
 }
@@ -352,7 +351,6 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
     tcg_insn_unit *gen_code_buf;
     int gen_code_size, search_size, max_insns;
     TCGProfile *prof = &tcg_ctx->prof;
-    int64_t ti;
     void *host_pc;
 
     assert_memory_lock();
@@ -403,10 +401,6 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
 
  tb_overflow:
 
-#ifdef CONFIG_PROFILER
-    ti = profile_getclock();
-#endif
-
     trace_translate_block(tb, pc, tb->tc.ptr);
 
     /*
@@ -418,11 +412,14 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
     if (tb_stats_collection_enabled() &&
         qemu_log_in_addr_range(tb->pc)) {
         tb->tb_stats = tb_get_stats(phys_pc, pc, cs_base, flags);
+        if (tb_stats_enabled(tb, TB_JIT_TIME)) {
+            prof->gen_start_time = profile_getclock();
+        }
     } else {
         tb->tb_stats = NULL;
     }
 
-    gen_code_size = setjmp_gen_code(env, tb, pc, host_pc, &max_insns, &ti);
+    gen_code_size = setjmp_gen_code(env, tb, pc, host_pc, &max_insns);
     if (unlikely(gen_code_size < 0)) {
         switch (gen_code_size) {
         case -1:
@@ -474,9 +471,9 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
      */
     perf_report_code(pc, tb, tcg_splitwx_to_rx(gen_code_buf));
 
-#ifdef CONFIG_PROFILER
-    qatomic_set(&prof->code_time, prof->code_time + profile_getclock() - ti);
-#endif
+    if (tb_stats_enabled(tb, TB_JIT_TIME)) {
+        prof->gen_code_done_time = profile_getclock();
+    }
 
 #ifdef DEBUG_DISAS
     if (qemu_loglevel_mask(CPU_LOG_TB_OUT_ASM) &&
@@ -586,26 +583,38 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
      * Collect JIT stats when enabled. We batch them all up here to
      * avoid spamming the cache with atomic accesses
      */
-    if (tb_stats_enabled(tb, TB_JIT_STATS)) {
+    if (tb_stats_enabled(tb, (TB_JIT_STATS | TB_JIT_TIME))) {
         TBStatistics *ts = tb->tb_stats;
         qemu_mutex_lock(&ts->jit_stats_lock);
 
-        ts->code.num_guest_inst += prof->translation.nb_guest_insns;
-        ts->code.num_tcg_ops += prof->translation.nb_ops_pre_opt;
-        ts->code.num_tcg_ops_opt += tcg_ctx->nb_ops;
-        ts->code.spills += prof->translation.nb_spills;
-        ts->code.temps += prof->translation.temp_count;
-        ts->code.deleted_ops += prof->translation.del_op_count;
-        ts->code.in_len += tb->size;
-        ts->code.out_len += tb->tc.size;
-        ts->code.search_out_len += search_size;
-
-        ts->translations.total++;
-        if (tb_page_addr1(tb) != -1) {
-            ts->translations.spanning++;
+        if (tb_stats_enabled(tb, TB_JIT_STATS)) {
+            ts->code.num_guest_inst += prof->translation.nb_guest_insns;
+            ts->code.num_tcg_ops += prof->translation.nb_ops_pre_opt;
+            ts->code.num_tcg_ops_opt += tcg_ctx->nb_ops;
+            ts->code.spills += prof->translation.nb_spills;
+            ts->code.temps += prof->translation.temp_count;
+            ts->code.deleted_ops += prof->translation.del_op_count;
+            ts->code.in_len += tb->size;
+            ts->code.out_len += tb->tc.size;
+            ts->code.search_out_len += search_size;
+
+            ts->translations.total++;
+            if (tb_page_addr1(tb) != -1) {
+                ts->translations.spanning++;
+            }
+
+            g_ptr_array_add(ts->tbs, tb);
         }
 
-        g_ptr_array_add(ts->tbs, tb);
+        if (tb_stats_enabled(tb, TB_JIT_TIME)) {
+            ts->gen_times.ir += prof->gen_ir_done_time - prof->gen_start_time;
+            ts->gen_times.ir_opt +=
+                prof->gen_opt_done_time - prof->gen_ir_done_time;
+            ts->gen_times.la +=
+                prof->gen_la_done_time - prof->gen_opt_done_time;
+            ts->gen_times.code +=
+                prof->gen_code_done_time - prof->gen_la_done_time;
+        }
 
         qemu_mutex_unlock(&ts->jit_stats_lock);
     }
diff --git a/include/exec/tb-stats.h b/include/exec/tb-stats.h
index cc9ab686b8..2543367c70 100644
--- a/include/exec/tb-stats.h
+++ b/include/exec/tb-stats.h
@@ -97,6 +97,12 @@ struct TBStatistics {
 
     uint64_t tb_restore_time;
     uint64_t tb_restore_count;
+    struct {
+        uint64_t ir;
+        uint64_t ir_opt;
+        uint64_t la;
+        uint64_t code;
+    } gen_times;
 };
 
 bool tb_stats_cmp(const void *ap, const void *bp);
@@ -104,5 +110,6 @@ bool tb_stats_cmp(const void *ap, const void *bp);
 void init_tb_stats_htable(void);
 
 void dump_jit_profile_info(TCGProfile *s, GString *buf);
+void dump_jit_exec_time_info(uint64_t dev_time);
 
 #endif
diff --git a/include/tcg/tcg.h b/include/tcg/tcg.h
index 69df21ce4c..9b74c5acce 100644
--- a/include/tcg/tcg.h
+++ b/include/tcg/tcg.h
@@ -557,13 +557,13 @@ typedef struct TCGProfile {
     int64_t code_in_len;
     int64_t code_out_len;
     int64_t search_out_len;
-    int64_t interm_time;
-    int64_t code_time;
-    int64_t la_time;
-    int64_t opt_time;
-    int64_t restore_count;
-    int64_t restore_time;
-    int64_t table_op_count[NB_OPS];
+
+    /* Timestamps during translation */
+    uint64_t gen_start_time;
+    uint64_t gen_ir_done_time;
+    uint64_t gen_opt_done_time;
+    uint64_t gen_la_done_time;
+    uint64_t gen_code_done_time;
 } TCGProfile;
 
 struct TCGContext {
diff --git a/tcg/tcg.c b/tcg/tcg.c
index a3a42ef387..9e657719fa 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -6005,18 +6005,13 @@ int tcg_gen_code(TCGContext *s, TranslationBlock *tb, uint64_t pc_start)
     }
 #endif
 
-#ifdef CONFIG_PROFILER
-    qatomic_set(&prof->opt_time, prof->opt_time - profile_getclock());
-#endif
-
 #ifdef USE_TCG_OPTIMIZATIONS
     tcg_optimize(s);
 #endif
 
-#ifdef CONFIG_PROFILER
-    qatomic_set(&prof->opt_time, prof->opt_time + profile_getclock());
-    qatomic_set(&prof->la_time, prof->la_time - profile_getclock());
-#endif
+    if (tb_stats_enabled(tb, TB_JIT_TIME)) {
+        prof->gen_opt_done_time = profile_getclock();
+    }
 
     reachable_code_pass(s);
     liveness_pass_0(s);
@@ -6042,9 +6037,9 @@ int tcg_gen_code(TCGContext *s, TranslationBlock *tb, uint64_t pc_start)
         }
     }
 
-#ifdef CONFIG_PROFILER
-    qatomic_set(&prof->la_time, prof->la_time + profile_getclock());
-#endif
+    if (tb_stats_enabled(tb, TB_JIT_TIME)) {
+        prof->gen_la_done_time = profile_getclock();
+    }
 
 #ifdef DEBUG_DISAS
     if (unlikely(qemu_loglevel_mask(CPU_LOG_TB_OP_OPT)