From patchwork Thu Oct 27 15:10:18 2016
X-Patchwork-Submitter: Alex Bennée
X-Patchwork-Id: 79745
From: Alex Bennée <alex.bennee@linaro.org>
To: pbonzini@redhat.com
Cc: mttcg@listserver.greensocs.com, peter.maydell@linaro.org,
    claudio.fontana@huawei.com, nikunj@linux.vnet.ibm.com,
    Peter Crosthwaite, jan.kiszka@siemens.com, mark.burton@greensocs.com,
    a.rigo@virtualopensystems.com, qemu-devel@nongnu.org, cota@braap.org,
    serge.fdrv@gmail.com, bobby.prani@gmail.com, rth@twiddle.net,
    Alex Bennée, fred.konrad@greensocs.com
Date: Thu, 27 Oct 2016 16:10:18 +0100
Message-Id: <20161027151030.20863-22-alex.bennee@linaro.org>
In-Reply-To: <20161027151030.20863-1-alex.bennee@linaro.org>
References: <20161027151030.20863-1-alex.bennee@linaro.org>
X-Mailer: git-send-email 2.10.1
Subject: [Qemu-devel] [PATCH v5 21/33] tcg: enable thread-per-vCPU

There are a couple of changes that occur at the same time here:

  - introduce a single vCPU qemu_tcg_cpu_thread_fn

    One of these is spawned per vCPU with its own Thread and Condition
    variables. qemu_tcg_rr_cpu_thread_fn is the new name for the old
    single threaded function.

  - the TLS current_cpu variable is now live for the lifetime of MTTCG
    vCPU threads. This is for future work where async jobs need to know
    the vCPU context they are operating in.

The user can now switch on multi-thread behaviour and spawn a thread
per vCPU. For a simple kvm-unit-test like:

  ./arm/run ./arm/locking-test.flat -smp 4 -accel tcg,thread=multi

this will now use 4 vCPU threads and have an expected FAIL (instead of
the unexpected PASS) as the default mode of the test has no protection
when incrementing a shared variable.

We enable the parallel_cpus flag to ensure we generate correct barrier
and atomic code if supported by the front and back ends. As each back
end and front end is updated they can add CONFIG_MTTCG_TARGET and
CONFIG_MTTCG_HOST to their respective make configurations so
default_mttcg_enabled does the right thing.
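To make the expected FAIL concrete: in its default mode the locking test
effectively performs an unprotected read-modify-write on a counter shared
between the vCPUs. Below is a minimal host-side sketch of that lost-update
race using plain pthreads (hypothetical file and function names, not the
actual kvm-unit-tests code); a companion sketch of the protected variant
follows the diff at the end of this mail.

  /* lost-update.c: four threads bump a shared counter with no locking
   * or atomics. On a real SMP host (or a guest run with thread=multi)
   * the final count usually comes up short because the concurrent
   * load/add/store sequences overwrite each other.
   */
  #include <pthread.h>
  #include <stdio.h>

  #define NR_THREADS 4
  #define NR_INCS    100000

  static unsigned long counter;      /* shared, deliberately unprotected */

  static void *bump(void *arg)
  {
      (void)arg;
      for (int i = 0; i < NR_INCS; i++) {
          counter++;                 /* not atomic: load, add, store */
      }
      return NULL;
  }

  int main(void)
  {
      pthread_t thr[NR_THREADS];

      for (int i = 0; i < NR_THREADS; i++) {
          pthread_create(&thr[i], NULL, bump, NULL);
      }
      for (int i = 0; i < NR_THREADS; i++) {
          pthread_join(thr[i], NULL);
      }
      printf("expected %d, got %lu\n", NR_THREADS * NR_INCS, counter);
      return 0;
  }

With a single round-robin TCG thread the vCPUs never truly run in parallel,
which is why the unprotected mode used to PASS; with one host thread per
vCPU the increments genuinely interleave and updates are lost.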
Signed-off-by: KONRAD Frederic <fred.konrad@greensocs.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
[AJB: Some fixes, conditionally, commit rewording]
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>

---
v1 (ajb):
  - fix merge conflicts
  - maintain single-thread approach
v2
  - re-base fixes (no longer has tb_find_fast lock tweak ahead)
  - remove bogus break condition on cpu->stop/stopped
  - only process exiting cpus exit_request
  - handle all cpus idle case (fixes shutdown issues)
  - sleep on EXCP_HALTED in mttcg mode (prevent crash on start-up)
  - move icount timer into helper
v3
  - update the commit message
  - rm kick_timer tweaks (move to earlier tcg_current_cpu tweaks)
  - ensure linux-user clears cpu->exit_request in loop
  - purging of global exit_request and tcg_current_cpu in earlier patches
  - fix checkpatch warnings
v4
  - don't break loop on stopped, we may never schedule next in RR mode
  - make sure we flush iorequests of current cpu if we exited on one
  - add tcg_cpu_exec_start/end wraps for async work functions
  - stop killing of current_cpu on loop exit
  - set current_cpu in the single thread function
  - remove sleep special case, add qemu_tcg_should_sleep() for mttcg
  - no need to atomic set cpu->exit_request going into the loop
  - removed extraneous setting of exit_request
  - split tb_lock() part of patch
  - rename single thread fn to qemu_tcg_rr_cpu_thread_fn
v5
  - enable parallel_cpus for MTTCG (for barriers/atomics)
  - expand on CONFIG_ flags in commit message
---
 cpu-exec.c |   5 ---
 cpus.c     | 135 +++++++++++++++++++++++++++++++++++++++++++++++--------------
 2 files changed, 104 insertions(+), 36 deletions(-)

-- 
2.10.1

diff --git a/cpu-exec.c b/cpu-exec.c
index 3458dd6..1887a3d 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -396,7 +396,6 @@ static inline bool cpu_handle_halt(CPUState *cpu)
         }
 #endif
         if (!cpu_has_work(cpu)) {
-            current_cpu = NULL;
             return true;
         }

@@ -540,7 +539,6 @@ static inline void cpu_handle_interrupt(CPUState *cpu,

     if (unlikely(atomic_read(&cpu->exit_request)
         || replay_has_interrupt())) {
-        atomic_set(&cpu->exit_request, 0);
         cpu->exception_index = EXCP_INTERRUPT;
         cpu_loop_exit(cpu);
     }
@@ -675,8 +673,5 @@ int cpu_exec(CPUState *cpu)
     cc->cpu_exec_exit(cpu);
     rcu_read_unlock();

-    /* fail safe : never use current_cpu outside cpu_exec() */
-    current_cpu = NULL;
-
     return ret;
 }
diff --git a/cpus.c b/cpus.c
index bba71af..0c046e3 100644
--- a/cpus.c
+++ b/cpus.c
@@ -44,6 +44,7 @@
 #include "qemu/main-loop.h"
 #include "qemu/bitmap.h"
 #include "qemu/seqlock.h"
+#include "tcg.h"
 #include "qapi-event.h"
 #include "hw/nmi.h"
 #include "sysemu/replay.h"
@@ -776,7 +777,7 @@ static void kick_tcg_thread(void *opaque)
 static void start_tcg_kick_timer(void)
 {
-    if (!tcg_kick_vcpu_timer && CPU_NEXT(first_cpu)) {
+    if (!mttcg_enabled && !tcg_kick_vcpu_timer && CPU_NEXT(first_cpu)) {
         tcg_kick_vcpu_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL,
                                            kick_tcg_thread, NULL);
         timer_mod(tcg_kick_vcpu_timer, qemu_tcg_next_kick());
     }
@@ -1029,27 +1030,34 @@ static void qemu_tcg_destroy_vcpu(CPUState *cpu)

 static void qemu_wait_io_event_common(CPUState *cpu)
 {
+    atomic_mb_set(&cpu->thread_kicked, false);
     if (cpu->stop) {
         cpu->stop = false;
         cpu->stopped = true;
         qemu_cond_broadcast(&qemu_pause_cond);
     }
     process_queued_cpu_work(cpu);
-    cpu->thread_kicked = false;
+}
+
+static bool qemu_tcg_should_sleep(CPUState *cpu)
+{
+    if (mttcg_enabled) {
+        return cpu_thread_is_idle(cpu);
+    } else {
+        return all_cpu_threads_idle();
+    }
 }

 static void qemu_tcg_wait_io_event(CPUState *cpu)
 {
-    while (all_cpu_threads_idle()) {
+    while (qemu_tcg_should_sleep(cpu)) {
         stop_tcg_kick_timer();
         qemu_cond_wait(cpu->halt_cond, &qemu_global_mutex);
     }

     start_tcg_kick_timer();

-    CPU_FOREACH(cpu) {
-        qemu_wait_io_event_common(cpu);
-    }
+    qemu_wait_io_event_common(cpu);
 }

 static void qemu_kvm_wait_io_event(CPUState *cpu)
@@ -1120,6 +1128,7 @@ static void *qemu_dummy_cpu_thread_fn(void *arg)
     qemu_thread_get_self(cpu->thread);
     cpu->thread_id = qemu_get_thread_id();
     cpu->can_do_io = 1;
+    current_cpu = cpu;

     sigemptyset(&waitset);
     sigaddset(&waitset, SIG_IPI);
@@ -1128,9 +1137,7 @@ static void *qemu_dummy_cpu_thread_fn(void *arg)
     cpu->created = true;
     qemu_cond_signal(&qemu_cpu_cond);

-    current_cpu = cpu;
     while (1) {
-        current_cpu = NULL;
         qemu_mutex_unlock_iothread();
         do {
             int sig;
@@ -1141,7 +1148,6 @@ static void *qemu_dummy_cpu_thread_fn(void *arg)
             exit(1);
         }
         qemu_mutex_lock_iothread();
-        current_cpu = cpu;
         qemu_wait_io_event_common(cpu);
     }

@@ -1253,7 +1259,7 @@ static void deal_with_unplugged_cpus(void)
  * elsewhere.
  */

-static void *qemu_tcg_cpu_thread_fn(void *arg)
+static void *qemu_tcg_rr_cpu_thread_fn(void *arg)
 {
     CPUState *cpu = arg;

@@ -1275,6 +1281,7 @@ static void *qemu_tcg_cpu_thread_fn(void *arg)

         /* process any pending work */
         CPU_FOREACH(cpu) {
+            current_cpu = cpu;
             qemu_wait_io_event_common(cpu);
         }
     }
@@ -1296,6 +1303,7 @@ static void *qemu_tcg_cpu_thread_fn(void *arg)

         while (cpu && !cpu->exit_request) {
             atomic_mb_set(&tcg_current_rr_cpu, cpu);
+            current_cpu = cpu;
             qemu_clock_enable(QEMU_CLOCK_VIRTUAL,
                               (cpu->singlestep_enabled & SSTEP_NOTIMER) == 0);

@@ -1307,7 +1315,7 @@ static void *qemu_tcg_cpu_thread_fn(void *arg)
                     cpu_handle_guest_debug(cpu);
                     break;
                 }
-            } else if (cpu->stop || cpu->stopped) {
+            } else if (cpu->stop) {
                 if (cpu->unplug) {
                     cpu = CPU_NEXT(cpu);
                 }
@@ -1326,13 +1334,71 @@ static void *qemu_tcg_cpu_thread_fn(void *arg)

         handle_icount_deadline();

-        qemu_tcg_wait_io_event(QTAILQ_FIRST(&cpus));
+        qemu_tcg_wait_io_event(cpu ? cpu : QTAILQ_FIRST(&cpus));
         deal_with_unplugged_cpus();
     }

     return NULL;
 }

+/* Multi-threaded TCG
+ *
+ * In the multi-threaded case each vCPU has its own thread. The TLS
+ * variable current_cpu can be used deep in the code to find the
+ * current CPUState for a given thread.
+ */
+
+static void *qemu_tcg_cpu_thread_fn(void *arg)
+{
+    CPUState *cpu = arg;
+
+    rcu_register_thread();
+
+    qemu_mutex_lock_iothread();
+    qemu_thread_get_self(cpu->thread);
+
+    cpu->thread_id = qemu_get_thread_id();
+    cpu->created = true;
+    cpu->can_do_io = 1;
+    current_cpu = cpu;
+    qemu_cond_signal(&qemu_cpu_cond);
+
+    /* process any pending work */
+    cpu->exit_request = 1;
+
+    while (1) {
+        if (cpu_can_run(cpu)) {
+            int r;
+            r = tcg_cpu_exec(cpu);
+            switch (r) {
+            case EXCP_DEBUG:
+                cpu_handle_guest_debug(cpu);
+                break;
+            case EXCP_HALTED:
+                /* during start-up the vCPU is reset and the thread is
+                 * kicked several times. If we don't ensure we go back
+                 * to sleep in the halted state we won't cleanly
+                 * start-up when the vCPU is enabled.
+                 *
+                 * cpu->halted should ensure we sleep in wait_io_event
+                 */
+                g_assert(cpu->halted);
+                break;
+            default:
+                /* Ignore everything else? */
+                break;
+            }
+        }
+
+        handle_icount_deadline();
+
+        atomic_mb_set(&cpu->exit_request, 0);
+        qemu_tcg_wait_io_event(cpu);
+    }
+
+    return NULL;
+}
+
 static void qemu_cpu_kick_thread(CPUState *cpu)
 {
 #ifndef _WIN32
@@ -1357,7 +1423,7 @@ void qemu_cpu_kick(CPUState *cpu)
     qemu_cond_broadcast(cpu->halt_cond);
     if (tcg_enabled()) {
         cpu_exit(cpu);
-        /* Also ensure current RR cpu is kicked */
+        /* NOP unless doing single-thread RR */
         qemu_cpu_kick_rr_cpu();
     } else {
         qemu_cpu_kick_thread(cpu);
     }
@@ -1426,13 +1492,6 @@ void pause_all_vcpus(void)

     if (qemu_in_vcpu_thread()) {
         cpu_stop_current();
-        if (!kvm_enabled()) {
-            CPU_FOREACH(cpu) {
-                cpu->stop = false;
-                cpu->stopped = true;
-            }
-            return;
-        }
     }

     while (!all_vcpus_paused()) {
@@ -1481,29 +1540,43 @@ void cpu_remove_sync(CPUState *cpu)

 static void qemu_tcg_init_vcpu(CPUState *cpu)
 {
     char thread_name[VCPU_THREAD_NAME_SIZE];
-    static QemuCond *tcg_halt_cond;
-    static QemuThread *tcg_cpu_thread;
+    static QemuCond *single_tcg_halt_cond;
+    static QemuThread *single_tcg_cpu_thread;

-    /* share a single thread for all cpus with TCG */
-    if (!tcg_cpu_thread) {
+    if (qemu_tcg_mttcg_enabled() || !single_tcg_cpu_thread) {
+        parallel_cpus = true;
         cpu->thread = g_malloc0(sizeof(QemuThread));
         cpu->halt_cond = g_malloc0(sizeof(QemuCond));
         qemu_cond_init(cpu->halt_cond);
-        tcg_halt_cond = cpu->halt_cond;
-        snprintf(thread_name, VCPU_THREAD_NAME_SIZE, "CPU %d/TCG",
+
+        if (qemu_tcg_mttcg_enabled()) {
+            /* create a thread per vCPU with TCG (MTTCG) */
+            snprintf(thread_name, VCPU_THREAD_NAME_SIZE, "CPU %d/TCG",
                  cpu->cpu_index);
-        qemu_thread_create(cpu->thread, thread_name, qemu_tcg_cpu_thread_fn,
-                           cpu, QEMU_THREAD_JOINABLE);
+
+            qemu_thread_create(cpu->thread, thread_name, qemu_tcg_cpu_thread_fn,
+                               cpu, QEMU_THREAD_JOINABLE);
+
+        } else {
+            /* share a single thread for all cpus with TCG */
+            snprintf(thread_name, VCPU_THREAD_NAME_SIZE, "ALL CPUs/TCG");
+            qemu_thread_create(cpu->thread, thread_name,
+                               qemu_tcg_rr_cpu_thread_fn,
+                               cpu, QEMU_THREAD_JOINABLE);
+
+            single_tcg_halt_cond = cpu->halt_cond;
+            single_tcg_cpu_thread = cpu->thread;
+        }
 #ifdef _WIN32
         cpu->hThread = qemu_thread_get_handle(cpu->thread);
 #endif
         while (!cpu->created) {
             qemu_cond_wait(&qemu_cpu_cond, &qemu_global_mutex);
         }
-        tcg_cpu_thread = cpu->thread;
     } else {
-        cpu->thread = tcg_cpu_thread;
-        cpu->halt_cond = tcg_halt_cond;
+        /* For non-MTTCG cases we share the thread */
+        cpu->thread = single_tcg_cpu_thread;
+        cpu->halt_cond = single_tcg_halt_cond;
     }
 }
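
As a companion to the race sketch earlier in this mail, the protected
variant shows what "correct barrier and atomic code" buys once
parallel_cpus is set: a guest atomic read-modify-write must be emitted as
a real host atomic, so no increment can be lost however the vCPU threads
interleave. Again a hypothetical host-side sketch using the GCC/Clang
__atomic builtins, not QEMU or kvm-unit-tests code:

  /* atomic-update.c: same structure as the earlier sketch, but the
   * shared counter is updated with an atomic read-modify-write, so the
   * result is exact regardless of how the threads interleave.
   */
  #include <pthread.h>
  #include <stdio.h>

  #define NR_THREADS 4
  #define NR_INCS    100000

  static unsigned long counter;

  static void *bump(void *arg)
  {
      (void)arg;
      for (int i = 0; i < NR_INCS; i++) {
          __atomic_fetch_add(&counter, 1, __ATOMIC_SEQ_CST);
      }
      return NULL;
  }

  int main(void)
  {
      pthread_t thr[NR_THREADS];

      for (int i = 0; i < NR_THREADS; i++) {
          pthread_create(&thr[i], NULL, bump, NULL);
      }
      for (int i = 0; i < NR_THREADS; i++) {
          pthread_join(thr[i], NULL);
      }
      /* always prints "expected 400000, got 400000" */
      printf("expected %d, got %lu\n", NR_THREADS * NR_INCS, counter);
      return 0;
  }

The locking test's non-default modes presumably add protection of this
sort, which is why only the default mode is expected to FAIL under
thread=multi.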