From patchwork Thu Mar 11 00:21:40 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Richard Henderson
X-Patchwork-Id: 397378
From: Richard Henderson
To: qemu-devel@nongnu.org
Cc: r.bolshakov@yadro.com, j@getutm.app
Subject: [PATCH 10/26] accel/tcg: Move alloc_code_gen_buffer to tcg/region.c
Date: Wed, 10 Mar 2021 18:21:40 -0600
Message-Id: <20210311002156.253711-11-richard.henderson@linaro.org>
In-Reply-To: <20210311002156.253711-1-richard.henderson@linaro.org>
References: <20210311002156.253711-1-richard.henderson@linaro.org>

Buffer management is integral to tcg. Do not leave the allocation
to code outside of tcg/. This is code movement, with further
cleanups to follow.

Signed-off-by: Richard Henderson
---
 include/tcg/tcg.h         |   2 +-
 accel/tcg/translate-all.c | 414 +------------------------------------
 tcg/region.c              | 421 +++++++++++++++++++++++++++++++++++++-
 3 files changed, 418 insertions(+), 419 deletions(-)

-- 
2.25.1

diff --git a/include/tcg/tcg.h b/include/tcg/tcg.h
index 0f0695e90d..7a435bf807 100644
--- a/include/tcg/tcg.h
+++ b/include/tcg/tcg.h
@@ -874,7 +874,7 @@ void *tcg_malloc_internal(TCGContext *s, int size);
 void tcg_pool_reset(TCGContext *s);
 TranslationBlock *tcg_tb_alloc(TCGContext *s);
 
-void tcg_region_init(void);
+void tcg_region_init(size_t tb_size, int splitwx);
 void tb_destroy(TranslationBlock *tb);
 void tcg_region_reset_all(void);
 
diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index 6d3184e7da..4071edda16 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -18,7 +18,6 @@
  */
 
 #include "qemu/osdep.h"
-#include "qemu/units.h"
 #include "qemu-common.h"
 
 #define NO_CPU_IO_DEFS
@@ -51,7 +50,6 @@
 #include "exec/tb-hash.h"
 #include "exec/translate-all.h"
 #include "qemu/bitmap.h"
-#include "qemu/error-report.h"
 #include "qemu/qemu-print.h"
 #include "qemu/timer.h"
 #include "qemu/main-loop.h"
@@ -895,408 +893,6 @@ static void page_lock_pair(PageDesc **ret_p1, tb_page_addr_t phys1,
     }
 }
 
-/* Minimum size of the code gen buffer. This number is randomly chosen,
-   but not so small that we can't have a fair number of TB's live. */
-#define MIN_CODE_GEN_BUFFER_SIZE (1 * MiB)
-
-/* Maximum size of the code gen buffer we'd like to use. Unless otherwise
-   indicated, this is constrained by the range of direct branches on the
-   host cpu, as used by the TCG implementation of goto_tb. */
-#if defined(__x86_64__)
-# define MAX_CODE_GEN_BUFFER_SIZE (2 * GiB)
-#elif defined(__sparc__)
-# define MAX_CODE_GEN_BUFFER_SIZE (2 * GiB)
-#elif defined(__powerpc64__)
-# define MAX_CODE_GEN_BUFFER_SIZE (2 * GiB)
-#elif defined(__powerpc__)
-# define MAX_CODE_GEN_BUFFER_SIZE (32 * MiB)
-#elif defined(__aarch64__)
-# define MAX_CODE_GEN_BUFFER_SIZE (2 * GiB)
-#elif defined(__s390x__)
-  /* We have a +- 4GB range on the branches; leave some slop. */
-# define MAX_CODE_GEN_BUFFER_SIZE (3 * GiB)
-#elif defined(__mips__)
-  /* We have a 256MB branch region, but leave room to make sure the
-     main executable is also within that region. */
-# define MAX_CODE_GEN_BUFFER_SIZE (128 * MiB)
-#else
-# define MAX_CODE_GEN_BUFFER_SIZE ((size_t)-1)
-#endif
-
-#if TCG_TARGET_REG_BITS == 32
-#define DEFAULT_CODE_GEN_BUFFER_SIZE_1 (32 * MiB)
-#ifdef CONFIG_USER_ONLY
-/*
- * For user mode on smaller 32 bit systems we may run into trouble
- * allocating big chunks of data in the right place. On these systems
- * we utilise a static code generation buffer directly in the binary.
- */
-#define USE_STATIC_CODE_GEN_BUFFER
-#endif
-#else /* TCG_TARGET_REG_BITS == 64 */
-#ifdef CONFIG_USER_ONLY
-/*
- * As user-mode emulation typically means running multiple instances
- * of the translator don't go too nuts with our default code gen
- * buffer lest we make things too hard for the OS.
- */
-#define DEFAULT_CODE_GEN_BUFFER_SIZE_1 (128 * MiB)
-#else
-/*
- * We expect most system emulation to run one or two guests per host.
- * Users running large scale system emulation may want to tweak their
- * runtime setup via the tb-size control on the command line.
- */
-#define DEFAULT_CODE_GEN_BUFFER_SIZE_1 (1 * GiB)
-#endif
-#endif
-
-#define DEFAULT_CODE_GEN_BUFFER_SIZE \
-  (DEFAULT_CODE_GEN_BUFFER_SIZE_1 < MAX_CODE_GEN_BUFFER_SIZE \
-   ? DEFAULT_CODE_GEN_BUFFER_SIZE_1 : MAX_CODE_GEN_BUFFER_SIZE)
-
-static size_t size_code_gen_buffer(size_t tb_size)
-{
-    /* Size the buffer. */
-    if (tb_size == 0) {
-        size_t phys_mem = qemu_get_host_physmem();
-        if (phys_mem == 0) {
-            tb_size = DEFAULT_CODE_GEN_BUFFER_SIZE;
-        } else {
-            tb_size = MIN(DEFAULT_CODE_GEN_BUFFER_SIZE, phys_mem / 8);
-        }
-    }
-    if (tb_size < MIN_CODE_GEN_BUFFER_SIZE) {
-        tb_size = MIN_CODE_GEN_BUFFER_SIZE;
-    }
-    if (tb_size > MAX_CODE_GEN_BUFFER_SIZE) {
-        tb_size = MAX_CODE_GEN_BUFFER_SIZE;
-    }
-    return tb_size;
-}
-
-#ifdef __mips__
-/* In order to use J and JAL within the code_gen_buffer, we require
-   that the buffer not cross a 256MB boundary. */
-static inline bool cross_256mb(void *addr, size_t size)
-{
-    return ((uintptr_t)addr ^ ((uintptr_t)addr + size)) & ~0x0ffffffful;
-}
-
-/* We weren't able to allocate a buffer without crossing that boundary,
-   so make do with the larger portion of the buffer that doesn't cross.
-   Returns the new base of the buffer, and adjusts code_gen_buffer_size. */
-static inline void *split_cross_256mb(void *buf1, size_t size1)
-{
-    void *buf2 = (void *)(((uintptr_t)buf1 + size1) & ~0x0ffffffful);
-    size_t size2 = buf1 + size1 - buf2;
-
-    size1 = buf2 - buf1;
-    if (size1 < size2) {
-        size1 = size2;
-        buf1 = buf2;
-    }
-
-    tcg_ctx->code_gen_buffer_size = size1;
-    return buf1;
-}
-#endif
-
-#ifdef USE_STATIC_CODE_GEN_BUFFER
-static uint8_t static_code_gen_buffer[DEFAULT_CODE_GEN_BUFFER_SIZE]
-    __attribute__((aligned(CODE_GEN_ALIGN)));
-
-static bool alloc_code_gen_buffer(size_t tb_size, int splitwx, Error **errp)
-{
-    void *buf, *end;
-    size_t size;
-
-    if (splitwx > 0) {
-        error_setg(errp, "jit split-wx not supported");
-        return false;
-    }
-
-    /* page-align the beginning and end of the buffer */
-    buf = static_code_gen_buffer;
-    end = static_code_gen_buffer + sizeof(static_code_gen_buffer);
-    buf = QEMU_ALIGN_PTR_UP(buf, qemu_real_host_page_size);
-    end = QEMU_ALIGN_PTR_DOWN(end, qemu_real_host_page_size);
-
-    size = end - buf;
-
-    /* Honor a command-line option limiting the size of the buffer. */
-    if (size > tb_size) {
-        size = QEMU_ALIGN_DOWN(tb_size, qemu_real_host_page_size);
-    }
-    tcg_ctx->code_gen_buffer_size = size;
-
-#ifdef __mips__
-    if (cross_256mb(buf, size)) {
-        buf = split_cross_256mb(buf, size);
-        size = tcg_ctx->code_gen_buffer_size;
-    }
-#endif
-
-    if (qemu_mprotect_rwx(buf, size)) {
-        error_setg_errno(errp, errno, "mprotect of jit buffer");
-        return false;
-    }
-    qemu_madvise(buf, size, QEMU_MADV_HUGEPAGE);
-
-    tcg_ctx->code_gen_buffer = buf;
-    return true;
-}
-#elif defined(_WIN32)
-static bool alloc_code_gen_buffer(size_t size, int splitwx, Error **errp)
-{
-    void *buf;
-
-    if (splitwx > 0) {
-        error_setg(errp, "jit split-wx not supported");
-        return false;
-    }
-
-    buf = VirtualAlloc(NULL, size, MEM_RESERVE | MEM_COMMIT,
-                             PAGE_EXECUTE_READWRITE);
-    if (buf == NULL) {
-        error_setg_win32(errp, GetLastError(),
-                         "allocate %zu bytes for jit buffer", size);
-        return false;
-    }
-
-    tcg_ctx->code_gen_buffer = buf;
-    tcg_ctx->code_gen_buffer_size = size;
-    return true;
-}
-#else
-static bool alloc_code_gen_buffer_anon(size_t size, int prot,
-                                       int flags, Error **errp)
-{
-    void *buf;
-
-    buf = mmap(NULL, size, prot, flags, -1, 0);
-    if (buf == MAP_FAILED) {
-        error_setg_errno(errp, errno,
-                         "allocate %zu bytes for jit buffer", size);
-        return false;
-    }
-    tcg_ctx->code_gen_buffer_size = size;
-
-#ifdef __mips__
-    if (cross_256mb(buf, size)) {
-        /*
-         * Try again, with the original still mapped, to avoid re-acquiring
-         * the same 256mb crossing.
-         */
-        size_t size2;
-        void *buf2 = mmap(NULL, size, prot, flags, -1, 0);
-        switch ((int)(buf2 != MAP_FAILED)) {
-        case 1:
-            if (!cross_256mb(buf2, size)) {
-                /* Success! Use the new buffer. */
-                munmap(buf, size);
-                break;
-            }
-            /* Failure. Work with what we had. */
-            munmap(buf2, size);
-            /* fallthru */
-        default:
-            /* Split the original buffer. Free the smaller half. */
-            buf2 = split_cross_256mb(buf, size);
-            size2 = tcg_ctx->code_gen_buffer_size;
-            if (buf == buf2) {
-                munmap(buf + size2, size - size2);
-            } else {
-                munmap(buf, size - size2);
-            }
-            size = size2;
-            break;
-        }
-        buf = buf2;
-    }
-#endif
-
-    /* Request large pages for the buffer. */
-    qemu_madvise(buf, size, QEMU_MADV_HUGEPAGE);
-
-    tcg_ctx->code_gen_buffer = buf;
-    return true;
-}
-
-#ifndef CONFIG_TCG_INTERPRETER
-#ifdef CONFIG_POSIX
-#include "qemu/memfd.h"
-
-static bool alloc_code_gen_buffer_splitwx_memfd(size_t size, Error **errp)
-{
-    void *buf_rw = NULL, *buf_rx = MAP_FAILED;
-    int fd = -1;
-
-#ifdef __mips__
-    /* Find space for the RX mapping, vs the 256MiB regions. */
-    if (!alloc_code_gen_buffer_anon(size, PROT_NONE,
-                                    MAP_PRIVATE | MAP_ANONYMOUS |
-                                    MAP_NORESERVE, errp)) {
-        return false;
-    }
-    /* The size of the mapping may have been adjusted. */
-    size = tcg_ctx->code_gen_buffer_size;
-    buf_rx = tcg_ctx->code_gen_buffer;
-#endif
-
-    buf_rw = qemu_memfd_alloc("tcg-jit", size, 0, &fd, errp);
-    if (buf_rw == NULL) {
-        goto fail;
-    }
-
-#ifdef __mips__
-    void *tmp = mmap(buf_rx, size, PROT_READ | PROT_EXEC,
-                     MAP_SHARED | MAP_FIXED, fd, 0);
-    if (tmp != buf_rx) {
-        goto fail_rx;
-    }
-#else
-    buf_rx = mmap(NULL, size, PROT_READ | PROT_EXEC, MAP_SHARED, fd, 0);
-    if (buf_rx == MAP_FAILED) {
-        goto fail_rx;
-    }
-#endif
-
-    close(fd);
-    tcg_ctx->code_gen_buffer = buf_rw;
-    tcg_ctx->code_gen_buffer_size = size;
-    tcg_splitwx_diff = buf_rx - buf_rw;
-
-    /* Request large pages for the buffer and the splitwx. */
-    qemu_madvise(buf_rw, size, QEMU_MADV_HUGEPAGE);
-    qemu_madvise(buf_rx, size, QEMU_MADV_HUGEPAGE);
-    return true;
-
- fail_rx:
-    error_setg_errno(errp, errno, "failed to map shared memory for execute");
- fail:
-    if (buf_rx != MAP_FAILED) {
-        munmap(buf_rx, size);
-    }
-    if (buf_rw) {
-        munmap(buf_rw, size);
-    }
-    if (fd >= 0) {
-        close(fd);
-    }
-    return false;
-}
-#endif /* CONFIG_POSIX */
-
-#ifdef CONFIG_DARWIN
-#include
-
-extern kern_return_t mach_vm_remap(vm_map_t target_task,
-                                   mach_vm_address_t *target_address,
-                                   mach_vm_size_t size,
-                                   mach_vm_offset_t mask,
-                                   int flags,
-                                   vm_map_t src_task,
-                                   mach_vm_address_t src_address,
-                                   boolean_t copy,
-                                   vm_prot_t *cur_protection,
-                                   vm_prot_t *max_protection,
-                                   vm_inherit_t inheritance);
-
-static bool alloc_code_gen_buffer_splitwx_vmremap(size_t size, Error **errp)
-{
-    kern_return_t ret;
-    mach_vm_address_t buf_rw, buf_rx;
-    vm_prot_t cur_prot, max_prot;
-
-    /* Map the read-write portion via normal anon memory. */
-    if (!alloc_code_gen_buffer_anon(size, PROT_READ | PROT_WRITE,
-                                    MAP_PRIVATE | MAP_ANONYMOUS, errp)) {
-        return false;
-    }
-
-    buf_rw = (mach_vm_address_t)tcg_ctx->code_gen_buffer;
-    buf_rx = 0;
-    ret = mach_vm_remap(mach_task_self(),
-                        &buf_rx,
-                        size,
-                        0,
-                        VM_FLAGS_ANYWHERE,
-                        mach_task_self(),
-                        buf_rw,
-                        false,
-                        &cur_prot,
-                        &max_prot,
-                        VM_INHERIT_NONE);
-    if (ret != KERN_SUCCESS) {
-        /* TODO: Convert "ret" to a human readable error message. */
-        error_setg(errp, "vm_remap for jit splitwx failed");
-        munmap((void *)buf_rw, size);
-        return false;
-    }
-
-    if (mprotect((void *)buf_rx, size, PROT_READ | PROT_EXEC) != 0) {
-        error_setg_errno(errp, errno, "mprotect for jit splitwx");
-        munmap((void *)buf_rx, size);
-        munmap((void *)buf_rw, size);
-        return false;
-    }
-
-    tcg_splitwx_diff = buf_rx - buf_rw;
-    return true;
-}
-#endif /* CONFIG_DARWIN */
-#endif /* CONFIG_TCG_INTERPRETER */
-
-static bool alloc_code_gen_buffer_splitwx(size_t size, Error **errp)
-{
-#ifndef CONFIG_TCG_INTERPRETER
-# ifdef CONFIG_DARWIN
-    return alloc_code_gen_buffer_splitwx_vmremap(size, errp);
-# endif
-# ifdef CONFIG_POSIX
-    return alloc_code_gen_buffer_splitwx_memfd(size, errp);
-# endif
-#endif
-    error_setg(errp, "jit split-wx not supported");
-    return false;
-}
-
-static bool alloc_code_gen_buffer(size_t size, int splitwx, Error **errp)
-{
-    ERRP_GUARD();
-    int prot, flags;
-
-    if (splitwx) {
-        if (alloc_code_gen_buffer_splitwx(size, errp)) {
-            return true;
-        }
-        /*
-         * If splitwx force-on (1), fail;
-         * if splitwx default-on (-1), fall through to splitwx off.
-         */
-        if (splitwx > 0) {
-            return false;
-        }
-        error_free_or_abort(errp);
-    }
-
-    prot = PROT_READ | PROT_WRITE | PROT_EXEC;
-    flags = MAP_PRIVATE | MAP_ANONYMOUS;
-#ifdef CONFIG_TCG_INTERPRETER
-    /* The tcg interpreter does not need execute permission. */
-    prot = PROT_READ | PROT_WRITE;
-#elif defined(CONFIG_DARWIN)
-    /* Applicable to both iOS and macOS (Apple Silicon). */
-    if (!splitwx) {
-        flags |= MAP_JIT;
-    }
-#endif
-
-    return alloc_code_gen_buffer_anon(size, prot, flags, errp);
-}
-#endif /* USE_STATIC_CODE_GEN_BUFFER, WIN32, POSIX */
-
 static bool tb_cmp(const void *ap, const void *bp)
 {
     const TranslationBlock *a = ap;
@@ -1323,19 +919,11 @@ static void tb_htable_init(void)
    size. */
 void tcg_exec_init(unsigned long tb_size, int splitwx)
 {
-    bool ok;
-
     tcg_allowed = true;
     tcg_context_init(&tcg_init_ctx);
     page_init();
     tb_htable_init();
-
-    ok = alloc_code_gen_buffer(size_code_gen_buffer(tb_size),
-                               splitwx, &error_fatal);
-    assert(ok);
-
-    /* TODO: allocating regions is hand-in-glove with code_gen_buffer. */
-    tcg_region_init();
+    tcg_region_init(tb_size, splitwx);
 
 #if defined(CONFIG_SOFTMMU)
     /* There's no guest base to take into account, so go ahead and
diff --git a/tcg/region.c b/tcg/region.c
index af45a0174e..8d88144a22 100644
--- a/tcg/region.c
+++ b/tcg/region.c
@@ -23,6 +23,8 @@
  */
 
 #include "qemu/osdep.h"
+#include "qemu/units.h"
+#include "qapi/error.h"
 #include "exec/exec-all.h"
 #include "tcg/tcg.h"
 #if !defined(CONFIG_USER_ONLY)
@@ -406,6 +408,408 @@ static size_t tcg_n_regions(void)
 }
 #endif
 
+/* Minimum size of the code gen buffer. This number is randomly chosen,
+   but not so small that we can't have a fair number of TB's live. */
+#define MIN_CODE_GEN_BUFFER_SIZE (1 * MiB)
+
+/* Maximum size of the code gen buffer we'd like to use. Unless otherwise
+   indicated, this is constrained by the range of direct branches on the
+   host cpu, as used by the TCG implementation of goto_tb. */
+#if defined(__x86_64__)
+# define MAX_CODE_GEN_BUFFER_SIZE (2 * GiB)
+#elif defined(__sparc__)
+# define MAX_CODE_GEN_BUFFER_SIZE (2 * GiB)
+#elif defined(__powerpc64__)
+# define MAX_CODE_GEN_BUFFER_SIZE (2 * GiB)
+#elif defined(__powerpc__)
+# define MAX_CODE_GEN_BUFFER_SIZE (32 * MiB)
+#elif defined(__aarch64__)
+# define MAX_CODE_GEN_BUFFER_SIZE (2 * GiB)
+#elif defined(__s390x__)
+  /* We have a +- 4GB range on the branches; leave some slop. */
+# define MAX_CODE_GEN_BUFFER_SIZE (3 * GiB)
+#elif defined(__mips__)
+  /* We have a 256MB branch region, but leave room to make sure the
+     main executable is also within that region. */
+# define MAX_CODE_GEN_BUFFER_SIZE (128 * MiB)
+#else
+# define MAX_CODE_GEN_BUFFER_SIZE ((size_t)-1)
+#endif
+
+#if TCG_TARGET_REG_BITS == 32
+#define DEFAULT_CODE_GEN_BUFFER_SIZE_1 (32 * MiB)
+#ifdef CONFIG_USER_ONLY
+/*
+ * For user mode on smaller 32 bit systems we may run into trouble
+ * allocating big chunks of data in the right place. On these systems
+ * we utilise a static code generation buffer directly in the binary.
+ */
+#define USE_STATIC_CODE_GEN_BUFFER
+#endif
+#else /* TCG_TARGET_REG_BITS == 64 */
+#ifdef CONFIG_USER_ONLY
+/*
+ * As user-mode emulation typically means running multiple instances
+ * of the translator don't go too nuts with our default code gen
+ * buffer lest we make things too hard for the OS.
+ */
+#define DEFAULT_CODE_GEN_BUFFER_SIZE_1 (128 * MiB)
+#else
+/*
+ * We expect most system emulation to run one or two guests per host.
+ * Users running large scale system emulation may want to tweak their
+ * runtime setup via the tb-size control on the command line.
+ */
+#define DEFAULT_CODE_GEN_BUFFER_SIZE_1 (1 * GiB)
+#endif
+#endif
+
+#define DEFAULT_CODE_GEN_BUFFER_SIZE \
+  (DEFAULT_CODE_GEN_BUFFER_SIZE_1 < MAX_CODE_GEN_BUFFER_SIZE \
+   ? DEFAULT_CODE_GEN_BUFFER_SIZE_1 : MAX_CODE_GEN_BUFFER_SIZE)
+
+static size_t size_code_gen_buffer(size_t tb_size)
+{
+    /* Size the buffer. */
+    if (tb_size == 0) {
+        size_t phys_mem = qemu_get_host_physmem();
+        if (phys_mem == 0) {
+            tb_size = DEFAULT_CODE_GEN_BUFFER_SIZE;
+        } else {
+            tb_size = MIN(DEFAULT_CODE_GEN_BUFFER_SIZE, phys_mem / 8);
+        }
+    }
+    if (tb_size < MIN_CODE_GEN_BUFFER_SIZE) {
+        tb_size = MIN_CODE_GEN_BUFFER_SIZE;
+    }
+    if (tb_size > MAX_CODE_GEN_BUFFER_SIZE) {
+        tb_size = MAX_CODE_GEN_BUFFER_SIZE;
+    }
+    return tb_size;
+}
+
+#ifdef __mips__
+/* In order to use J and JAL within the code_gen_buffer, we require
+   that the buffer not cross a 256MB boundary. */
+static inline bool cross_256mb(void *addr, size_t size)
+{
+    return ((uintptr_t)addr ^ ((uintptr_t)addr + size)) & ~0x0ffffffful;
+}
+
+/* We weren't able to allocate a buffer without crossing that boundary,
+   so make do with the larger portion of the buffer that doesn't cross.
+   Returns the new base of the buffer, and adjusts code_gen_buffer_size. */
+static inline void *split_cross_256mb(void *buf1, size_t size1)
+{
+    void *buf2 = (void *)(((uintptr_t)buf1 + size1) & ~0x0ffffffful);
+    size_t size2 = buf1 + size1 - buf2;
+
+    size1 = buf2 - buf1;
+    if (size1 < size2) {
+        size1 = size2;
+        buf1 = buf2;
+    }
+
+    tcg_ctx->code_gen_buffer_size = size1;
+    return buf1;
+}
+#endif
+
+#ifdef USE_STATIC_CODE_GEN_BUFFER
+static uint8_t static_code_gen_buffer[DEFAULT_CODE_GEN_BUFFER_SIZE]
+    __attribute__((aligned(CODE_GEN_ALIGN)));
+
+static bool alloc_code_gen_buffer(size_t tb_size, int splitwx, Error **errp)
+{
+    void *buf, *end;
+    size_t size;
+
+    if (splitwx > 0) {
+        error_setg(errp, "jit split-wx not supported");
+        return false;
+    }
+
+    /* page-align the beginning and end of the buffer */
+    buf = static_code_gen_buffer;
+    end = static_code_gen_buffer + sizeof(static_code_gen_buffer);
+    buf = QEMU_ALIGN_PTR_UP(buf, qemu_real_host_page_size);
+    end = QEMU_ALIGN_PTR_DOWN(end, qemu_real_host_page_size);
+
+    size = end - buf;
+
+    /* Honor a command-line option limiting the size of the buffer. */
+    if (size > tb_size) {
+        size = QEMU_ALIGN_DOWN(tb_size, qemu_real_host_page_size);
+    }
+    tcg_ctx->code_gen_buffer_size = size;
+
+#ifdef __mips__
+    if (cross_256mb(buf, size)) {
+        buf = split_cross_256mb(buf, size);
+        size = tcg_ctx->code_gen_buffer_size;
+    }
+#endif
+
+    if (qemu_mprotect_rwx(buf, size)) {
+        error_setg_errno(errp, errno, "mprotect of jit buffer");
+        return false;
+    }
+    qemu_madvise(buf, size, QEMU_MADV_HUGEPAGE);
+
+    tcg_ctx->code_gen_buffer = buf;
+    return true;
+}
+#elif defined(_WIN32)
+static bool alloc_code_gen_buffer(size_t size, int splitwx, Error **errp)
+{
+    void *buf;
+
+    if (splitwx > 0) {
+        error_setg(errp, "jit split-wx not supported");
+        return false;
+    }
+
+    buf = VirtualAlloc(NULL, size, MEM_RESERVE | MEM_COMMIT,
+                             PAGE_EXECUTE_READWRITE);
+    if (buf == NULL) {
+        error_setg_win32(errp, GetLastError(),
+                         "allocate %zu bytes for jit buffer", size);
+        return false;
+    }
+
+    tcg_ctx->code_gen_buffer = buf;
+    tcg_ctx->code_gen_buffer_size = size;
+    return true;
+}
+#else
+static bool alloc_code_gen_buffer_anon(size_t size, int prot,
+                                       int flags, Error **errp)
+{
+    void *buf;
+
+    buf = mmap(NULL, size, prot, flags, -1, 0);
+    if (buf == MAP_FAILED) {
+        error_setg_errno(errp, errno,
+                         "allocate %zu bytes for jit buffer", size);
+        return false;
+    }
+    tcg_ctx->code_gen_buffer_size = size;
+
+#ifdef __mips__
+    if (cross_256mb(buf, size)) {
+        /*
+         * Try again, with the original still mapped, to avoid re-acquiring
+         * the same 256mb crossing.
+         */
+        size_t size2;
+        void *buf2 = mmap(NULL, size, prot, flags, -1, 0);
+        switch ((int)(buf2 != MAP_FAILED)) {
+        case 1:
+            if (!cross_256mb(buf2, size)) {
+                /* Success! Use the new buffer. */
+                munmap(buf, size);
+                break;
+            }
+            /* Failure. Work with what we had. */
+            munmap(buf2, size);
+            /* fallthru */
+        default:
+            /* Split the original buffer. Free the smaller half. */
+            buf2 = split_cross_256mb(buf, size);
+            size2 = tcg_ctx->code_gen_buffer_size;
+            if (buf == buf2) {
+                munmap(buf + size2, size - size2);
+            } else {
+                munmap(buf, size - size2);
+            }
+            size = size2;
+            break;
+        }
+        buf = buf2;
+    }
+#endif
+
+    /* Request large pages for the buffer. */
+    qemu_madvise(buf, size, QEMU_MADV_HUGEPAGE);
+
+    tcg_ctx->code_gen_buffer = buf;
+    return true;
+}
+
+#ifndef CONFIG_TCG_INTERPRETER
+#ifdef CONFIG_POSIX
+#include "qemu/memfd.h"
+
+static bool alloc_code_gen_buffer_splitwx_memfd(size_t size, Error **errp)
+{
+    void *buf_rw = NULL, *buf_rx = MAP_FAILED;
+    int fd = -1;
+
+#ifdef __mips__
+    /* Find space for the RX mapping, vs the 256MiB regions. */
+    if (!alloc_code_gen_buffer_anon(size, PROT_NONE,
+                                    MAP_PRIVATE | MAP_ANONYMOUS |
+                                    MAP_NORESERVE, errp)) {
+        return false;
+    }
+    /* The size of the mapping may have been adjusted. */
+    size = tcg_ctx->code_gen_buffer_size;
+    buf_rx = tcg_ctx->code_gen_buffer;
+#endif
+
+    buf_rw = qemu_memfd_alloc("tcg-jit", size, 0, &fd, errp);
+    if (buf_rw == NULL) {
+        goto fail;
+    }
+
+#ifdef __mips__
+    void *tmp = mmap(buf_rx, size, PROT_READ | PROT_EXEC,
+                     MAP_SHARED | MAP_FIXED, fd, 0);
+    if (tmp != buf_rx) {
+        goto fail_rx;
+    }
+#else
+    buf_rx = mmap(NULL, size, PROT_READ | PROT_EXEC, MAP_SHARED, fd, 0);
+    if (buf_rx == MAP_FAILED) {
+        goto fail_rx;
+    }
+#endif
+
+    close(fd);
+    tcg_ctx->code_gen_buffer = buf_rw;
+    tcg_ctx->code_gen_buffer_size = size;
+    tcg_splitwx_diff = buf_rx - buf_rw;
+
+    /* Request large pages for the buffer and the splitwx. */
+    qemu_madvise(buf_rw, size, QEMU_MADV_HUGEPAGE);
+    qemu_madvise(buf_rx, size, QEMU_MADV_HUGEPAGE);
+    return true;
+
+ fail_rx:
+    error_setg_errno(errp, errno, "failed to map shared memory for execute");
+ fail:
+    if (buf_rx != MAP_FAILED) {
+        munmap(buf_rx, size);
+    }
+    if (buf_rw) {
+        munmap(buf_rw, size);
+    }
+    if (fd >= 0) {
+        close(fd);
+    }
+    return false;
+}
+#endif /* CONFIG_POSIX */
+
+#ifdef CONFIG_DARWIN
+#include
+
+extern kern_return_t mach_vm_remap(vm_map_t target_task,
+                                   mach_vm_address_t *target_address,
+                                   mach_vm_size_t size,
+                                   mach_vm_offset_t mask,
+                                   int flags,
+                                   vm_map_t src_task,
+                                   mach_vm_address_t src_address,
+                                   boolean_t copy,
+                                   vm_prot_t *cur_protection,
+                                   vm_prot_t *max_protection,
+                                   vm_inherit_t inheritance);
+
+static bool alloc_code_gen_buffer_splitwx_vmremap(size_t size, Error **errp)
+{
+    kern_return_t ret;
+    mach_vm_address_t buf_rw, buf_rx;
+    vm_prot_t cur_prot, max_prot;
+
+    /* Map the read-write portion via normal anon memory. */
+    if (!alloc_code_gen_buffer_anon(size, PROT_READ | PROT_WRITE,
+                                    MAP_PRIVATE | MAP_ANONYMOUS, errp)) {
+        return false;
+    }
+
+    buf_rw = (mach_vm_address_t)tcg_ctx->code_gen_buffer;
+    buf_rx = 0;
+    ret = mach_vm_remap(mach_task_self(),
+                        &buf_rx,
+                        size,
+                        0,
+                        VM_FLAGS_ANYWHERE,
+                        mach_task_self(),
+                        buf_rw,
+                        false,
+                        &cur_prot,
+                        &max_prot,
+                        VM_INHERIT_NONE);
+    if (ret != KERN_SUCCESS) {
+        /* TODO: Convert "ret" to a human readable error message. */
+        error_setg(errp, "vm_remap for jit splitwx failed");
+        munmap((void *)buf_rw, size);
+        return false;
+    }
+
+    if (mprotect((void *)buf_rx, size, PROT_READ | PROT_EXEC) != 0) {
+        error_setg_errno(errp, errno, "mprotect for jit splitwx");
+        munmap((void *)buf_rx, size);
+        munmap((void *)buf_rw, size);
+        return false;
+    }
+
+    tcg_splitwx_diff = buf_rx - buf_rw;
+    return true;
+}
+#endif /* CONFIG_DARWIN */
+#endif /* CONFIG_TCG_INTERPRETER */
+
+static bool alloc_code_gen_buffer_splitwx(size_t size, Error **errp)
+{
+#ifndef CONFIG_TCG_INTERPRETER
+# ifdef CONFIG_DARWIN
+    return alloc_code_gen_buffer_splitwx_vmremap(size, errp);
+# endif
+# ifdef CONFIG_POSIX
+    return alloc_code_gen_buffer_splitwx_memfd(size, errp);
+# endif
+#endif
+    error_setg(errp, "jit split-wx not supported");
+    return false;
+}
+
+static bool alloc_code_gen_buffer(size_t size, int splitwx, Error **errp)
+{
+    ERRP_GUARD();
+    int prot, flags;
+
+    if (splitwx) {
+        if (alloc_code_gen_buffer_splitwx(size, errp)) {
+            return true;
+        }
+        /*
+         * If splitwx force-on (1), fail;
+         * if splitwx default-on (-1), fall through to splitwx off.
+         */
+        if (splitwx > 0) {
+            return false;
+        }
+        error_free_or_abort(errp);
+    }
+
+    prot = PROT_READ | PROT_WRITE | PROT_EXEC;
+    flags = MAP_PRIVATE | MAP_ANONYMOUS;
+#ifdef CONFIG_TCG_INTERPRETER
+    /* The tcg interpreter does not need execute permission. */
+    prot = PROT_READ | PROT_WRITE;
+#elif defined(CONFIG_DARWIN)
+    /* Applicable to both iOS and macOS (Apple Silicon). */
+    if (!splitwx) {
+        flags |= MAP_JIT;
+    }
+#endif
+
+    return alloc_code_gen_buffer_anon(size, prot, flags, errp);
+}
+#endif /* USE_STATIC_CODE_GEN_BUFFER, WIN32, POSIX */
+
 /*
  * Initializes region partitioning.
  *
@@ -434,17 +838,24 @@ static size_t tcg_n_regions(void)
  * in practice. Multi-threaded guests share most if not all of their translated
  * code, which makes parallel code generation less appealing than in softmmu.
  */
-void tcg_region_init(void)
+void tcg_region_init(size_t tb_size, int splitwx)
 {
-    void *buf = tcg_init_ctx.code_gen_buffer;
-    void *aligned;
-    size_t size = tcg_init_ctx.code_gen_buffer_size;
-    size_t page_size = qemu_real_host_page_size;
+    void *buf, *aligned;
+    size_t size;
+    size_t page_size;
     size_t region_size;
     size_t n_regions;
     size_t i;
     uintptr_t splitwx_diff;
+    bool ok;
 
+    ok = alloc_code_gen_buffer(size_code_gen_buffer(tb_size),
+                               splitwx, &error_fatal);
+    assert(ok);
+
+    buf = tcg_init_ctx.code_gen_buffer;
+    size = tcg_init_ctx.code_gen_buffer_size;
+    page_size = qemu_real_host_page_size;
     n_regions = tcg_n_regions();
 
     /* The first region will be 'aligned - buf' bytes larger than the others */
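
Two pieces of logic moved by this patch can be read in isolation: the sizing policy in size_code_gen_buffer() and the MIPS 256MB boundary test in cross_256mb(). What follows is a minimal standalone sketch of both, not the code in tcg/region.c. The sketch_* function names and SKETCH_* constants are hypothetical; phys_mem is passed as a parameter instead of being read with qemu_get_host_physmem(); and the limits are fixed illustrative values rather than the per-host ones selected in the diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative limits only; the patch chooses MAX and DEFAULT per host. */
#define SKETCH_MIB          ((size_t)1 << 20)
#define SKETCH_MIN_SIZE     (1 * SKETCH_MIB)
#define SKETCH_MAX_SIZE     (128 * SKETCH_MIB)
#define SKETCH_DEFAULT_SIZE (32 * SKETCH_MIB)

/*
 * Hypothetical stand-in for size_code_gen_buffer().  A tb_size of 0 means
 * "no explicit request": use the default, capped at 1/8 of physical memory
 * when that amount is known (phys_mem != 0).  The result is then clamped
 * to [SKETCH_MIN_SIZE, SKETCH_MAX_SIZE].
 */
static size_t sketch_size_code_gen_buffer(size_t tb_size, size_t phys_mem)
{
    assert(SKETCH_MIN_SIZE <= SKETCH_MAX_SIZE);

    if (tb_size == 0) {
        tb_size = SKETCH_DEFAULT_SIZE;
        if (phys_mem != 0 && phys_mem / 8 < tb_size) {
            tb_size = phys_mem / 8;
        }
    }
    if (tb_size < SKETCH_MIN_SIZE) {
        tb_size = SKETCH_MIN_SIZE;
    }
    if (tb_size > SKETCH_MAX_SIZE) {
        tb_size = SKETCH_MAX_SIZE;
    }
    return tb_size;
}

/*
 * Same test as cross_256mb(), on integer addresses: true when addr and
 * addr + size differ in any bit above the low 28.  As in the patch, the
 * end address addr + size itself takes part in the comparison, so a
 * buffer that ends exactly on a 256MB boundary also counts as crossing.
 */
static bool sketch_cross_256mb(uintptr_t addr, size_t size)
{
    return ((addr ^ (addr + size)) & ~(uintptr_t)0x0fffffff) != 0;
}
```

With these illustrative limits, a request of 0 on a host reporting 64 MiB of RAM yields 8 MiB, while an explicit 1 GiB request is clamped to 128 MiB.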