From patchwork Fri Aug 14 20:54:16 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Daniel Henrique Barboza X-Patchwork-Id: 276404 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.8 required=3.0 tests=BAYES_00, DKIM_ADSP_CUSTOM_MED, DKIM_INVALID, DKIM_SIGNED, FREEMAIL_FORGED_FROMDOMAIN, FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH, MAILING_LIST_MULTI, SIGNED_OFF_BY,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 94098C433DF for ; Sat, 15 Aug 2020 17:07:04 +0000 (UTC) Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 5B55323B3D for ; Sat, 15 Aug 2020 17:07:04 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=fail reason="signature verification failed" (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="qY5qZTA0" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 5B55323B3D Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=gmail.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Received: from localhost ([::1]:37280 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1k6ze3-0003XH-If for qemu-devel@archiver.kernel.org; Sat, 15 Aug 2020 13:07:03 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:42514) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1k6zdD-0002LH-KK; Sat, 15 Aug 2020 13:06:11 -0400 Received: from mail-qt1-x841.google.com 
([2607:f8b0:4864:20::841]:39650) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1k6zdC-0001Nf-2T; Sat, 15 Aug 2020 13:06:11 -0400 Received: by mail-qt1-x841.google.com with SMTP id w9so9324292qts.6; Sat, 15 Aug 2020 10:06:09 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=C8DYrGMe3y/eVzf4pXQ5QA0B3zmxOvZQB53ARGNc55o=; b=qY5qZTA0p9G/ilHt/4N2AYlygJQ9zrO8zKkstj/82g21j3Z8fCgmNrpSVZF640E8ad zoPVZNecnlF0xyTfn71wWKYVDvEECWnCL4MscR04D44Xx0Knofxq6Z1bUsuTBpW7zlcS N2x3zc3wVIVJimwr9oltUyob0t/BabHUPYSSesv8Lr9OxYVOHb/PR3alyjxDpLJk4F2H fM2qOExcRkYU81OQJbX9Zx4OAoF8huoT2dPlAd1xEunHYExqoWqC8kv23/t+DCDT2MsJ h6luOzMZGKCL4PdQCp7DWdNAKp52LU2fcuFC9UGhnB2tNc+WIGuL3TZj1p8lxAAhnYk8 Rl8w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=C8DYrGMe3y/eVzf4pXQ5QA0B3zmxOvZQB53ARGNc55o=; b=i+1a358VZJu60xmJDP+o3+iH2TyWtAIQn6s7oR2Qgsd3KEY76dVUZC4e4pCSRCO1aY 4aH/ZPFz/yTTb6muPk7GKdYWV667F8PSvPtbjSvJvci2wl7P4Khtb8TS0Z+6bNDMVERF C16GbOLr3Yz3YkoHXoT1E2MLQEDcw3eqPgStXVMeKQFDonVYHmnuc0c35gjTBx1jMw7v z1JWQFJZXVNNVEzr1fKGl/+K1KE0RMrXVz9uSK+riNlkozMtSzc76GqKyw8HpK2iM1L+ Bd8H3XNj0+1UP5PbUbI7QweU7OX0U50DIF922hknW09aHd7nJvetmJe1rS45USHxDQSe 9c7A== X-Gm-Message-State: AOAM530G33cIr7jX/7OlaGogMGwzFmlx3h4fD3lWesQwmWFCWQ8UsJq6 z7cJtz+LM81FCbhZrSkXuP5bqS08rMItCQ== X-Google-Smtp-Source: ABdhPJyHKq14iTpKYc5wJdXofWA+1bw5OnGOmJEZleZvtQYkThyO13voRwfd9A4CwN83abl0C4k+4w== X-Received: by 2002:ac8:44b9:: with SMTP id a25mr3618711qto.356.1597438480966; Fri, 14 Aug 2020 13:54:40 -0700 (PDT) Received: from rekt.ibmuc.com ([2804:431:c7c6:303f:d1dc:35d8:e9f6:c8b]) by smtp.gmail.com with ESMTPSA id p33sm12301018qtp.49.2020.08.14.13.54.39 (version=TLS1_3 
cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 14 Aug 2020 13:54:40 -0700 (PDT) From: Daniel Henrique Barboza To: qemu-devel@nongnu.org Subject: [PATCH 02/10] numa: introduce MachineClass::forbid_asymmetrical_numa Date: Fri, 14 Aug 2020 17:54:16 -0300 Message-Id: <20200814205424.543857-3-danielhb413@gmail.com> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20200814205424.543857-1-danielhb413@gmail.com> References: <20200814205424.543857-1-danielhb413@gmail.com> MIME-Version: 1.0 Received-SPF: pass client-ip=2607:f8b0:4864:20::841; envelope-from=danielhb413@gmail.com; helo=mail-qt1-x841.google.com X-detected-operating-system: by eggs.gnu.org: No matching host in p0f cache. That's all we know. X-Spam_score_int: -17 X-Spam_score: -1.8 X-Spam_bar: - X-Spam_report: (-1.8 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, FREEMAIL_ENVFROM_END_DIGIT=0.25, FREEMAIL_FROM=0.001, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Daniel Henrique Barboza , qemu-ppc@nongnu.org, Eduardo Habkost , david@gibson.dropbear.id.au Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: "Qemu-devel" The pSeries machine does not support asymmetrical NUMA configurations. 
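
For context, "asymmetrical" here means that the distance from node A to
node B differs from the distance from B to A. A minimal standalone sketch
of such a check follows. The function name and the flat row-major matrix
are assumptions made for illustration; this is not the code in
hw/core/numa.c, and it ignores the handling of unset entries.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of this series.
 * dist is a row-major nb_nodes x nb_nodes matrix where
 * dist[src * nb_nodes + dst] is the distance from src to dst.
 */
static bool distances_are_asymmetrical(const uint8_t *dist, size_t nb_nodes)
{
    size_t src, dst;

    assert(dist != NULL);

    /* Compare each pair once: (src, dst) against (dst, src). */
    for (src = 0; src < nb_nodes; src++) {
        for (dst = src + 1; dst < nb_nodes; dst++) {
            if (dist[src * nb_nodes + dst] != dist[dst * nb_nodes + src]) {
                return true;
            }
        }
    }
    return false;
}
```

With two nodes, the matrix {10, 20, 20, 10} is symmetrical, while
{10, 20, 40, 10} is not and would be rejected by a machine type that sets
forbid_asymmetrical_numa.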

CC: Eduardo Habkost
CC: Marcel Apfelbaum
Signed-off-by: Daniel Henrique Barboza
---
 hw/core/numa.c      | 7 +++++++
 hw/ppc/spapr.c      | 1 +
 include/hw/boards.h | 1 +
 3 files changed, 9 insertions(+)

diff --git a/hw/core/numa.c b/hw/core/numa.c
index d1a94a14f8..1e81233c1d 100644
--- a/hw/core/numa.c
+++ b/hw/core/numa.c
@@ -547,6 +547,7 @@ static int parse_numa(void *opaque, QemuOpts *opts, Error **errp)
  */
 static void validate_numa_distance(MachineState *ms)
 {
+    MachineClass *mc = MACHINE_GET_CLASS(ms);
     int src, dst;
     bool is_asymmetrical = false;
     int nb_numa_nodes = ms->numa_state->num_nodes;
@@ -575,6 +576,12 @@ static void validate_numa_distance(MachineState *ms)
     }

     if (is_asymmetrical) {
+        if (mc->forbid_asymmetrical_numa) {
+            error_report("This machine type does not support "
+                         "asymmetrical numa distances.");
+            exit(EXIT_FAILURE);
+        }
+
         for (src = 0; src < nb_numa_nodes; src++) {
             for (dst = 0; dst < nb_numa_nodes; dst++) {
                 if (src != dst && numa_info[src].distance[dst] == 0) {
diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index dd2fa4826b..3b16edaf4c 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -4512,6 +4512,7 @@ static void spapr_machine_class_init(ObjectClass *oc, void *data)
      */
     mc->numa_mem_align_shift = 28;
     mc->auto_enable_numa = true;
+    mc->forbid_asymmetrical_numa = true;

     smc->default_caps.caps[SPAPR_CAP_HTM] = SPAPR_CAP_OFF;
     smc->default_caps.caps[SPAPR_CAP_VSX] = SPAPR_CAP_ON;
diff --git a/include/hw/boards.h b/include/hw/boards.h
index bc5b82ad20..dc6cdd1c53 100644
--- a/include/hw/boards.h
+++ b/include/hw/boards.h
@@ -215,6 +215,7 @@ struct MachineClass {
     bool nvdimm_supported;
     bool numa_mem_supported;
     bool auto_enable_numa;
+    bool forbid_asymmetrical_numa;
     const char *default_ram_id;

     HotplugHandler *(*get_hotplug_handler)(MachineState *machine,

From patchwork Fri Aug 14 20:54:18 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Daniel Henrique Barboza
X-Patchwork-Id: 276412
From: Daniel Henrique Barboza
To: qemu-devel@nongnu.org
Subject: [PATCH 04/10] spapr: add spapr_machine_using_legacy_numa() helper
Date: Fri, 14 Aug 2020 17:54:18 -0300
Message-Id: <20200814205424.543857-5-danielhb413@gmail.com>
X-Mailer: git-send-email 2.26.2
In-Reply-To: <20200814205424.543857-1-danielhb413@gmail.com>
References: <20200814205424.543857-1-danielhb413@gmail.com>
MIME-Version: 1.0
Cc: Daniel Henrique Barboza, qemu-ppc@nongnu.org, david@gibson.dropbear.id.au

The upcoming changes to NUMA support are all guest visible. In theory we
could just create a new 5_1 class option flag to keep the changes from
cascading to 5.1 and older machine types. In practice, these changes are
only relevant if the machine has more than one NUMA node, and there is no
need to needlessly change guest behavior that has been around for years.

This new helper will be used by the next patches to determine whether we
should retain the (soon to be) legacy NUMA behavior in the pSeries
machine. The new behavior will only be exposed if:

- the machine is pseries-5.2 or newer;
- more than one NUMA node is declared in the NUMA state.

Signed-off-by: Daniel Henrique Barboza
---
 hw/ppc/spapr.c         | 12 ++++++++++++
 include/hw/ppc/spapr.h |  2 ++
 2 files changed, 14 insertions(+)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 22e78cfc84..073a59c47d 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -308,6 +308,15 @@ static hwaddr spapr_node0_size(MachineState *machine)
     return machine->ram_size;
 }

+bool spapr_machine_using_legacy_numa(SpaprMachineState *spapr)
+{
+    MachineState *machine = MACHINE(spapr);
+    SpaprMachineClass *smc = SPAPR_MACHINE_GET_CLASS(machine);
+
+    return smc->pre_5_2_numa_associativity ||
+           machine->numa_state->num_nodes <= 1;
+}
+
 static void add_str(GString *s, const gchar *s1)
 {
     g_string_append_len(s, s1, strlen(s1) + 1);
@@ -4602,8 +4611,11 @@ DEFINE_SPAPR_MACHINE(5_2, "5.2", true);
  */
 static void spapr_machine_5_1_class_options(MachineClass *mc)
 {
+    SpaprMachineClass *smc = SPAPR_MACHINE_CLASS(mc);
+
     spapr_machine_5_2_class_options(mc);
     compat_props_add(mc->compat_props, hw_compat_5_1, hw_compat_5_1_len);
+    smc->pre_5_2_numa_associativity = true;
 }

 DEFINE_SPAPR_MACHINE(5_1, "5.1", false);
diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index 739a6a4942..d9f1afa8b2 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -130,6 +130,7 @@ struct SpaprMachineClass {
     bool smp_threads_vsmt; /* set VSMT to smp_threads by default */
     hwaddr rma_limit;          /* clamp the RMA to this size */
     bool pre_5_1_assoc_refpoints;
+    bool pre_5_2_numa_associativity;

     void (*phb_placement)(SpaprMachineState *spapr, uint32_t index,
                           uint64_t *buid, hwaddr *pio,
@@ -847,6 +848,7 @@ int spapr_max_server_number(SpaprMachineState *spapr);
 void spapr_store_hpte(PowerPCCPU *cpu, hwaddr ptex,
                       uint64_t pte0, uint64_t pte1);
 void spapr_mce_req_event(PowerPCCPU *cpu, bool recovered);
+bool spapr_machine_using_legacy_numa(SpaprMachineState *spapr);

 /* DRC callbacks. */
 void spapr_core_release(DeviceState *dev);

From patchwork Fri Aug 14 20:54:19 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Daniel Henrique Barboza
X-Patchwork-Id: 276405
From: Daniel Henrique Barboza
To: qemu-devel@nongnu.org
Subject: [PATCH 05/10] spapr: make ibm,max-associativity-domains scale with user input
Date: Fri, 14 Aug 2020 17:54:19 -0300
Message-Id: <20200814205424.543857-6-danielhb413@gmail.com>
X-Mailer: git-send-email 2.26.2
In-Reply-To: <20200814205424.543857-1-danielhb413@gmail.com>
References: <20200814205424.543857-1-danielhb413@gmail.com>
MIME-Version: 1.0
Cc: Daniel Henrique Barboza, qemu-ppc@nongnu.org, david@gibson.dropbear.id.au

The ibm,max-associativity-domains property currently assumes that only a
single associativity domain can exist in the same NUMA level. This is
true today because we do not support any type of NUMA distance user
customization, and all nodes are at the same distance from each other.

To enhance NUMA distance support in the pSeries machine we need to make
this limit flexible. This patch rewrites the max-associativity logic to
consider that multiple associativity domains can co-exist in the same
NUMA level.
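
As a worked illustration of the resulting property values, here is a
standalone sketch. The function name is hypothetical, values are kept in
host byte order (the real property is written big-endian), and the
assertion that at least one node exists is an assumption of this sketch.
The legacy branch mirrors the previous calculation, which this patch keeps
for machines that do not use the new NUMA behavior.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper, not the QEMU code: fill the five cells of
 * ibm,max-associativity-domains in host byte order.
 */
static void fill_max_assoc_domains(uint32_t maxdomains[5],
                                   uint32_t num_nodes,
                                   uint32_t extra_numa_nodes,
                                   bool legacy_numa)
{
    uint32_t total_nodes = num_nodes + extra_numa_nodes;
    uint32_t maxdomain;

    /* Assumption: at least one NUMA node exists. */
    assert(total_nodes > 0);

    if (legacy_numa) {
        /* Previous behavior: at most one domain per intermediate level. */
        maxdomain = extra_numa_nodes > 1 ? 1 : 0;
    } else {
        /* New behavior: scale with the total number of nodes. */
        maxdomain = total_nodes - 1;
    }

    maxdomains[0] = 4;           /* number of levels that follow */
    maxdomains[1] = maxdomain;
    maxdomains[2] = maxdomain;
    maxdomains[3] = maxdomain;
    maxdomains[4] = total_nodes; /* last level: one domain per node */
}
```

With 4 user-defined nodes and no extra nodes, the new logic yields
{4, 3, 3, 3, 4}, where the legacy logic would yield {4, 0, 0, 0, 4}.
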
We use the spapr_machine_using_legacy_numa() helper to avoid exposing
unneeded guest-visible changes.

Signed-off-by: Daniel Henrique Barboza
---
 hw/ppc/spapr.c | 18 ++++++++++++++++--
 1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 073a59c47d..b0c4b80a23 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -919,13 +919,20 @@ static void spapr_dt_rtas(SpaprMachineState *spapr, void *fdt)
         cpu_to_be32(SPAPR_MEMORY_BLOCK_SIZE & 0xffffffff),
         cpu_to_be32(ms->smp.max_cpus / ms->smp.threads),
     };
-    uint32_t maxdomain = cpu_to_be32(spapr->extra_numa_nodes > 1 ? 1 : 0);
+
+    /* The maximum domains for a given NUMA level, supposing that every
+     * additional NUMA node belongs to the same domain (aside from the
+     * 4th level, where we must support all available NUMA domains), is
+     * total number of domains - 1. */
+    uint32_t total_nodes_number = ms->numa_state->num_nodes +
+                                  spapr->extra_numa_nodes;
+    uint32_t maxdomain = cpu_to_be32(total_nodes_number - 1);
     uint32_t maxdomains[] = {
         cpu_to_be32(4),
         maxdomain,
         maxdomain,
         maxdomain,
-        cpu_to_be32(ms->numa_state->num_nodes + spapr->extra_numa_nodes),
+        cpu_to_be32(total_nodes_number),
     };

     _FDT(rtas = fdt_add_subnode(fdt, 0, "rtas"));
@@ -962,6 +969,13 @@ static void spapr_dt_rtas(SpaprMachineState *spapr, void *fdt)
                      qemu_hypertas->str, qemu_hypertas->len));
     g_string_free(qemu_hypertas, TRUE);

+    if (spapr_machine_using_legacy_numa(spapr)) {
+        maxdomain = cpu_to_be32(spapr->extra_numa_nodes > 1 ? 1 : 0);
+        maxdomains[1] = maxdomain;
+        maxdomains[2] = maxdomain;
+        maxdomains[3] = maxdomain;
+    }
+
     if (smc->pre_5_1_assoc_refpoints) {
         nr_refpoints = 2;
     }

From patchwork Fri Aug 14 20:54:23 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Daniel Henrique Barboza
X-Patchwork-Id: 276418
From: Daniel Henrique Barboza
To: qemu-devel@nongnu.org
Subject: [PATCH 09/10] spapr: consider user input when defining spapr guest NUMA
Date: Fri, 14 Aug 2020 17:54:23 -0300
Message-Id: <20200814205424.543857-10-danielhb413@gmail.com>
X-Mailer: git-send-email 2.26.2
In-Reply-To: <20200814205424.543857-1-danielhb413@gmail.com>
References: <20200814205424.543857-1-danielhb413@gmail.com>
MIME-Version: 1.0
Cc: Daniel Henrique Barboza, qemu-ppc@nongnu.org, david@gibson.dropbear.id.au

This patch puts all the pieces together to finally allow user input when
defining the NUMA topology of the spapr guest.

The logic is centered in the new spapr_init_numa_assoc_domains()
function. It is called once in spapr_machine_init(), if we're not using
legacy NUMA mode, to initialize the numa_assoc_domains matrix introduced
in the previous patch. The logic has two stages, which are interleaved in
the body of this function.

The first stage is to sanitize the user input from numa_state. Due to the
nature of what ACPI allows the user to do (directly define the distances
the guest will see in the DT) versus what PAPR allows (we can hint at
associativity relations, but the OS must decide what to do), we had to
bake kernel logic in here. The kernel allows 4 levels of NUMA, where the
last one is always the node_id itself, with distance = 10. Each of the
other levels doubles the distance of the previous level, meaning that the
pSeries kernel will only show distances of 20, 40, 80 and 160 (in case no
match is found). This first stage therefore takes the distances defined
by the user and approximates them to those discrete values:

- user distance 11 to 30 will be interpreted as 20
- user distance 31 to 60 will be interpreted as 40
- user distance 61 to 120 will be interpreted as 80
- user distance 121 and beyond will be interpreted as 160
- user distance 10 stays 10

The other stage is defining the associativity domains based on the NUMA
level match. Again, more than one strategy exists for this same problem,
with different results. The approach taken is to reuse any existing
associativity value for the new matches, instead of overwriting it with a
new associativity match. This decision is necessary because neither we
nor the pSeries kernel support multiple associativity domains for each
resource, meaning that we have to decide what to preserve. With the
current logic, the associativities established by the earlier nodes take
precedence, i.e. associativities defined by the first node are retained
across all other nodes.

These decisions have a direct impact on how the user will interact with
the NUMA topology, and none of them is perfect. To keep this commit
message from growing longer than it already is, let's update the existing
documentation in ppc-spapr-numa.rst with more in-depth details and design
considerations/drawbacks in the next patch.
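
As a rough illustration of the first stage, the rounding listed above can
be sketched in standalone C. The helper name is hypothetical and does not
appear in this series (the patch applies the same ranges inside a loop
rather than through a helper), and the sketch assumes valid user
distances start at 10.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of this series: round a user-supplied
 * NUMA distance to the discrete values the pSeries kernel can show.
 * Assumption: valid distances start at 10 (the local distance).
 */
static uint8_t approx_user_distance(uint8_t user_distance)
{
    assert(user_distance >= 10);

    if (user_distance == 10) {
        return 10;      /* local node, kept as is */
    }
    if (user_distance <= 30) {
        return 20;      /* 11 to 30 */
    }
    if (user_distance <= 60) {
        return 40;      /* 31 to 60 */
    }
    if (user_distance <= 120) {
        return 80;      /* 61 to 120 */
    }
    return 160;         /* 121 and beyond */
}
```

For example, a user distance of 45 is treated as 40, and a user distance
of 200 is treated as 160.
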
Signed-off-by: Daniel Henrique Barboza
---
 hw/ppc/spapr.c | 109 +++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 109 insertions(+)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 4f50ab21ee..0d60d06cf4 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -222,6 +222,109 @@ void spapr_set_associativity(uint32_t *assoc, int node_id, int cpu_index,
     assoc[4] = cpu_to_be32(node_id);
 }
 
+static void spapr_numa_assoc_assign_domain(SpaprMachineClass *smc,
+                                           uint8_t nodeA, uint8_t nodeB,
+                                           uint8_t numaLevel,
+                                           uint8_t curr_domain)
+{
+    uint8_t assoc_A, assoc_B;
+
+    assoc_A = smc->numa_assoc_domains[nodeA][numaLevel];
+    assoc_B = smc->numa_assoc_domains[nodeB][numaLevel];
+
+    /* No associativity domains on both. Assign and move on */
+    if ((assoc_A | assoc_B) == 0) {
+        smc->numa_assoc_domains[nodeA][numaLevel] = curr_domain;
+        smc->numa_assoc_domains[nodeB][numaLevel] = curr_domain;
+        return;
+    }
+
+    /* Use the existing assoc domain of any of the nodes to not
+     * disrupt previous associations already defined */
+    if (assoc_A != 0) {
+        smc->numa_assoc_domains[nodeB][numaLevel] = assoc_A;
+    } else {
+        smc->numa_assoc_domains[nodeA][numaLevel] = assoc_B;
+    }
+}
+
+static void spapr_init_numa_assoc_domains(MachineState *machine)
+{
+    SpaprMachineClass *smc = SPAPR_MACHINE_GET_CLASS(machine);
+    int nb_numa_nodes = machine->numa_state->num_nodes;
+    NodeInfo *numa_info = machine->numa_state->nodes;
+    uint8_t existing_nodes[nb_numa_nodes];
+    int i, j, src_node, dst_node, index = 0;
+
+    /* We don't have information about any extra NUMA nodes that
+     * the machine might create at this point (e.g. NVLink2 GPUs).
+     * Assigning associativity domains with low numbers might have
+     * unintended consequences in the presence of GPUs, which are
+     * supposed to always be at maximum distance of everything else,
+     * because we might end up using a GPU numa_id identifier by
+     * accident.
+     *
+     * Starting this counter at MAX_NODES avoids any possible
+     * collision since no NUMA id can reach this value. */
+    uint8_t assoc_domain = MAX_NODES;
+
+    /* We will assume that the NUMA nodes might be sparsed. This
+     * preliminary fetch step is required to avoid having to search
+     * for an existing NUMA node more than once. */
+    for (i = 0; i < MAX_NODES; i++) {
+        if (numa_info[i].present) {
+            existing_nodes[index++] = i;
+            if (index == nb_numa_nodes) {
+                break;
+            }
+        }
+    }
+
+    /* Start iterating through the existing numa nodes to
+     * define associativity groups */
+    for (i = 0; i < nb_numa_nodes; i++) {
+        uint8_t distance = 20;
+        uint8_t lower_end = 10;
+        uint8_t upper_end = 0;
+
+        src_node = existing_nodes[i];
+
+        /* Calculate all associativity domains src_node belongs to. */
+        for(index = 0; index < 3; index++) {
+            upper_end = distance/2 + distance;
+
+            for(j = i + 1; j < nb_numa_nodes; j++) {
+                uint8_t node_dist;
+
+                dst_node = existing_nodes[j];
+                node_dist = numa_info[src_node].distance[dst_node];
+
+                if (node_dist > lower_end && node_dist <= upper_end) {
+                    spapr_numa_assoc_assign_domain(smc, src_node, dst_node,
+                                                   2 - index, assoc_domain);
+                    assoc_domain++;
+                }
+            }
+
+            lower_end = upper_end;
+            distance *= 2;
+        }
+    }
+
+    /* Zero (0) is considered a valid associativity domain identifier.
+     * To avoid NUMA nodes having matches where it wasn't intended, fill
+     * the zeros with unique identifiers. */
+    for (i = 0; i < nb_numa_nodes; i++) {
+        src_node = existing_nodes[i];
+        for (j = 0; j < 3; j++) {
+            if (smc->numa_assoc_domains[src_node][j] == 0) {
+                smc->numa_assoc_domains[src_node][j] = assoc_domain;
+                assoc_domain++;
+            }
+        }
+    }
+}
+
 static int spapr_fixup_cpu_numa_dt(void *fdt, int offset, PowerPCCPU *cpu,
                                    MachineState *machine)
 {
@@ -2887,6 +2990,12 @@ static void spapr_machine_init(MachineState *machine)
     spapr->current_numa_id = 0;
     spapr->extra_numa_nodes = 0;
 
+    /* We don't need to init the NUMA matrix if we're running in
+     * legacy NUMA mode. */
+    if (!spapr_machine_using_legacy_numa(spapr)) {
+        spapr_init_numa_assoc_domains(machine);
+    }
+
     if ((!kvm_enabled() || kvmppc_has_cap_mmu_radix()) &&
         ppc_type_check_compat(machine->cpu_type, CPU_POWERPC_LOGICAL_3_00, 0,
                               spapr->max_compat_pvr)) {

From patchwork Fri Aug 14 20:54:24 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Daniel Henrique Barboza
X-Patchwork-Id: 276411
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level:
X-Spam-Status: No, score=-12.8 required=3.0 tests=BAYES_00, DKIM_ADSP_CUSTOM_MED, DKIM_INVALID, DKIM_SIGNED, FREEMAIL_FORGED_FROMDOMAIN, FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH, MAILING_LIST_MULTI, SIGNED_OFF_BY,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0B397C433DF for ; Sat, 15 Aug 2020 16:10:34 +0000 (UTC)
Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id BFF1623B18 for ; Sat, 15 Aug 2020 16:10:33 +0000 (UTC)
Authentication-Results: mail.kernel.org; dkim=fail reason="signature verification failed" (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="fX6amXTx"
DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org BFF1623B18
Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=gmail.com
Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org
Received: from localhost ([::1]:55218 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1k6ylN-0002qD-1I for qemu-devel@archiver.kernel.org; Sat, 15 Aug 2020 12:10:33 -0400
Received:
from eggs.gnu.org ([2001:470:142:3::10]:56636) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1k6yk4-0001Wv-Dl; Sat, 15 Aug 2020 12:09:12 -0400 Received: from mail-qk1-x72a.google.com ([2607:f8b0:4864:20::72a]:35294) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1k6yk2-0002UQ-CG; Sat, 15 Aug 2020 12:09:12 -0400 Received: by mail-qk1-x72a.google.com with SMTP id p25so11202908qkp.2; Sat, 15 Aug 2020 09:09:09 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=foegIg+R6xGelzocg1qpmzjKagMlOEuwe5qYOlr3pzs=; b=fX6amXTxJw3yKN5HikX+764lKobkINSWRwiCU3kdeD7rI1EQNbhUOBbOWS3jMONLPR QZzUdlcdMQ+VMOTrxF7hl3m3k94Pj4WEGUp8Zyz4sduy511U/c6ZK92Z3dLNBN+Ha0Bs uVwWqRqpOccRKsR+EBKFw3D4vp9cnMf6f2qB/XoFYUpG646nqWd3EN5kUPuwPk0XwMtk D+EESfyx4Jr5qj5YV0xV2lenR0xFogbb8I6vfo3XwYnWHkWYG8EnGwPOhYPlTbnzcFIw 9epvkT5ciH+owLvLYa/ImJvZZ1JrxGRI03DV9CeyZGdeU+7K7h0FgeOPO8elzR9Vytaf es8w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=foegIg+R6xGelzocg1qpmzjKagMlOEuwe5qYOlr3pzs=; b=MosR95BZSgMr8nkuDwGCYRMA3GDvwDoPh9EcU45xlXm3ho4zRSaSRfhJcWY5qzYojv lFJYLtRvXG4Dv+HsoK9NywhlexSH2sOE5PCFJJXV+dRhTM80KcQ6OECLBNY439b2PSDi clLh6wlXJhrG8/4uP3o3WtrNK24uUMwxdqvLITc5/yWSXAMkH39+Jh6kNUlvfFfbJiO5 o5y/DkTZt7lpF81GaHuMdS/niKYbe5BLy7aaheL6U2XUlyqZoyanzkwdeR4MJ2Rm6Yaq 2kzP+wWRtZBOb2WcAfpDOB8TqQwJ8jkxyZfi7F6lujR6dcumV9erceogTuPzlKDyigme s/eg== X-Gm-Message-State: AOAM531vOiUi53HeBvhNFKoFvARMXrrLmKSDwui9YYgZl6WVd2+y6qJQ LmAeAGTJLoQ9Tc/METkOyLdkBkaIPMYJqA== X-Google-Smtp-Source: ABdhPJydDn0sx120VMqKlvwEZlpXbFioI1mkTrLBAnIUyix//GwiPDJ9Sb4CKLjnbZlRA+bRkBynVw== X-Received: by 2002:ac8:dce:: with SMTP id 
t14mr3637537qti.314.1597438494467; Fri, 14 Aug 2020 13:54:54 -0700 (PDT) Received: from rekt.ibmuc.com ([2804:431:c7c6:303f:d1dc:35d8:e9f6:c8b]) by smtp.gmail.com with ESMTPSA id p33sm12301018qtp.49.2020.08.14.13.54.53 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 14 Aug 2020 13:54:54 -0700 (PDT) From: Daniel Henrique Barboza To: qemu-devel@nongnu.org Subject: [PATCH 10/10] specs/ppc-spapr-numa: update with new NUMA support Date: Fri, 14 Aug 2020 17:54:24 -0300 Message-Id: <20200814205424.543857-11-danielhb413@gmail.com> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20200814205424.543857-1-danielhb413@gmail.com> References: <20200814205424.543857-1-danielhb413@gmail.com> MIME-Version: 1.0 Received-SPF: pass client-ip=2607:f8b0:4864:20::72a; envelope-from=danielhb413@gmail.com; helo=mail-qk1-x72a.google.com X-detected-operating-system: by eggs.gnu.org: No matching host in p0f cache. That's all we know. X-Spam_score_int: -17 X-Spam_score: -1.8 X-Spam_bar: - X-Spam_report: (-1.8 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, FREEMAIL_ENVFROM_END_DIGIT=0.25, FREEMAIL_FROM=0.001, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Daniel Henrique Barboza , qemu-ppc@nongnu.org, david@gibson.dropbear.id.au Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: "Qemu-devel" This update provides more in depth information about the choices and drawbacks of the new NUMA support for the spapr machine. 
Signed-off-by: Daniel Henrique Barboza
---
 docs/specs/ppc-spapr-numa.rst | 213 ++++++++++++++++++++++++++++++++++
 1 file changed, 213 insertions(+)

diff --git a/docs/specs/ppc-spapr-numa.rst b/docs/specs/ppc-spapr-numa.rst
index e762038022..994bfb996f 100644
--- a/docs/specs/ppc-spapr-numa.rst
+++ b/docs/specs/ppc-spapr-numa.rst
@@ -189,3 +189,216 @@ QEMU up to 5.1, as follows:
 
 This also means that user input in QEMU command line does not change the
 NUMA distancing inside the guest for the pseries machine.
+
+New NUMA mechanics for pseries in QEMU 5.2
+==========================================
+
+Starting in QEMU 5.2, the pseries machine now considers user input when
+setting NUMA topology of the guest. The following changes were made:
+
+* ibm,associativity-reference-points was changed to {0x4, 0x3, 0x2, 0x1}, allowing
+  for 4 distinct NUMA distance values based on the NUMA levels
+
+* ibm,max-associativity-domains was changed to support multiple associativity
+  domains in all NUMA levels. This is needed to ensure user flexibility
+
+* ibm,associativity for all resources now varies with user input
+
+These changes are only effective for pseries-5.2 and newer machines that are
+created with more than one NUMA node (not counting NUMA nodes created by
+the machine itself, e.g. NVLink2 GPUs). The now legacy support has been
+around for such a long time, with users seeing NUMA distances 10 and 40
+(and 80 if using NVLink2 GPUs), and there is no need to disrupt the
+existing experience of those guests.
+
+To bring pseries users the experience x86 users have when tuning NUMA, we
+had to operate under the current pseries Linux kernel logic described in
+`How the pseries Linux guest calculates NUMA distances`_. The result
+is that we needed to translate NUMA distance user input to pseries
+Linux kernel input.
+
+Translating user distance to kernel distance
+--------------------------------------------
+
+User input for NUMA distance can vary from 10 to 254. We need to translate
+that to the values that the Linux kernel operates on (10, 20, 40, 80, 160).
+This is how it is being done:
+
+* user distance 11 to 30 will be interpreted as 20
+* user distance 31 to 60 will be interpreted as 40
+* user distance 61 to 120 will be interpreted as 80
+* user distance 121 and beyond will be interpreted as 160
+* user distance 10 stays 10
+
+The reasoning behind this approximation is to avoid any round up to the local
+distance (10), keeping it exclusive to the 4th NUMA level (which is still
+exclusive to the node_id). All other ranges were chosen at the developer's
+discretion of what would be (somewhat) sensible considering the user input.
+Any other strategy can be used here, but in the end the reality is that we'll
+have to accept that a large array of values will be translated to the same
+NUMA topology in the guest, e.g. this user input:
+
+::
+
+      0   1   2
+  0  10  31 120
+  1  31  10  30
+  2 120  30  10
+
+And this other user input:
+
+::
+
+      0   1   2
+  0  10  60  61
+  1  60  10  11
+  2  61  11  10
+
+Will both be translated to the same values internally:
+
+::
+
+      0   1   2
+  0  10  40  80
+  1  40  10  20
+  2  80  20  10
+
+Users are encouraged to use only the kernel values in the NUMA definition to
+avoid being taken by surprise with what the guest is actually seeing in the
+topology. There are enough potential surprises that are inherent to the
+associativity domain assignment process, discussed below.
+
+
+How associativity domains are assigned
+--------------------------------------
+
+LOPAPR allows more than one associativity array (or 'string') per allocated
+resource. This would be used to represent that the resource has multiple
+connections with the board, and then the operating system, when deciding
+NUMA distancing, should consider the associativity information that provides
+the shortest distance.
+
+The spapr implementation does not support multiple associativity arrays per
+resource, and neither does the pseries Linux kernel. We'll have to represent the
+NUMA topology using one associativity per resource, which means that choices
+and compromises are going to be made.
+
+Consider the following NUMA topology entered by user input:
+
+::
+
+      0   1   2   3
+  0  10  20  20  40
+  1  20  10  80  40
+  2  20  80  10  20
+  3  40  40  20  10
+
+Honoring just the relative distances of node 0 to every other node, one possible
+value for all associativity arrays would be:
+
+* node 0: 0 B A 0
+* node 1: 0 0 A 1
+* node 2: 0 0 A 2
+* node 3: 0 B 0 3
+
+With the reference points {0x4, 0x3, 0x2, 0x1}, for node 0:
+
+* distance from 0 to 1 is 20 (no match at 0x4, will match at 0x3)
+* distance from 0 to 2 is 20 (no match at 0x4, will match at 0x3)
+* distance from 0 to 3 is 40 (no match at 0x4 and 0x3, will match
+  at 0x2)
+
+The distances related to node 0 are well represented. Doing the same for node 1,
+and keeping in mind that we don't need to revisit node 0, the distance from node 1
+to 2 is 80, matching at 0x1:
+
+* node 1: C 0 A 1
+* node 2: C 0 A 2
+
+Over here we already have the first conflict. Even if we assign a new associativity
+domain at 0x1 for 1 and 2, and we do that in the code, the kernel will define
+the distance between 1 and 2 as 20, not 80, because both 1 and 2 have the "A"
+associativity domain from the previous step. If we decide to discard the
+associativity with "A" then the node 0 distances are compromised.
+
+Following up with the distance from 1 to 3 being 40 (a match in 0x2) we have another
+decision to make. These are the current associativity domains of each:
+
+* node 1: C 0 A 1
+* node 3: 0 B 0 3
+
+There is already an associativity domain at 0x2 in node 3, "B", which was assigned
+by the node 0 distances. If we define a new associativity domain at this level
+for 1 and 3 we will overwrite the existing associativity between 0 and 3. What
+the code does in this case is assign the existing domain to the
+current associativity: here, "B" is now assigned to the 0x2 of node 1,
+resulting in the following associativity arrays:
+
+* node 0: 0 B A 0
+* node 1: C B A 1
+* node 2: C 0 A 2
+* node 3: 0 B 0 3
+
+In the last step we will analyze just nodes 2 and 3. The desired distance between
+2 and 3 is 20, i.e. a match in 0x3. Node 2 already has a domain assigned in 0x3,
+A, so we do the same as we did in the previous case and assign it to node 3
+at 0x3. This is the end result for the associativity arrays:
+
+* node 0: 0 B A 0
+* node 1: C B A 1
+* node 2: C 0 A 2
+* node 3: 0 B A 3
+
+The kernel will read these arrays and will calculate the following NUMA topology for
+the guest:
+
+::
+
+      0   1   2   3
+  0  10  20  20  20
+  1  20  10  20  20
+  2  20  20  10  20
+  3  20  20  20  10
+
+This is not what the user wanted, but it is what the current logic and implementation
+constraints of the kernel and QEMU will provide inside the LOPAPR specification.
+
+Changing a single value, especially a low distance value, makes for drastic changes
+in the result. For example, with the same user input from above, but changing the
+node distance from 0 to 1 to 40:
+
+::
+
+      0   1   2   3
+  0  10  40  20  40
+  1  40  10  80  40
+  2  20  80  10  20
+  3  40  40  20  10
+
+This is the result inside the guest, applying the same heuristics:
+
+::
+
+  $ numactl -H
+  available: 4 nodes (0-3)
+  (...)
+  node distances:
+  node   0   1   2   3
+    0:  10  40  20  20
+    1:  40  10  80  40
+    2:  20  80  10  20
+    3:  20  40  20  10
+
+This result is much closer to the user input and only a single distance was changed
+from the original.
+
+The kernel will always match with the shortest associativity domain possible, and we're
+attempting to retain the previously established relations between the nodes. This means
+that a distance equal to 20 between nodes A and B and the same distance 20 between nodes
+A and F will cause the distance between B and F to also be 20. The same will happen to
+other distances, but shorter distances take precedence in the distance calculation.
+
+Users are welcome to use this knowledge and experiment with the input to get the
+NUMA topology they want, or as close to it as they can get. The important thing is
+to keep expectations up to par with what we are capable of providing at this moment:
+an approximation.