From patchwork Fri Aug 20 01:30:06 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Barry Song <21cnbao@gmail.com> X-Patchwork-Id: 500379 Delivered-To: patch@linaro.org Received: by 2002:a02:6f15:0:0:0:0:0 with SMTP id x21csp1074299jab; Thu, 19 Aug 2021 18:31:10 -0700 (PDT) X-Google-Smtp-Source: ABdhPJz+85RyUV2cW224NbHKZQcgGK0+2592CZH7v+4cZDTT0Kdn++275xUs7+mU2aCAw+TqP2Sp X-Received: by 2002:a02:946d:: with SMTP id a100mr15137304jai.118.1629423069966; Thu, 19 Aug 2021 18:31:09 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1629423069; cv=none; d=google.com; s=arc-20160816; b=k/uSbq0v6ouHton0uI6SuangycOu/dgkrBN9Ip+mIBf4jvjGpEX6r+q6g+hWPDWbb3 nTyaYh2QYM6aGnCjSRpUQ17bP/LiBSj9i6wOjPbLqgjZ3hV4ekSKv6R7Ce4lJAPMTwHS xWGuYrEgSEfYvRsrSga4s6/rAhYkJk/TwK7iDkusCE56VmODKuDs9jVxUCcIa5FT3TGP 4AjYG8XxVKhWAl4Mlrc2o0UPPBCucovZV677K2KMdn3BULsj8yKtJgmn+YwSzalyoc1g 5XEhMjxv12uid54zLwuOt1Zf/FKTV9Xov1WATk0QhF+yPx6W7KCeW8gGcAqLn8XhkAdw dxLw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=7O2k6+cYDuSSHjxTEm0czGO0EPIJVmQZ/1JyW8Uu9YY=; b=KqOjQMvkSZsoXCGGNMHtSoSgkrxhK1b5Vj6qt91Ar5Ntv6US83sIwvBPK7CPMlocT8 IR/1z6kA3GWI4rgmAW19m44VknGMRxcRR0wqNB6RQ2jiYrAZw7tz8MUePS6g042eS2Jj mXsGUA8AvQKNqM8xed4xyFULveXYbZu7op0xuvNI5dmNMcfFsy0WybGBspFkfX85+TTF 5t+zLTAnyo8meqymt7nCivDpdpvtqKZAl2WMLAUigDunkEEDR8DH2T0h8ihvQb2T14oh 4eCGYc8OTcTnqGJX3KawouATDZajSnfa6xe+l/dX3eCGBZhwoGyq0cS+FwIzGKwTkUG0 +2Rw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@gmail.com header.s=20161025 header.b=mPAL2jB8; spf=pass (google.com: domain of linux-acpi-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-acpi-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id 13si5377505ilt.16.2021.08.19.18.31.09; Thu, 19 Aug 2021 18:31:09 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-acpi-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@gmail.com header.s=20161025 header.b=mPAL2jB8; spf=pass (google.com: domain of linux-acpi-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-acpi-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232191AbhHTBbp (ORCPT + 3 others); Thu, 19 Aug 2021 21:31:45 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:55104 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S236013AbhHTBbp (ORCPT ); Thu, 19 Aug 2021 21:31:45 -0400 Received: from mail-pj1-x102b.google.com (mail-pj1-x102b.google.com [IPv6:2607:f8b0:4864:20::102b]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 3EF50C061756; Thu, 19 Aug 2021 18:31:08 -0700 (PDT) Received: by mail-pj1-x102b.google.com with SMTP id w13-20020a17090aea0db029017897a5f7bcso6136118pjy.5; Thu, 19 Aug 2021 18:31:08 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=7O2k6+cYDuSSHjxTEm0czGO0EPIJVmQZ/1JyW8Uu9YY=; b=mPAL2jB8xV1D3a2bqFDAnXEgnAB8Ny+6BQxbQiO9WNQaeYllNA5MzXgaQPUlDERJce EsS5w0k6tyyi4kwxwjvlVDw0u8Eg8NMscfGF4DH8pkaUf3Mt80ZvC8nPEN/ZKKvz/okg 7tkIeVoNWDNKECKZsyjEonbxrIcLMsWBKWaclIwncigkHR5B3KzpZZoDPPul+9FutZRn Fj9Weck7IfZEyvIedF+vMOr6IiKMe/NSQis33Y0HrpdWUF+BqskwWgejlzd6DskLsehi TTfwZ9mpDhIM+5Vz3jCB1QIKTIVEL88BP3pLyg6GQC8mXh3k/3R9/OO2NUW+jEaD2dVN e2Bg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=7O2k6+cYDuSSHjxTEm0czGO0EPIJVmQZ/1JyW8Uu9YY=; b=kA9m4tKRMUyf1OLOfE5TZA9mHGS0gocF2zm59Op1aFwnQhUoTt1ZKR0nySpsU6MuuO aFAa3ylYua8atfpn3x+thQELxqjJwUai5uoNPy3TNQ/Y1jleAKr1ygWph/N/TDfii/cx OrqrjXIy34emp7cU91pHwYuYtohwTVQCjszU43VbOj0wlqvIFW5zDrit6G96bovilpIg M+p7ea3G+x1FrcYQKwGpmetU33IlCdKOmsEF41u9+C//wCJsKhU3+ZUQgVjf/eLecyG6 rUkYb+Ghv/8ZtfmlKEeqg+mCxYavXr39sCiRN2cY31CLwsO3kISgLlmgwhjHaaJUGhj2 8cyw== X-Gm-Message-State: AOAM5302l5V3rBDu36aDats/ZMTKsHjPATK0VcthhRT9OpX1IjBHfs3y vDWJ7kaFcWPufH+Yt59QfO0= X-Received: by 2002:a17:90b:3805:: with SMTP id mq5mr1857519pjb.207.1629423067702; Thu, 19 Aug 2021 18:31:07 -0700 (PDT) Received: from localhost.localdomain ([2407:7000:8916:5000:d23:7118:c805:b5a5]) by smtp.gmail.com with ESMTPSA id 66sm4877950pfu.67.2021.08.19.18.30.54 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 19 Aug 2021 18:31:07 -0700 (PDT) From: Barry Song <21cnbao@gmail.com> To: bp@alien8.de, catalin.marinas@arm.com, dietmar.eggemann@arm.com, gregkh@linuxfoundation.org, hpa@zytor.com, juri.lelli@redhat.com, bristot@redhat.com, lenb@kernel.org, mgorman@suse.de, mingo@redhat.com, peterz@infradead.org, rjw@rjwysocki.net, sudeep.holla@arm.com, tglx@linutronix.de Cc: aubrey.li@linux.intel.com, bsegall@google.com, guodong.xu@linaro.org, jonathan.cameron@huawei.com, liguozhu@hisilicon.com, linux-acpi@vger.kernel.org, linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org, mark.rutland@arm.com, msys.mizuma@gmail.com, prime.zeng@hisilicon.com, rostedt@goodmis.org, tim.c.chen@linux.intel.com, valentin.schneider@arm.com, vincent.guittot@linaro.org, will@kernel.org, x86@kernel.org, xuwei5@huawei.com, yangyicong@huawei.com, linuxarm@huawei.com, Jonathan Cameron , Tian Tao , Barry Song Subject: [PATCH 1/3] topology: Represent clusters of CPUs within a die Date: Fri, 20 Aug 2021 13:30:06 +1200 Message-Id: <20210820013008.12881-2-21cnbao@gmail.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20210820013008.12881-1-21cnbao@gmail.com> References: <20210820013008.12881-1-21cnbao@gmail.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-acpi@vger.kernel.org From: Jonathan Cameron Both ACPI and DT provide the ability to describe additional layers of topology between that of individual cores and higher level constructs such as the level at which the last level cache is shared. In ACPI this can be represented in PPTT as a Processor Hierarchy Node Structure [1] that is the parent of the CPU cores and in turn has a parent Processor Hierarchy Nodes Structure representing a higher level of topology. For example Kunpeng 920 has 6 or 8 clusters in each NUMA node, and each cluster has 4 cpus. All clusters share L3 cache data, but each cluster has local L3 tag. On the other hand, each clusters will share some internal system bus. +-----------------------------------+ +---------+ | +------+ +------+ +---------------------------+ | | | CPU0 | | cpu1 | | +-----------+ | | | +------+ +------+ | | | | | | +----+ L3 | | | | +------+ +------+ cluster | | tag | | | | | CPU2 | | CPU3 | | | | | | | +------+ +------+ | +-----------+ | | | | | | +-----------------------------------+ | | +-----------------------------------+ | | | +------+ +------+ +--------------------------+ | | | | | | | +-----------+ | | | +------+ +------+ | | | | | | | | L3 | | | | +------+ +------+ +----+ tag | | | | | | | | | | | | | | +------+ +------+ | +-----------+ | | | | | | +-----------------------------------+ | L3 | | data | +-----------------------------------+ | | | +------+ +------+ | +-----------+ | | | | | | | | | | | | | +------+ +------+ +----+ L3 | | | | | | tag | | | | +------+ +------+ | | | | | | | | | | ++ +-----------+ | | | +------+ +------+ |---------------------------+ | +-----------------------------------| | | +-----------------------------------| | | | +------+ +------+ +---------------------------+ | | | | | | | +-----------+ | | | +------+ +------+ | | | | | | +----+ L3 | | | | +------+ +------+ | | tag | | | | | | | | | | | | | | +------+ +------+ | +-----------+ | | | | | | +-----------------------------------+ | | +-----------------------------------+ | | | +------+ +------+ +--------------------------+ | | | | | | | +-----------+ | | | +------+ +------+ | | | | | | | | L3 | | | | +------+ +------+ +---+ tag | | | | | | | | | | | | | | +------+ +------+ | +-----------+ | | | | | | +-----------------------------------+ | | +-----------------------------------+ ++ | | +------+ +------+ +--------------------------+ | | | | | | | +-----------+ | | | +------+ +------+ | | | | | | | | L3 | | | | +------+ +------+ +--+ tag | | | | | | | | | | | | | | +------+ +------+ | +-----------+ | | | | +---------+ +-----------------------------------+ That means spreading tasks among clusters will bring more bandwidth while packing tasks within one cluster will lead to smaller cache synchronization latency. So both kernel and userspace will have a chance to leverage this topology to deploy tasks accordingly to achieve either smaller cache latency within one cluster or an even distribution of load among clusters for higher throughput. This patch exposes cluster topology to both kernel and userspace. Libraried like hwloc will know cluster by cluster_cpus and related sysfs attributes. PoC of HWLOC support at [2]. Note this patch only handle the ACPI case. Special consideration is needed for SMT processors, where it is necessary to move 2 levels up the hierarchy from the leaf nodes (thus skipping the processor core level). Note that arm64 / ACPI does not provide any means of identifying a die level in the topology but that may be unrelate to the cluster level. [1] ACPI Specification 6.3 - section 5.2.29.1 processor hierarchy node structure (Type 0) [2] https://github.com/hisilicon/hwloc/tree/linux-cluster Signed-off-by: Jonathan Cameron Signed-off-by: Tian Tao Signed-off-by: Barry Song --- .../ABI/stable/sysfs-devices-system-cpu | 15 +++++ Documentation/admin-guide/cputopology.rst | 12 ++-- arch/arm64/kernel/topology.c | 2 + drivers/acpi/pptt.c | 67 +++++++++++++++++++ drivers/base/arch_topology.c | 14 ++++ drivers/base/topology.c | 10 +++ include/linux/acpi.h | 5 ++ include/linux/arch_topology.h | 5 ++ include/linux/topology.h | 6 ++ 9 files changed, 132 insertions(+), 4 deletions(-) -- 2.25.1 diff --git a/Documentation/ABI/stable/sysfs-devices-system-cpu b/Documentation/ABI/stable/sysfs-devices-system-cpu index 516dafea03eb..3965ce504484 100644 --- a/Documentation/ABI/stable/sysfs-devices-system-cpu +++ b/Documentation/ABI/stable/sysfs-devices-system-cpu @@ -42,6 +42,12 @@ Description: the CPU core ID of cpuX. Typically it is the hardware platform's architecture and platform dependent. Values: integer +What: /sys/devices/system/cpu/cpuX/topology/cluster_id +Description: the cluster ID of cpuX. Typically it is the hardware platform's + identifier (rather than the kernel's). The actual value is + architecture and platform dependent. +Values: integer + What: /sys/devices/system/cpu/cpuX/topology/book_id Description: the book ID of cpuX. Typically it is the hardware platform's identifier (rather than the kernel's). The actual value is @@ -85,6 +91,15 @@ Description: human-readable list of CPUs within the same die. The format is like 0-3, 8-11, 14,17. Values: decimal list. +What: /sys/devices/system/cpu/cpuX/topology/cluster_cpus +Description: internal kernel map of CPUs within the same cluster. +Values: hexadecimal bitmask. + +What: /sys/devices/system/cpu/cpuX/topology/cluster_cpus_list +Description: human-readable list of CPUs within the same cluster. + The format is like 0-3, 8-11, 14,17. +Values: decimal list. + What: /sys/devices/system/cpu/cpuX/topology/book_siblings Description: internal kernel map of cpuX's hardware threads within the same book_id. it's only used on s390. diff --git a/Documentation/admin-guide/cputopology.rst b/Documentation/admin-guide/cputopology.rst index 8632a1db36e4..a5491949880d 100644 --- a/Documentation/admin-guide/cputopology.rst +++ b/Documentation/admin-guide/cputopology.rst @@ -19,11 +19,13 @@ these macros in include/asm-XXX/topology.h:: #define topology_physical_package_id(cpu) #define topology_die_id(cpu) + #define topology_cluster_id(cpu) #define topology_core_id(cpu) #define topology_book_id(cpu) #define topology_drawer_id(cpu) #define topology_sibling_cpumask(cpu) #define topology_core_cpumask(cpu) + #define topology_cluster_cpumask(cpu) #define topology_die_cpumask(cpu) #define topology_book_cpumask(cpu) #define topology_drawer_cpumask(cpu) @@ -39,10 +41,12 @@ not defined by include/asm-XXX/topology.h: 1) topology_physical_package_id: -1 2) topology_die_id: -1 -3) topology_core_id: 0 -4) topology_sibling_cpumask: just the given CPU -5) topology_core_cpumask: just the given CPU -6) topology_die_cpumask: just the given CPU +3) topology_cluster_id: -1 +4) topology_core_id: 0 +5) topology_sibling_cpumask: just the given CPU +6) topology_core_cpumask: just the given CPU +7) topology_cluster_cpumask: just the given CPU +8) topology_die_cpumask: just the given CPU For architectures that don't support books (CONFIG_SCHED_BOOK) there are no default definitions for topology_book_id() and topology_book_cpumask(). diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c index 4dd14a6620c1..9ab78ad826e2 100644 --- a/arch/arm64/kernel/topology.c +++ b/arch/arm64/kernel/topology.c @@ -103,6 +103,8 @@ int __init parse_acpi_topology(void) cpu_topology[cpu].thread_id = -1; cpu_topology[cpu].core_id = topology_id; } + topology_id = find_acpi_cpu_topology_cluster(cpu); + cpu_topology[cpu].cluster_id = topology_id; topology_id = find_acpi_cpu_topology_package(cpu); cpu_topology[cpu].package_id = topology_id; diff --git a/drivers/acpi/pptt.c b/drivers/acpi/pptt.c index fe69dc518f31..701f61c01359 100644 --- a/drivers/acpi/pptt.c +++ b/drivers/acpi/pptt.c @@ -746,6 +746,73 @@ int find_acpi_cpu_topology_package(unsigned int cpu) ACPI_PPTT_PHYSICAL_PACKAGE); } +/** + * find_acpi_cpu_topology_cluster() - Determine a unique CPU cluster value + * @cpu: Kernel logical CPU number + * + * Determine a topology unique cluster ID for the given CPU/thread. + * This ID can then be used to group peers, which will have matching ids. + * + * The cluster, if present is the level of topology above CPUs. In a + * multi-thread CPU, it will be the level above the CPU, not the thread. + * It may not exist in single CPU systems. In simple multi-CPU systems, + * it may be equal to the package topology level. + * + * Return: -ENOENT if the PPTT doesn't exist, the CPU cannot be found + * or there is no toplogy level above the CPU.. + * Otherwise returns a value which represents the package for this CPU. + */ + +int find_acpi_cpu_topology_cluster(unsigned int cpu) +{ + struct acpi_table_header *table; + acpi_status status; + struct acpi_pptt_processor *cpu_node, *cluster_node; + u32 acpi_cpu_id; + int retval; + int is_thread; + + status = acpi_get_table(ACPI_SIG_PPTT, 0, &table); + if (ACPI_FAILURE(status)) { + acpi_pptt_warn_missing(); + return -ENOENT; + } + + acpi_cpu_id = get_acpi_id_for_cpu(cpu); + cpu_node = acpi_find_processor_node(table, acpi_cpu_id); + if (cpu_node == NULL || !cpu_node->parent) { + retval = -ENOENT; + goto put_table; + } + + is_thread = cpu_node->flags & ACPI_PPTT_ACPI_PROCESSOR_IS_THREAD; + cluster_node = fetch_pptt_node(table, cpu_node->parent); + if (cluster_node == NULL) { + retval = -ENOENT; + goto put_table; + } + if (is_thread) { + if (!cluster_node->parent) { + retval = -ENOENT; + goto put_table; + } + cluster_node = fetch_pptt_node(table, cluster_node->parent); + if (cluster_node == NULL) { + retval = -ENOENT; + goto put_table; + } + } + if (cluster_node->flags & ACPI_PPTT_ACPI_PROCESSOR_ID_VALID) + retval = cluster_node->acpi_processor_id; + else + retval = ACPI_PTR_DIFF(cluster_node, table); + +put_table: + acpi_put_table(table); + + return retval; +} + /** * find_acpi_cpu_topology_hetero_id() - Get a core architecture tag * @cpu: Kernel logical CPU number diff --git a/drivers/base/arch_topology.c b/drivers/base/arch_topology.c index 921312a8d957..5b1589adacaf 100644 --- a/drivers/base/arch_topology.c +++ b/drivers/base/arch_topology.c @@ -598,6 +598,11 @@ const struct cpumask *cpu_coregroup_mask(int cpu) return core_mask; } +const struct cpumask *cpu_clustergroup_mask(int cpu) +{ + return &cpu_topology[cpu].cluster_sibling; +} + void update_siblings_masks(unsigned int cpuid) { struct cpu_topology *cpu_topo, *cpuid_topo = &cpu_topology[cpuid]; @@ -615,6 +620,11 @@ void update_siblings_masks(unsigned int cpuid) if (cpuid_topo->package_id != cpu_topo->package_id) continue; + if (cpuid_topo->cluster_id == cpu_topo->cluster_id) { + cpumask_set_cpu(cpu, &cpuid_topo->cluster_sibling); + cpumask_set_cpu(cpuid, &cpu_topo->cluster_sibling); + } + cpumask_set_cpu(cpuid, &cpu_topo->core_sibling); cpumask_set_cpu(cpu, &cpuid_topo->core_sibling); @@ -633,6 +643,9 @@ static void clear_cpu_topology(int cpu) cpumask_clear(&cpu_topo->llc_sibling); cpumask_set_cpu(cpu, &cpu_topo->llc_sibling); + cpumask_clear(&cpu_topo->cluster_sibling); + cpumask_set_cpu(cpu, &cpu_topo->cluster_sibling); + cpumask_clear(&cpu_topo->core_sibling); cpumask_set_cpu(cpu, &cpu_topo->core_sibling); cpumask_clear(&cpu_topo->thread_sibling); @@ -648,6 +661,7 @@ void __init reset_cpu_topology(void) cpu_topo->thread_id = -1; cpu_topo->core_id = -1; + cpu_topo->cluster_id = -1; cpu_topo->package_id = -1; cpu_topo->llc_id = -1; diff --git a/drivers/base/topology.c b/drivers/base/topology.c index 43c0940643f5..8f2b641d0b8c 100644 --- a/drivers/base/topology.c +++ b/drivers/base/topology.c @@ -48,6 +48,9 @@ static DEVICE_ATTR_RO(physical_package_id); define_id_show_func(die_id); static DEVICE_ATTR_RO(die_id); +define_id_show_func(cluster_id); +static DEVICE_ATTR_RO(cluster_id); + define_id_show_func(core_id); static DEVICE_ATTR_RO(core_id); @@ -63,6 +66,10 @@ define_siblings_read_func(core_siblings, core_cpumask); static BIN_ATTR_RO(core_siblings, 0); static BIN_ATTR_RO(core_siblings_list, 0); +define_siblings_read_func(cluster_cpus, cluster_cpumask); +static BIN_ATTR_RO(cluster_cpus, 0); +static BIN_ATTR_RO(cluster_cpus_list, 0); + define_siblings_read_func(die_cpus, die_cpumask); static BIN_ATTR_RO(die_cpus, 0); static BIN_ATTR_RO(die_cpus_list, 0); @@ -94,6 +101,8 @@ static struct bin_attribute *bin_attrs[] = { &bin_attr_thread_siblings_list, &bin_attr_core_siblings, &bin_attr_core_siblings_list, + &bin_attr_cluster_cpus, + &bin_attr_cluster_cpus_list, &bin_attr_die_cpus, &bin_attr_die_cpus_list, &bin_attr_package_cpus, @@ -112,6 +121,7 @@ static struct bin_attribute *bin_attrs[] = { static struct attribute *default_attrs[] = { &dev_attr_physical_package_id.attr, &dev_attr_die_id.attr, + &dev_attr_cluster_id.attr, &dev_attr_core_id.attr, #ifdef CONFIG_SCHED_BOOK &dev_attr_book_id.attr, diff --git a/include/linux/acpi.h b/include/linux/acpi.h index 72e4f7fd268c..6d65427e5f67 100644 --- a/include/linux/acpi.h +++ b/include/linux/acpi.h @@ -1353,6 +1353,7 @@ static inline int lpit_read_residency_count_address(u64 *address) #ifdef CONFIG_ACPI_PPTT int acpi_pptt_cpu_is_thread(unsigned int cpu); int find_acpi_cpu_topology(unsigned int cpu, int level); +int find_acpi_cpu_topology_cluster(unsigned int cpu); int find_acpi_cpu_topology_package(unsigned int cpu); int find_acpi_cpu_topology_hetero_id(unsigned int cpu); int find_acpi_cpu_cache_topology(unsigned int cpu, int level); @@ -1365,6 +1366,10 @@ static inline int find_acpi_cpu_topology(unsigned int cpu, int level) { return -EINVAL; } +static inline int find_acpi_cpu_topology_cluster(unsigned int cpu) +{ + return -EINVAL; +} static inline int find_acpi_cpu_topology_package(unsigned int cpu) { return -EINVAL; diff --git a/include/linux/arch_topology.h b/include/linux/arch_topology.h index f180240dc95f..b97cea83b25e 100644 --- a/include/linux/arch_topology.h +++ b/include/linux/arch_topology.h @@ -62,10 +62,12 @@ void topology_set_thermal_pressure(const struct cpumask *cpus, struct cpu_topology { int thread_id; int core_id; + int cluster_id; int package_id; int llc_id; cpumask_t thread_sibling; cpumask_t core_sibling; + cpumask_t cluster_sibling; cpumask_t llc_sibling; }; @@ -73,13 +75,16 @@ struct cpu_topology { extern struct cpu_topology cpu_topology[NR_CPUS]; #define topology_physical_package_id(cpu) (cpu_topology[cpu].package_id) +#define topology_cluster_id(cpu) (cpu_topology[cpu].cluster_id) #define topology_core_id(cpu) (cpu_topology[cpu].core_id) #define topology_core_cpumask(cpu) (&cpu_topology[cpu].core_sibling) #define topology_sibling_cpumask(cpu) (&cpu_topology[cpu].thread_sibling) +#define topology_cluster_cpumask(cpu) (&cpu_topology[cpu].cluster_sibling) #define topology_llc_cpumask(cpu) (&cpu_topology[cpu].llc_sibling) void init_cpu_topology(void); void store_cpu_topology(unsigned int cpuid); const struct cpumask *cpu_coregroup_mask(int cpu); +const struct cpumask *cpu_clustergroup_mask(int cpu); void update_siblings_masks(unsigned int cpu); void remove_cpu_topology(unsigned int cpuid); void reset_cpu_topology(void); diff --git a/include/linux/topology.h b/include/linux/topology.h index 7634cd737061..80d27d717631 100644 --- a/include/linux/topology.h +++ b/include/linux/topology.h @@ -186,6 +186,9 @@ static inline int cpu_to_mem(int cpu) #ifndef topology_die_id #define topology_die_id(cpu) ((void)(cpu), -1) #endif +#ifndef topology_cluster_id +#define topology_cluster_id(cpu) ((void)(cpu), -1) +#endif #ifndef topology_core_id #define topology_core_id(cpu) ((void)(cpu), 0) #endif @@ -195,6 +198,9 @@ static inline int cpu_to_mem(int cpu) #ifndef topology_core_cpumask #define topology_core_cpumask(cpu) cpumask_of(cpu) #endif +#ifndef topology_cluster_cpumask +#define topology_cluster_cpumask(cpu) cpumask_of(cpu) +#endif #ifndef topology_die_cpumask #define topology_die_cpumask(cpu) cpumask_of(cpu) #endif From patchwork Fri Aug 20 01:30:07 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Barry Song <21cnbao@gmail.com> X-Patchwork-Id: 500443 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-15.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI, SPF_HELO_NONE, SPF_PASS, URIBL_BLOCKED, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id F31B5C4338F for ; Fri, 20 Aug 2021 01:31:22 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id D39596112D for ; Fri, 20 Aug 2021 01:31:22 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234709AbhHTBb7 (ORCPT ); Thu, 19 Aug 2021 21:31:59 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:55166 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234654AbhHTBb6 (ORCPT ); Thu, 19 Aug 2021 21:31:58 -0400 Received: from mail-pl1-x631.google.com (mail-pl1-x631.google.com [IPv6:2607:f8b0:4864:20::631]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 50CF5C061575; Thu, 19 Aug 2021 18:31:21 -0700 (PDT) Received: by mail-pl1-x631.google.com with SMTP id c4so5030568plh.7; Thu, 19 Aug 2021 18:31:21 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=Y/YLCtZuePBllw/Zn57jUmVpm/kBwVZsI4ug8E6fHEc=; b=ZPxHakUkPM8rDynwTYFDsZMAjJLElGaqdGN2YpBWnB/JSrP3LyfYG7BGWmXFW+D2// ZaI+oKs3jA2JP2coRwBePOhzGgbZFmorZeHIHwzaqOvlRxKxP6JMcFf/HwuuXAHJoN99 vI87vYw/DEz/QW6yo+Sn7G0FrfyBdYSB3GabtLoQA875ve/LEx6BqZ6Tec4u33kXzPOr 3+B19gKdVduTCuMFFshgU2pr+l7SZVlLww0g/RpmJJEsxBHHYainB1xFPcxYE0rnyJDr D5d/MZIxmKeX65OoIEbkBQqiCpwTUuQDtNCBLmCnh3yPkij1/jzTi5hatp34ouR/hERJ K2Bw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=Y/YLCtZuePBllw/Zn57jUmVpm/kBwVZsI4ug8E6fHEc=; b=YqHhcQx+d+DE5BNWUAuhC6qbqQfVn7+JWdI50r7UspUdgNmQYsIKVgCw+OtbWlQ/Fk VbyVksSrB169Hsa5Oru6wJiEaZJTYpqzG2+EhE/teAd03qYkJthDdc9cysjvSCsLZFqy Z51b6kIsGavEsusHIMsWYLtvsrCCfX3Wwm9AifzxCajkhoat9UZxLATVTnErBbpzVWju jX26KOfA3jRY+ib2TtHjxTIKZ6sqTJ/zQki08hY0Dgnf7eXItgEw4ewjeQKYMOh1BFMl hi4+T0UGjFjLMCZg1+ES1fbysApmT0qxxCSkD8dTr1wu7qZRDmTsXt90aiKCsXhUl2mB B0LA== X-Gm-Message-State: AOAM532iPnH0CiTZ0vQKQaJecC8DJ+XIzZ4gg+OONj9cSoCazoLAk0RV uHuPCBRykBbS+lFptiIgyIc= X-Google-Smtp-Source: ABdhPJwMz0oSiOq8GXMZCOfvuYBNDZrc8o1nOqberFkN1RdmqXUnYmuWRAiD8qgprbkRRhozTR+iQA== X-Received: by 2002:a17:90a:b303:: with SMTP id d3mr1834235pjr.199.1629423080845; Thu, 19 Aug 2021 18:31:20 -0700 (PDT) Received: from localhost.localdomain ([2407:7000:8916:5000:d23:7118:c805:b5a5]) by smtp.gmail.com with ESMTPSA id 66sm4877950pfu.67.2021.08.19.18.31.08 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 19 Aug 2021 18:31:20 -0700 (PDT) From: Barry Song <21cnbao@gmail.com> To: bp@alien8.de, catalin.marinas@arm.com, dietmar.eggemann@arm.com, gregkh@linuxfoundation.org, hpa@zytor.com, juri.lelli@redhat.com, bristot@redhat.com, lenb@kernel.org, mgorman@suse.de, mingo@redhat.com, peterz@infradead.org, rjw@rjwysocki.net, sudeep.holla@arm.com, tglx@linutronix.de Cc: aubrey.li@linux.intel.com, bsegall@google.com, guodong.xu@linaro.org, jonathan.cameron@huawei.com, liguozhu@hisilicon.com, linux-acpi@vger.kernel.org, linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org, mark.rutland@arm.com, msys.mizuma@gmail.com, prime.zeng@hisilicon.com, rostedt@goodmis.org, tim.c.chen@linux.intel.com, valentin.schneider@arm.com, vincent.guittot@linaro.org, will@kernel.org, x86@kernel.org, xuwei5@huawei.com, yangyicong@huawei.com, linuxarm@huawei.com, Barry Song , Yicong Yang Subject: [PATCH 2/3] scheduler: Add cluster scheduler level in core and related Kconfig for ARM64 Date: Fri, 20 Aug 2021 13:30:07 +1200 Message-Id: <20210820013008.12881-3-21cnbao@gmail.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20210820013008.12881-1-21cnbao@gmail.com> References: <20210820013008.12881-1-21cnbao@gmail.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-acpi@vger.kernel.org From: Barry Song This patch adds scheduler level for clusters and automatically enables the load balance among clusters. It will directly benefit a lot of workload which loves more resources such as memory bandwidth, caches. Testing has widely been done in two different hardware configurations of Kunpeng920: 24 cores in one NUMA(6 clusters in each NUMA node); 32 cores in one NUMA(8 clusters in each NUMA node) Workload is running on either one NUMA node or four NUMA nodes, thus, this can estimate the effect of cluster spreading w/ and w/o NUMA load balance. * Stream benchmark: 4threads stream (on 1NUMA * 24cores = 24cores) stream stream w/o patch w/ patch MB/sec copy 29929.64 ( 0.00%) 32932.68 ( 10.03%) MB/sec scale 29861.10 ( 0.00%) 32710.58 ( 9.54%) MB/sec add 27034.42 ( 0.00%) 32400.68 ( 19.85%) MB/sec triad 27225.26 ( 0.00%) 31965.36 ( 17.41%) 6threads stream (on 1NUMA * 24cores = 24cores) stream stream w/o patch w/ patch MB/sec copy 40330.24 ( 0.00%) 42377.68 ( 5.08%) MB/sec scale 40196.42 ( 0.00%) 42197.90 ( 4.98%) MB/sec add 37427.00 ( 0.00%) 41960.78 ( 12.11%) MB/sec triad 37841.36 ( 0.00%) 42513.64 ( 12.35%) 12threads stream (on 1NUMA * 24cores = 24cores) stream stream w/o patch w/ patch MB/sec copy 52639.82 ( 0.00%) 53818.04 ( 2.24%) MB/sec scale 52350.30 ( 0.00%) 53253.38 ( 1.73%) MB/sec add 53607.68 ( 0.00%) 55198.82 ( 2.97%) MB/sec triad 54776.66 ( 0.00%) 56360.40 ( 2.89%) Thus, it could help memory-bound workload especially under medium load. Similar improvement is also seen in lkp-pbzip2: * lkp-pbzip2 benchmark 2-96 threads (on 4NUMA * 24cores = 96cores) lkp-pbzip2 lkp-pbzip2 w/o patch w/ patch Hmean tput-2 11062841.57 ( 0.00%) 11341817.51 * 2.52%* Hmean tput-5 26815503.70 ( 0.00%) 27412872.65 * 2.23%* Hmean tput-8 41873782.21 ( 0.00%) 43326212.92 * 3.47%* Hmean tput-12 61875980.48 ( 0.00%) 64578337.51 * 4.37%* Hmean tput-21 105814963.07 ( 0.00%) 111381851.01 * 5.26%* Hmean tput-30 150349470.98 ( 0.00%) 156507070.73 * 4.10%* Hmean tput-48 237195937.69 ( 0.00%) 242353597.17 * 2.17%* Hmean tput-79 360252509.37 ( 0.00%) 362635169.23 * 0.66%* Hmean tput-96 394571737.90 ( 0.00%) 400952978.48 * 1.62%* 2-24 threads (on 1NUMA * 24cores = 24cores) lkp-pbzip2 lkp-pbzip2 w/o patch w/ patch Hmean tput-2 11071705.49 ( 0.00%) 11296869.10 * 2.03%* Hmean tput-4 20782165.19 ( 0.00%) 21949232.15 * 5.62%* Hmean tput-6 30489565.14 ( 0.00%) 33023026.96 * 8.31%* Hmean tput-8 40376495.80 ( 0.00%) 42779286.27 * 5.95%* Hmean tput-12 61264033.85 ( 0.00%) 62995632.78 * 2.83%* Hmean tput-18 86697139.39 ( 0.00%) 86461545.74 ( -0.27%) Hmean tput-24 104854637.04 ( 0.00%) 104522649.46 * -0.32%* In the case of 6 threads and 8 threads, we see the greatest performance improvement. Similar improvement can be seen on lkp-pixz though the improvement is smaller: * lkp-pixz benchmark 2-24 threads lkp-pixz (on 1NUMA * 24cores = 24cores) lkp-pixz lkp-pixz w/o patch w/ patch Hmean tput-2 6486981.16 ( 0.00%) 6561515.98 * 1.15%* Hmean tput-4 11645766.38 ( 0.00%) 11614628.43 ( -0.27%) Hmean tput-6 15429943.96 ( 0.00%) 15957350.76 * 3.42%* Hmean tput-8 19974087.63 ( 0.00%) 20413746.98 * 2.20%* Hmean tput-12 28172068.18 ( 0.00%) 28751997.06 * 2.06%* Hmean tput-18 39413409.54 ( 0.00%) 39896830.55 * 1.23%* Hmean tput-24 49101815.85 ( 0.00%) 49418141.47 * 0.64%* * SPECrate benchmark 4,8,16 copies mcf_r(on 1NUMA * 32cores = 32cores) Base Base Run Time Rate ------- --------- 4 Copies w/o 580 (w/ 570) w/o 11.1 (w/ 11.3) 8 Copies w/o 647 (w/ 605) w/o 20.0 (w/ 21.4, +7%) 16 Copies w/o 844 (w/ 844) w/o 30.6 (w/ 30.6) 32 Copies(on 4NUMA * 32 cores = 128cores) [w/o patch] Base Base Base Benchmarks Copies Run Time Rate --------------- ------- --------- --------- 500.perlbench_r 32 584 87.2 * 502.gcc_r 32 503 90.2 * 505.mcf_r 32 745 69.4 * 520.omnetpp_r 32 1031 40.7 * 523.xalancbmk_r 32 597 56.6 * 525.x264_r 1 -- CE 531.deepsjeng_r 32 336 109 * 541.leela_r 32 556 95.4 * 548.exchange2_r 32 513 163 * 557.xz_r 32 530 65.2 * Est. SPECrate2017_int_base 80.3 [w/ patch] Base Base Base Benchmarks Copies Run Time Rate --------------- ------- --------- --------- 500.perlbench_r 32 580 87.8 (+0.688%) * 502.gcc_r 32 477 95.1 (+5.432%) * 505.mcf_r 32 644 80.3 (+13.574%) * 520.omnetpp_r 32 942 44.6 (+9.58%) * 523.xalancbmk_r 32 560 60.4 (+6.714%%) * 525.x264_r 1 -- CE 531.deepsjeng_r 32 337 109 (+0.000%) * 541.leela_r 32 554 95.6 (+0.210%) * 548.exchange2_r 32 515 163 (+0.000%) * 557.xz_r 32 524 66.0 (+1.227%) * Est. SPECrate2017_int_base 83.7 (+4.062%) On the other hand, it is slightly helpful to CPU-bound tasks like kernbench: * 24-96 threads kernbench (on 4NUMA * 24cores = 96cores) kernbench kernbench w/o cluster w/ cluster Min user-24 12054.67 ( 0.00%) 12024.19 ( 0.25%) Min syst-24 1751.51 ( 0.00%) 1731.68 ( 1.13%) Min elsp-24 600.46 ( 0.00%) 598.64 ( 0.30%) Min user-48 12361.93 ( 0.00%) 12315.32 ( 0.38%) Min syst-48 1917.66 ( 0.00%) 1892.73 ( 1.30%) Min elsp-48 333.96 ( 0.00%) 332.57 ( 0.42%) Min user-96 12922.40 ( 0.00%) 12921.17 ( 0.01%) Min syst-96 2143.94 ( 0.00%) 2110.39 ( 1.56%) Min elsp-96 211.22 ( 0.00%) 210.47 ( 0.36%) Amean user-24 12063.99 ( 0.00%) 12030.78 * 0.28%* Amean syst-24 1755.20 ( 0.00%) 1735.53 * 1.12%* Amean elsp-24 601.60 ( 0.00%) 600.19 ( 0.23%) Amean user-48 12362.62 ( 0.00%) 12315.56 * 0.38%* Amean syst-48 1921.59 ( 0.00%) 1894.95 * 1.39%* Amean elsp-48 334.10 ( 0.00%) 332.82 * 0.38%* Amean user-96 12925.27 ( 0.00%) 12922.63 ( 0.02%) Amean syst-96 2146.66 ( 0.00%) 2122.20 * 1.14%* Amean elsp-96 211.96 ( 0.00%) 211.79 ( 0.08%) Note this patch isn't an universal win, it might hurt those workload which can benefit from packing. Though tasks which want to take advantages of lower communication latency of one cluster won't necessarily been packed in one cluster while kernel is not aware of clusters, they have some chance to been randomly packed. But this patch will make them spread anyway. Signed-off-by: Yicong Yang Signed-off-by: Barry Song --- arch/arm64/Kconfig | 7 +++++++ include/linux/sched/topology.h | 7 +++++++ include/linux/topology.h | 7 +++++++ kernel/sched/topology.c | 5 +++++ 4 files changed, 26 insertions(+) diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig index fdcd54d39c1e..7a3cc2314a03 100644 --- a/arch/arm64/Kconfig +++ b/arch/arm64/Kconfig @@ -992,6 +992,13 @@ config SCHED_MC making when dealing with multi-core CPU chips at a cost of slightly increased overhead in some places. If unsure say N here. +config SCHED_CLUSTER + bool "Cluster scheduler support" + help + Cluster scheduler support improves the CPU scheduler's decision + making when dealing with machines that have clusters(sharing internal + bus or sharing LLC cache tag). If unsure say N here. + config SCHED_SMT bool "SMT scheduler support" help diff --git a/include/linux/sched/topology.h b/include/linux/sched/topology.h index 8f0f778b7c91..2f9166f6dec8 100644 --- a/include/linux/sched/topology.h +++ b/include/linux/sched/topology.h @@ -42,6 +42,13 @@ static inline int cpu_smt_flags(void) } #endif +#ifdef CONFIG_SCHED_CLUSTER +static inline int cpu_cluster_flags(void) +{ + return SD_SHARE_PKG_RESOURCES; +} +#endif + #ifdef CONFIG_SCHED_MC static inline int cpu_core_flags(void) { diff --git a/include/linux/topology.h b/include/linux/topology.h index 80d27d717631..0b3704ad13c8 100644 --- a/include/linux/topology.h +++ b/include/linux/topology.h @@ -212,6 +212,13 @@ static inline const struct cpumask *cpu_smt_mask(int cpu) } #endif +#if defined(CONFIG_SCHED_CLUSTER) && !defined(cpu_cluster_mask) +static inline const struct cpumask *cpu_cluster_mask(int cpu) +{ + return topology_cluster_cpumask(cpu); +} +#endif + static inline const struct cpumask *cpu_cpu_mask(int cpu) { return cpumask_of_node(cpu_to_node(cpu)); diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c index b77ad49dc14f..546cfb1c728e 100644 --- a/kernel/sched/topology.c +++ b/kernel/sched/topology.c @@ -1625,6 +1625,11 @@ static struct sched_domain_topology_level default_topology[] = { #ifdef CONFIG_SCHED_SMT { cpu_smt_mask, cpu_smt_flags, SD_INIT_NAME(SMT) }, #endif + +#ifdef CONFIG_SCHED_CLUSTER + { cpu_clustergroup_mask, cpu_cluster_flags, SD_INIT_NAME(CLS) }, +#endif + #ifdef CONFIG_SCHED_MC { cpu_coregroup_mask, cpu_core_flags, SD_INIT_NAME(MC) }, #endif