From patchwork Thu Feb 8 13:08:38 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Adhemerval Zanella Netto
X-Patchwork-Id: 770849
From: Adhemerval Zanella
To: libc-alpha@sourceware.org
Cc: "H . J .
 Lu", Noah Goldstein, Sajan Karumanchi,
 bmerry@sarao.ac.za, pmallapp@amd.com
Subject: [PATCH v3 1/3] x86: Fix Zen3/Zen4 ERMS selection (BZ 30994)
Date: Thu, 8 Feb 2024 10:08:38 -0300
Message-Id: <20240208130840.533348-2-adhemerval.zanella@linaro.org>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20240208130840.533348-1-adhemerval.zanella@linaro.org>
References: <20240208130840.533348-1-adhemerval.zanella@linaro.org>
MIME-Version: 1.0
X-BeenThere: libc-alpha@sourceware.org
X-Mailman-Version: 2.1.30
Precedence: list
List-Id: Libc-alpha mailing list
Errors-To: libc-alpha-bounces+patch=linaro.org@sourceware.org

The REP MOVSB usage on memcpy/memmove does not show much performance
improvement on Zen3/Zen4 cores compared to the vectorized loops.  Also,
as reported in BZ 30994, if the source is aligned and the destination
is not, the performance can be 20x slower.

The performance difference is noticeable with small buffer sizes, close
to the lower bound at which memcpy/memmove starts to use ERMS.  The
performance of REP MOVSB is similar to the vectorized instructions at
the upper size limit (the L2 cache size).  Also, there is no drawback
to multiple cores sharing the cache.

Checked on x86_64-linux-gnu on Zen3.
Reviewed-by: H.J. Lu
---
 sysdeps/x86/dl-cacheinfo.h | 38 ++++++++++++++++++--------------------
 1 file changed, 18 insertions(+), 20 deletions(-)

diff --git a/sysdeps/x86/dl-cacheinfo.h b/sysdeps/x86/dl-cacheinfo.h
index d5101615e3..f34d12846c 100644
--- a/sysdeps/x86/dl-cacheinfo.h
+++ b/sysdeps/x86/dl-cacheinfo.h
@@ -791,7 +791,6 @@ dl_init_cacheinfo (struct cpu_features *cpu_features)
   long int data = -1;
   long int shared = -1;
   long int shared_per_thread = -1;
-  long int core = -1;
   unsigned int threads = 0;
   unsigned long int level1_icache_size = -1;
   unsigned long int level1_icache_linesize = -1;
@@ -809,7 +808,6 @@ dl_init_cacheinfo (struct cpu_features *cpu_features)
   if (cpu_features->basic.kind == arch_kind_intel)
     {
       data = handle_intel (_SC_LEVEL1_DCACHE_SIZE, cpu_features);
-      core = handle_intel (_SC_LEVEL2_CACHE_SIZE, cpu_features);
       shared = handle_intel (_SC_LEVEL3_CACHE_SIZE, cpu_features);
       shared_per_thread = shared;
 
@@ -822,7 +820,8 @@ dl_init_cacheinfo (struct cpu_features *cpu_features)
         = handle_intel (_SC_LEVEL1_DCACHE_ASSOC, cpu_features);
       level1_dcache_linesize
         = handle_intel (_SC_LEVEL1_DCACHE_LINESIZE, cpu_features);
-      level2_cache_size = core;
+      level2_cache_size
+        = handle_intel (_SC_LEVEL2_CACHE_SIZE, cpu_features);
       level2_cache_assoc
         = handle_intel (_SC_LEVEL2_CACHE_ASSOC, cpu_features);
       level2_cache_linesize
@@ -835,12 +834,12 @@ dl_init_cacheinfo (struct cpu_features *cpu_features)
       level4_cache_size
         = handle_intel (_SC_LEVEL4_CACHE_SIZE, cpu_features);
 
-      get_common_cache_info (&shared, &shared_per_thread, &threads, core);
+      get_common_cache_info (&shared, &shared_per_thread, &threads,
+                             level2_cache_size);
     }
   else if (cpu_features->basic.kind == arch_kind_zhaoxin)
     {
       data = handle_zhaoxin (_SC_LEVEL1_DCACHE_SIZE);
-      core = handle_zhaoxin (_SC_LEVEL2_CACHE_SIZE);
       shared = handle_zhaoxin (_SC_LEVEL3_CACHE_SIZE);
       shared_per_thread = shared;
 
@@ -849,19 +848,19 @@ dl_init_cacheinfo (struct cpu_features *cpu_features)
       level1_dcache_size = data;
       level1_dcache_assoc = handle_zhaoxin (_SC_LEVEL1_DCACHE_ASSOC);
       level1_dcache_linesize = handle_zhaoxin (_SC_LEVEL1_DCACHE_LINESIZE);
-      level2_cache_size = core;
+      level2_cache_size = handle_zhaoxin (_SC_LEVEL2_CACHE_SIZE);
       level2_cache_assoc = handle_zhaoxin (_SC_LEVEL2_CACHE_ASSOC);
       level2_cache_linesize = handle_zhaoxin (_SC_LEVEL2_CACHE_LINESIZE);
       level3_cache_size = shared;
       level3_cache_assoc = handle_zhaoxin (_SC_LEVEL3_CACHE_ASSOC);
       level3_cache_linesize = handle_zhaoxin (_SC_LEVEL3_CACHE_LINESIZE);
 
-      get_common_cache_info (&shared, &shared_per_thread, &threads, core);
+      get_common_cache_info (&shared, &shared_per_thread, &threads,
+                             level2_cache_size);
     }
   else if (cpu_features->basic.kind == arch_kind_amd)
     {
       data = handle_amd (_SC_LEVEL1_DCACHE_SIZE);
-      core = handle_amd (_SC_LEVEL2_CACHE_SIZE);
       shared = handle_amd (_SC_LEVEL3_CACHE_SIZE);
 
       level1_icache_size = handle_amd (_SC_LEVEL1_ICACHE_SIZE);
@@ -869,7 +868,7 @@ dl_init_cacheinfo (struct cpu_features *cpu_features)
       level1_dcache_size = data;
       level1_dcache_assoc = handle_amd (_SC_LEVEL1_DCACHE_ASSOC);
       level1_dcache_linesize = handle_amd (_SC_LEVEL1_DCACHE_LINESIZE);
-      level2_cache_size = core;
+      level2_cache_size = handle_amd (_SC_LEVEL2_CACHE_SIZE);
       level2_cache_assoc = handle_amd (_SC_LEVEL2_CACHE_ASSOC);
       level2_cache_linesize = handle_amd (_SC_LEVEL2_CACHE_LINESIZE);
       level3_cache_size = shared;
@@ -880,12 +879,12 @@ dl_init_cacheinfo (struct cpu_features *cpu_features)
       if (shared <= 0)
         {
           /* No shared L3 cache.  All we have is the L2 cache.  */
-          shared = core;
+          shared = level2_cache_size;
         }
       else if (cpu_features->basic.family < 0x17)
         {
           /* Account for exclusive L2 and L3 caches.  */
-          shared += core;
+          shared += level2_cache_size;
         }
 
       shared_per_thread = shared;
@@ -987,6 +986,12 @@ dl_init_cacheinfo (struct cpu_features *cpu_features)
   if (CPU_FEATURE_USABLE_P (cpu_features, FSRM))
     rep_movsb_threshold = 2112;
 
+  /* For AMD CPUs that support ERMS (Zen3+), REP MOVSB is in a lot of
+     cases slower than the vectorized path (and for some alignments,
+     it is really slow, check BZ #30994).  */
+  if (cpu_features->basic.kind == arch_kind_amd)
+    rep_movsb_threshold = non_temporal_threshold;
+
   /* The default threshold to use Enhanced REP STOSB.  */
   unsigned long int rep_stosb_threshold = 2048;
 
@@ -1028,16 +1033,9 @@ dl_init_cacheinfo (struct cpu_features *cpu_features)
                            SIZE_MAX);
 
   unsigned long int rep_movsb_stop_threshold;
-  /* ERMS feature is implemented from AMD Zen3 architecture and it is
-     performing poorly for data above L2 cache size.  Henceforth, adding
-     an upper bound threshold parameter to limit the usage of Enhanced
-     REP MOVSB operations and setting its value to L2 cache size.  */
-  if (cpu_features->basic.kind == arch_kind_amd)
-    rep_movsb_stop_threshold = core;
   /* Setting the upper bound of ERMS to the computed value of
-     non-temporal threshold for architectures other than AMD.  */
-  else
-    rep_movsb_stop_threshold = non_temporal_threshold;
+     non-temporal threshold for all architectures.  */
+  rep_movsb_stop_threshold = non_temporal_threshold;
 
   cpu_features->data_cache_size = data;
   cpu_features->shared_cache_size = shared;
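
Note for readers: the net effect of the two threshold hunks can be sketched
as a standalone C fragment.  This is only an illustration under stated
assumptions, not the glibc code.  The enum, the struct, and the two function
names are hypothetical, the default lower threshold is passed in as a
parameter because its computation is not part of this patch, and the size
test in uses_rep_movsb is a simplified assumption about how the memmove
implementation consumes the two thresholds (size at or above the lower
bound and below the stop bound).

```c
#include <stdbool.h>

/* Hypothetical stand-in for cpu_features->basic.kind.  */
enum cpu_kind { KIND_INTEL, KIND_ZHAOXIN, KIND_AMD, KIND_OTHER };

/* Hypothetical pair mirroring rep_movsb_threshold and
   rep_movsb_stop_threshold from the patch.  */
struct movsb_thresholds
{
  unsigned long int start;
  unsigned long int stop;
};

/* Mirror the post-patch selection order: start from the default, apply
   the FSRM override, then let AMD raise the lower bound to the
   non-temporal threshold.  The stop bound is the non-temporal threshold
   for every vendor.  */
static struct movsb_thresholds
select_movsb_thresholds (enum cpu_kind kind, bool has_fsrm,
                         unsigned long int default_start,
                         unsigned long int non_temporal_threshold)
{
  struct movsb_thresholds t;

  t.start = default_start;
  if (has_fsrm)
    t.start = 2112;
  if (kind == KIND_AMD)
    t.start = non_temporal_threshold;
  t.stop = non_temporal_threshold;
  return t;
}

/* Simplified assumption: REP MOVSB is chosen only for sizes in
   [start, stop).  The real selection has further conditions.  */
static bool
uses_rep_movsb (struct movsb_thresholds t, unsigned long int size)
{
  return size >= t.start && size < t.stop;
}
```

Under that assumption, on AMD the lower and upper bounds coincide, so the
range of sizes that would pick REP MOVSB is empty, while other vendors keep
the range between the default (or FSRM) lower bound and the non-temporal
threshold.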