From patchwork Thu Feb 8 13:08:37 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Adhemerval Zanella Netto
X-Patchwork-Id: 770848
From: Adhemerval Zanella
To: libc-alpha@sourceware.org
Cc: "H . J .
Lu", Noah Goldstein, Sajan Karumanchi, bmerry@sarao.ac.za, pmallapp@amd.com
Subject: [PATCH v3 0/3] x86: Improve ERMS usage on Zen3+
Date: Thu, 8 Feb 2024 10:08:37 -0300
Message-Id: <20240208130840.533348-1-adhemerval.zanella@linaro.org>
X-Mailer: git-send-email 2.34.1

For the sizes where REP MOVSB and REP STOSB are used on Zen3+ cores, the
resulting performance is lower than with vectorized instructions (with
some input alignments showing a very large performance gap, as indicated
by BZ#30995).

glibc enables ERMS on AMD cores for sizes between 2113
(rep_movsb_threshold) and the L2 cache size (rep_movsb_stop_threshold,
or 524288 on a Zen3 core). Using the benchmarks provided in BZ#30995,
memcpy on a Ryzen 9 5900X shows:

  Size (bytes)   Destination Alignment   Throughput (GB/s)
          2113                       0             84.2448
          2113                      15              4.4310
        524287                       0             57.1122
        524287                      15              4.34671

While by using vectorized instructions with the tunable
GLIBC_TUNABLES=glibc.cpu.x86_rep_movsb_threshold=1000000 it shows:

  Size (bytes)   Destination Alignment   Throughput (GB/s)
          2113                       0            124.1830
          2113                      15            121.8720
        524287                       0             58.3212
        524287                      15             58.5352

Increasing the number of concurrent jobs does show improvements in ERMS
over vectorized instructions as well. The performance difference with
ERMS improves if input alignments are equal, although it does not reach
parity with the vectorized path.
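
As a rough illustration of the size window described above, the selection
can be sketched in C as below. This is not the glibc code: the function
name and macro names are hypothetical, the two constants are simply the
Zen3 figures quoted in this message, and the exact inclusivity of the
bounds is an assumption.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names, for illustration only; not glibc symbols.
   The values are the Zen3 figures quoted in this message.  */
#define REP_MOVSB_THRESHOLD      ((size_t) 2113)    /* rep_movsb_threshold */
#define REP_MOVSB_STOP_THRESHOLD ((size_t) 524288)  /* L2 size on Zen3 */

/* Return true when a copy of SIZE bytes falls inside the window where
   the ERMS (REP MOVSB) path would be chosen over the vectorized path.
   Assumption: lower bound inclusive, upper bound exclusive.  */
static inline bool
uses_rep_movsb (size_t size, size_t threshold, size_t stop_threshold)
{
  return size >= threshold && size < stop_threshold;
}
```

With the default values, both benchmarked sizes (2113 and 524287) fall
inside the window. With the threshold raised to 1000000 by the tunable,
the window is empty in this sketch, so every size takes the vectorized
path.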

memset also shows a similar performance improvement with vectorized
instructions instead of REP STOSB. On the same machine, the default
strategy shows:

  Size (bytes)   Destination Alignment   Throughput (GB/s)
          2113                       0             68.0113
          2113                      15             56.1880
        524287                       0            119.3670
        524287                      15            116.2590

While with GLIBC_TUNABLES=glibc.cpu.x86_rep_stosb_threshold=1000000:

  Size (bytes)   Destination Alignment   Throughput (GB/s)
          2113                       0            133.2310
          2113                      15            132.5800
        524287                       0            112.0650
        524287                      15            118.0960

I also saw a slight performance increase on 502.gcc_r (1 copy), where
the result went from 9.82 to 9.85. The benchmark exercises both memcpy
and memset heavily.

Changes from v2:
- Removed rep_movsb_stop_threshold tunable.
- Simplify the memset change.

Changes from v1:
- Reword comment and commit message.

Adhemerval Zanella (3):
  x86: Fix Zen3/Zen4 ERMS selection (BZ 30994)
  x86: Do not prefer ERMS for memset on Zen3+
  x86: Expand the comment on when REP STOSB is used on memset

 sysdeps/x86/dl-cacheinfo.h                    | 43 ++++++++++---------
 .../multiarch/memset-vec-unaligned-erms.S     |  4 +-
 2 files changed, 26 insertions(+), 21 deletions(-)