From patchwork Thu Sep 14 18:36:59 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Adhemerval Zanella X-Patchwork-Id: 722583 Delivered-To: patch@linaro.org Received: by 2002:adf:f0d1:0:b0:31d:da82:a3b4 with SMTP id x17csp551687wro; Thu, 14 Sep 2023 11:37:23 -0700 (PDT) X-Google-Smtp-Source: AGHT+IEhfLjZkZEBvAJ/rIsJ6DMz19vdMAKzyWeEUGdnDO0YeZeP/XXbfKunCsqhcYUw6q4XlMb1 X-Received: by 2002:a05:6512:368c:b0:4fe:21f2:a04a with SMTP id d12-20020a056512368c00b004fe21f2a04amr5102555lfs.8.1694716642892; Thu, 14 Sep 2023 11:37:22 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1694716642; cv=none; d=google.com; s=arc-20160816; b=tUr0catOqCaDNG0SnLoA4N5rWtoUh/2IDyR0/bJmzcSxXQNHI1j9cy6N8vsM5c+7HL UolhQETdqodnoAHlqAuav7YCoa6BMEWtjp0ibm3Kw2d1X5+yAycqsVrQ6HHpOZgY6pBr AWH+zix0eJGMM26q0sHR9NYx29Wc6BzVxolEHR592Us6YmcTwRUb+kICq0TwP1DYW8wd Q9b2bTLfMHgknamVpyCc/kBIifkoPwaBT7xcxXNIald6wt0PmvlR1VLhXt1Evk7Q2aiz h+ZH0VRmT5G2B/rS6+SNC0P8q3d/aZQItBMhrbh9Dq/9k7E3r9MjF8hziq/biAHLL5ny HvSg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=errors-to:list-subscribe:list-help:list-post:list-archive :list-unsubscribe:list-id:precedence:content-transfer-encoding :mime-version:references:in-reply-to:message-id:date:subject:to:from :dkim-signature:dmarc-filter:delivered-to; bh=I/mdUPzISC+XmuJBFaDw8dL6gLvuN2O/2+RJn1xOxGA=; fh=ubzLPtaquqORyAJ/TX35zypB35/iXKzZJOEWlgP8mu4=; b=bLPuyMrOVpAyaSw2fee6nSTgHWy2ysOeLNpCdOeFNXjHabfkCWaDjIum4Va+m0elmW v61BRH+8/t28c9cnu/vVEzkfUoqS23xcsfUsMZE6heOaCgxcufijtNUZrkQepHN5gu03 UwpruadJzSe+zyDSOTEEtxQHLz8XH9Pgv2inAYGhYVfTf08MNNCSXkrruVmkjhGylllj Dj9/nWJmb9VHTseji9s2WelxObGcUNIE0NFjQUfiCvGg0EXkUkQxx00QySRH3Ml6BfEw ZJgwkDx6AFYEL1Sh7iLMTnfgslJcPfHRZ2yPJtQ/wSdgwdD+hRE9Et9B0wkZdJTGT8Ph swpw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=WLbuDEUw; spf=pass (google.com: domain of libc-alpha-bounces+patch=linaro.org@sourceware.org designates 8.43.85.97 as permitted sender) smtp.mailfrom="libc-alpha-bounces+patch=linaro.org@sourceware.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Return-Path: Received: from server2.sourceware.org (ip-8-43-85-97.sourceware.org. [8.43.85.97]) by mx.google.com with ESMTPS id f11-20020a056402068b00b005255337a1c8si1726898edy.625.2023.09.14.11.37.22 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 14 Sep 2023 11:37:22 -0700 (PDT) Received-SPF: pass (google.com: domain of libc-alpha-bounces+patch=linaro.org@sourceware.org designates 8.43.85.97 as permitted sender) client-ip=8.43.85.97; Authentication-Results: mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=WLbuDEUw; spf=pass (google.com: domain of libc-alpha-bounces+patch=linaro.org@sourceware.org designates 8.43.85.97 as permitted sender) smtp.mailfrom="libc-alpha-bounces+patch=linaro.org@sourceware.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id E0253385CC97 for ; Thu, 14 Sep 2023 18:37:21 +0000 (GMT) X-Original-To: libc-alpha@sourceware.org Delivered-To: libc-alpha@sourceware.org Received: from mail-oa1-x30.google.com (mail-oa1-x30.google.com [IPv6:2001:4860:4864:20::30]) by sourceware.org (Postfix) with ESMTPS id 4AD1C385771E for ; Thu, 14 Sep 2023 18:37:15 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 4AD1C385771E Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=linaro.org Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=linaro.org Received: by mail-oa1-x30.google.com with SMTP id 586e51a60fabf-1d640e49e0aso358817fac.2 for ; Thu, 14 Sep 2023 11:37:15 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; t=1694716634; x=1695321434; darn=sourceware.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:to:from:from:to:cc:subject:date:message-id :reply-to; bh=I/mdUPzISC+XmuJBFaDw8dL6gLvuN2O/2+RJn1xOxGA=; b=WLbuDEUwwsUgT03m8URvmSvVZJVmVc4nBcume2nCsXLU4GfCGyI8jzCSiScacHrRUH AKFY9RelqRL+C27wkuzwKUTXTODgQ4JAhspsKK177hsLbtXL+j3oh6swouA2zjJJNzGL iAFvxHXiXRYFxG104fR55TGXtilMMGgnm2xx5BSuf+C7TrhYiBczskq+49n3QZOAEa8G A1rknCvIZRxKfQ1okUe5s1sWfkpf2IxMHaq+qns7nCICOOizRVAefoP2wR3dE1tX5QvK o/mKuTaNbRs5Zn0oxTFtCE3THNnXbg+3lNxEqa4qularCZR92wZa/vG7PhaOgEydw1A4 rnYA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1694716634; x=1695321434; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=I/mdUPzISC+XmuJBFaDw8dL6gLvuN2O/2+RJn1xOxGA=; b=XekYqAETkeioXixBm49SHg9p/kDVY7PF1D8k8HGvlzAF8O1bKcTsOoh6THu2tAhqAD IMUi/f87o9Ebrqe+VW9NTmGw8rH2HRFkCegR5tZq9iWcUZvHy7WsWeo3Tnvs7wKuzuMp /zQ0DSy/eT9QZoCnKFTYif6w6+qK3aRxsvnOodbxW/LKiOUFHsUM5jD1eWW8mZpsuGgs tpfrAAMS+CCqVTSRO5v57yVI0n67dFQcgimabN2zZMPfDKqeqkAiC7irEL7tFkhyGXhx qflkOVR90NuVP85ndgJbHgIxEXm8CVahIQp6eXQrutZlDd790EijixzOwj56Z0jUwewm JRwA== X-Gm-Message-State: AOJu0YxW16V6ta/HIK7EwNASRMAdARZ21wc6s98pS4CEPT33bB4zgp7/ CtLAQcgkapyjOq3PAJo6mRwGeY8beuiI4CMH4FP3VA== X-Received: by 2002:a05:6871:797:b0:1d5:8f05:39da with SMTP id o23-20020a056871079700b001d58f0539damr6317636oap.25.1694716633988; Thu, 14 Sep 2023 11:37:13 -0700 (PDT) Received: from mandiga.. ([2804:1b3:a7c0:91cb:5173:d70:d8d:4d2c]) by smtp.gmail.com with ESMTPSA id cc6-20020a056871e18600b001b3d67934e9sm1056930oac.26.2023.09.14.11.37.11 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 14 Sep 2023 11:37:13 -0700 (PDT) From: Adhemerval Zanella To: libc-alpha@sourceware.org, Noah Goldstein , Paul Eggert , Florian Weimer Subject: [PATCH v7 2/7] stdlib: Optimization qsort{_r} swap implementation Date: Thu, 14 Sep 2023 15:36:59 -0300 Message-Id: <20230914183704.1386534-3-adhemerval.zanella@linaro.org> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230914183704.1386534-1-adhemerval.zanella@linaro.org> References: <20230914183704.1386534-1-adhemerval.zanella@linaro.org> MIME-Version: 1.0 X-Spam-Status: No, score=-12.4 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: libc-alpha@sourceware.org X-Mailman-Version: 2.1.30 Precedence: list List-Id: Libc-alpha mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: libc-alpha-bounces+patch=linaro.org@sourceware.org The optimization takes in consideration both the most common elements are either 32 or 64 bit in size and inputs are aligned to the word boundary. This is similar to what msort does. For large buffer the swap operation uses memcpy/mempcpy with a small fixed size buffer (so compiler might inline the operations). Checked on x86_64-linux-gnu. --- stdlib/qsort.c | 94 ++++++++++++++++++++++++++++++++++++++++---------- 1 file changed, 76 insertions(+), 18 deletions(-) diff --git a/stdlib/qsort.c b/stdlib/qsort.c index 728a0ed370..bba9783191 100644 --- a/stdlib/qsort.c +++ b/stdlib/qsort.c @@ -23,20 +23,70 @@ #include #include #include +#include -/* Byte-wise swap two items of size SIZE. */ -#define SWAP(a, b, size) \ - do \ - { \ - size_t __size = (size); \ - char *__a = (a), *__b = (b); \ - do \ - { \ - char __tmp = *__a; \ - *__a++ = *__b; \ - *__b++ = __tmp; \ - } while (--__size > 0); \ - } while (0) +/* Swap SIZE bytes between addresses A and B. These helpers are provided + along the generic one as an optimization. */ + +enum swap_type_t + { + SWAP_WORDS_64, + SWAP_WORDS_32, + SWAP_BYTES + }; + +/* If this function returns true, elements can be safely copied using word + loads and stores. Otherwise, it might not be safe. BASE (as an integer) + must be a multiple of the word alignment. SIZE must be a multiple of + WORDSIZE. Since WORDSIZE must be a multiple of the word alignment, and + WORDSIZE is a power of two on all supported platforms, this function for + speed merely checks that BASE and SIZE are both multiples of the word + size. */ +static inline bool +is_aligned (const void *base, size_t size, size_t wordsize) +{ + return (((uintptr_t) base | size) & (wordsize - 1)) == 0; +} + +static inline void +swap_words_64 (void * restrict a, void * restrict b, size_t n) +{ + typedef uint64_t __attribute__ ((__may_alias__)) u64_alias_t; + do + { + n -= 8; + u64_alias_t t = *(u64_alias_t *)(a + n); + *(u64_alias_t *)(a + n) = *(u64_alias_t *)(b + n); + *(u64_alias_t *)(b + n) = t; + } while (n); +} + +static inline void +swap_words_32 (void * restrict a, void * restrict b, size_t n) +{ + typedef uint32_t __attribute__ ((__may_alias__)) u32_alias_t; + do + { + n -= 4; + u32_alias_t t = *(u32_alias_t *)(a + n); + *(u32_alias_t *)(a + n) = *(u32_alias_t *)(b + n); + *(u32_alias_t *)(b + n) = t; + } while (n); +} + +/* Replace the indirect call with a serie of if statements. It should help + the branch predictor. */ +static void +do_swap (void * restrict a, void * restrict b, size_t size, + enum swap_type_t swap_type) +{ + if (swap_type == SWAP_WORDS_64) + swap_words_64 (a, b, size); + else if (swap_type == SWAP_WORDS_32) + swap_words_32 (a, b, size); + else + __memswap (a, b, size); +} /* Discontinue quicksort algorithm when partition gets below this size. This particular magic number was chosen to work best on a Sun 4/260. */ @@ -96,6 +146,14 @@ _quicksort (void *const pbase, size_t total_elems, size_t size, /* Avoid lossage with unsigned arithmetic below. */ return; + enum swap_type_t swap_type; + if (is_aligned (pbase, size, 8)) + swap_type = SWAP_WORDS_64; + else if (is_aligned (pbase, size, 4)) + swap_type = SWAP_WORDS_32; + else + swap_type = SWAP_BYTES; + if (total_elems > MAX_THRESH) { char *lo = base_ptr; @@ -119,13 +177,13 @@ _quicksort (void *const pbase, size_t total_elems, size_t size, char *mid = lo + size * ((hi - lo) / size >> 1); if ((*cmp) ((void *) mid, (void *) lo, arg) < 0) - SWAP (mid, lo, size); + do_swap (mid, lo, size, swap_type); if ((*cmp) ((void *) hi, (void *) mid, arg) < 0) - SWAP (mid, hi, size); + do_swap (mid, hi, size, swap_type); else goto jump_over; if ((*cmp) ((void *) mid, (void *) lo, arg) < 0) - SWAP (mid, lo, size); + do_swap (mid, lo, size, swap_type); jump_over:; left_ptr = lo + size; @@ -144,7 +202,7 @@ _quicksort (void *const pbase, size_t total_elems, size_t size, if (left_ptr < right_ptr) { - SWAP (left_ptr, right_ptr, size); + do_swap (left_ptr, right_ptr, size, swap_type); if (mid == left_ptr) mid = right_ptr; else if (mid == right_ptr) @@ -216,7 +274,7 @@ _quicksort (void *const pbase, size_t total_elems, size_t size, tmp_ptr = run_ptr; if (tmp_ptr != base_ptr) - SWAP (tmp_ptr, base_ptr, size); + do_swap (tmp_ptr, base_ptr, size, swap_type); /* Insertion sort, running from left-hand-side up to right-hand-side. */