From patchwork Tue Oct 3 12:22:46 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Adhemerval Zanella X-Patchwork-Id: 728728 Delivered-To: patch@linaro.org Received: by 2002:a5d:60c8:0:b0:31d:da82:a3b4 with SMTP id x8csp2112614wrt; Tue, 3 Oct 2023 05:23:31 -0700 (PDT) X-Google-Smtp-Source: AGHT+IEt8KQSpJNxY/U6ISFJ7764cc6QxM7SKc8St7z8B6mVU7WjCXPyJPeFL96gvTLQkVjRlDuv X-Received: by 2002:a17:906:5190:b0:9b2:d554:da15 with SMTP id y16-20020a170906519000b009b2d554da15mr9719504ejk.38.1696335811130; Tue, 03 Oct 2023 05:23:31 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1696335811; cv=none; d=google.com; s=arc-20160816; b=UuPIXH7azXjTQjunZDBHTvIi2jqXAyDqIolI7j/qHwzQkKj5nE8E+2PmxlS22FO8va yf7W4FdQUF71m+VI0kco37oBJ/7dYPRMYLNyFbA45KlxEL87rEwuHL+Q8NSNcQlDrumb E62K/GsFneHMD2w7PjaBLRlyHbPV6lCFF4GoIo18GCgeS0Ia0lR6QQOl4m+wJselZhQ+ mwtrM25DTBfpA6z9DGNO07sknSlPFZ+LFKlCiX/rsYk5l5W1ZYmmkwaCswQDYivFdpeY rgx0LhNxGH7UF/UHb/1O84jdQUnrbJXrEEa6CyHvd+ob3w628gOoSqLRIWhw9/O1DwRo ESag== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=errors-to:list-subscribe:list-help:list-post:list-archive :list-unsubscribe:list-id:precedence:content-transfer-encoding :mime-version:references:in-reply-to:message-id:date:subject:to:from :dkim-signature:dmarc-filter:delivered-to; bh=5YL53T0CWZlgfNRw37iJeGmfu3dUNrIcttKVQiTLn0A=; fh=ubzLPtaquqORyAJ/TX35zypB35/iXKzZJOEWlgP8mu4=; b=L8dp3PMRuiEoQYL47hkNiakVEO4KagI2JUNAcdj5xfOQJZ7oxUG6JIUlZww9xzaOSa 2C235iiIfqiCfF/As/TW92mhrRbh9AqK6x44MWdC+X0UzcI6FI31OaXUgqhSMKhqfBIE 3QGjpZZG+rzXYvqF5mKaY7WY8RzAOW4zro4jvLfMJ4Fn4lo678d4ftRu7+wM1zOQ+GXw /XSiuSOZM56dDkyJX9jywCTAkYaGnUncW4xZWLDOje8R/b8E2n5WhNcRyVy67Hbtulic +4egSXSYRHV4NMDl40hlmu76JiDMunyURhCtrlKfZnuQxc6FS6FVDcDPDmYyF9x/7ucO oLhA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=gR9u0SXK; spf=pass (google.com: domain of libc-alpha-bounces+patch=linaro.org@sourceware.org designates 8.43.85.97 as permitted sender) smtp.mailfrom="libc-alpha-bounces+patch=linaro.org@sourceware.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Return-Path: Received: from server2.sourceware.org (ip-8-43-85-97.sourceware.org. [8.43.85.97]) by mx.google.com with ESMTPS id u19-20020a170906951300b00993a7ae9f37si599490ejx.882.2023.10.03.05.23.30 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 03 Oct 2023 05:23:31 -0700 (PDT) Received-SPF: pass (google.com: domain of libc-alpha-bounces+patch=linaro.org@sourceware.org designates 8.43.85.97 as permitted sender) client-ip=8.43.85.97; Authentication-Results: mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=gR9u0SXK; spf=pass (google.com: domain of libc-alpha-bounces+patch=linaro.org@sourceware.org designates 8.43.85.97 as permitted sender) smtp.mailfrom="libc-alpha-bounces+patch=linaro.org@sourceware.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 15DBA3856241 for ; Tue, 3 Oct 2023 12:23:30 +0000 (GMT) X-Original-To: libc-alpha@sourceware.org Delivered-To: libc-alpha@sourceware.org Received: from mail-pg1-x531.google.com (mail-pg1-x531.google.com [IPv6:2607:f8b0:4864:20::531]) by sourceware.org (Postfix) with ESMTPS id 7FDD63858436 for ; Tue, 3 Oct 2023 12:23:01 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 7FDD63858436 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=linaro.org Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=linaro.org Received: by mail-pg1-x531.google.com with SMTP id 41be03b00d2f7-584bfb14c59so546209a12.0 for ; Tue, 03 Oct 2023 05:23:01 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; t=1696335780; x=1696940580; darn=sourceware.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:to:from:from:to:cc:subject:date:message-id :reply-to; bh=5YL53T0CWZlgfNRw37iJeGmfu3dUNrIcttKVQiTLn0A=; b=gR9u0SXKHpkkCQImBVUkJsslyArWnwEnMLizX2MQmOuzpBKJXAKtU4Hrp7ygVHop7f keq/kjMSlx/z8fVCMe81bZS7HEd9sOiwsnXkUFzFebZaBcTDXGF/VUvaYO8MZ/YrbXVH FSYSv3xNYPA8mfk4cfWmkkOChsV9E/ucsWubAxt2/7jQMrNmGutIvhTKV8tXIbsnBqvD /nT7447jCzKFNrtKEPvkxfvy0Gk0Wm8/00jOa+EHUab3LxO7172nasQ4OTQx0fTVLOLZ dqKr5KiYeWpiczbwn+Gi6jn/L0cnejvmCw+SAPmHEFHscDHITV0zH6PfjacjHvvI3qbR 6BTw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1696335780; x=1696940580; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=5YL53T0CWZlgfNRw37iJeGmfu3dUNrIcttKVQiTLn0A=; b=X/bNorUTnwg4AYW4IoHpvnkT1iAHxWv2Prv2sQQPktWtgYsPBZ5FhVyJ78ntZyfxBL SO9C9wwj6rw3LXlv+t41NIZYsAH5Q1fWNMCwaagG150LX6IXr1My24PWIjrCNPYEozpo V25YKZdcEo0AtxRrL5/cbrS1X/uT49YDVIDbqNT7QvdAb+uKHW1KNjp/ec27VZX2vO3V 3VKtZbWu975zH9MD4SKLrM1pcJ4GijF+P01pRVbiw8AfYKUNfMz7ZmaMUAqap0T4hwpf 3staPsiHm5JRvMjedHwO5iOO5PaAeIAvNCSC4zZ6zIP0aeXFwcZgO4z0j2GPAPjhm8YO dxKQ== X-Gm-Message-State: AOJu0Yw+lxPs75118PoLPkrXk/RErRNil+hjXSwDVyafsGJ7ct4UJNgP 6qfyeixC9LfYL+TjLg6QM4dUneoe6WlvQyvYXKU/XA== X-Received: by 2002:a05:6a20:1456:b0:134:a4e2:4ac8 with SMTP id a22-20020a056a20145600b00134a4e24ac8mr14231939pzi.39.1696335779886; Tue, 03 Oct 2023 05:22:59 -0700 (PDT) Received: from mandiga.. ([2804:1b3:a7c1:feaf:31ef:b40c:b4e5:77c]) by smtp.gmail.com with ESMTPSA id b1-20020a170902d30100b001c5de2f1686sm1403881plc.99.2023.10.03.05.22.58 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 03 Oct 2023 05:22:59 -0700 (PDT) From: Adhemerval Zanella To: libc-alpha@sourceware.org, Noah Goldstein , Paul Eggert , Florian Weimer Subject: [PATCH v8 2/7] stdlib: Optimization qsort{_r} swap implementation Date: Tue, 3 Oct 2023 09:22:46 -0300 Message-Id: <20231003122251.3325435-3-adhemerval.zanella@linaro.org> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20231003122251.3325435-1-adhemerval.zanella@linaro.org> References: <20231003122251.3325435-1-adhemerval.zanella@linaro.org> MIME-Version: 1.0 X-Spam-Status: No, score=-12.4 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: libc-alpha@sourceware.org X-Mailman-Version: 2.1.30 Precedence: list List-Id: Libc-alpha mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: libc-alpha-bounces+patch=linaro.org@sourceware.org The optimization takes in consideration both the most common elements are either 32 or 64 bit in size and inputs are aligned to the word boundary. This is similar to what msort does. For large buffer the swap operation uses memcpy/mempcpy with a small fixed size buffer (so compiler might inline the operations). Checked on x86_64-linux-gnu. Reviewed-by: Noah Goldstein --- stdlib/qsort.c | 95 ++++++++++++++++++++++++++++++++++++++++---------- 1 file changed, 77 insertions(+), 18 deletions(-) diff --git a/stdlib/qsort.c b/stdlib/qsort.c index 728a0ed370..072ccdfb95 100644 --- a/stdlib/qsort.c +++ b/stdlib/qsort.c @@ -21,22 +21,73 @@ #include #include +#include #include #include +#include -/* Byte-wise swap two items of size SIZE. */ -#define SWAP(a, b, size) \ - do \ - { \ - size_t __size = (size); \ - char *__a = (a), *__b = (b); \ - do \ - { \ - char __tmp = *__a; \ - *__a++ = *__b; \ - *__b++ = __tmp; \ - } while (--__size > 0); \ - } while (0) +/* Swap SIZE bytes between addresses A and B. These helpers are provided + along the generic one as an optimization. */ + +enum swap_type_t + { + SWAP_WORDS_64, + SWAP_WORDS_32, + SWAP_BYTES + }; + +/* If this function returns true, elements can be safely copied using word + loads and stores. Otherwise, it might not be safe. BASE (as an integer) + must be a multiple of the word alignment. SIZE must be a multiple of + WORDSIZE. Since WORDSIZE must be a multiple of the word alignment, and + WORDSIZE is a power of two on all supported platforms, this function for + speed merely checks that BASE and SIZE are both multiples of the word + size. */ +static inline bool +is_aligned (const void *base, size_t size, size_t wordsize) +{ + return (((uintptr_t) base | size) & (wordsize - 1)) == 0; +} + +static inline void +swap_words_64 (void * restrict a, void * restrict b, size_t n) +{ + typedef uint64_t __attribute__ ((__may_alias__)) u64_alias_t; + do + { + n -= 8; + u64_alias_t t = *(u64_alias_t *)(a + n); + *(u64_alias_t *)(a + n) = *(u64_alias_t *)(b + n); + *(u64_alias_t *)(b + n) = t; + } while (n); +} + +static inline void +swap_words_32 (void * restrict a, void * restrict b, size_t n) +{ + typedef uint32_t __attribute__ ((__may_alias__)) u32_alias_t; + do + { + n -= 4; + u32_alias_t t = *(u32_alias_t *)(a + n); + *(u32_alias_t *)(a + n) = *(u32_alias_t *)(b + n); + *(u32_alias_t *)(b + n) = t; + } while (n); +} + +/* Replace the indirect call with a serie of if statements. It should help + the branch predictor. */ +static void +do_swap (void * restrict a, void * restrict b, size_t size, + enum swap_type_t swap_type) +{ + if (swap_type == SWAP_WORDS_64) + swap_words_64 (a, b, size); + else if (swap_type == SWAP_WORDS_32) + swap_words_32 (a, b, size); + else + __memswap (a, b, size); +} /* Discontinue quicksort algorithm when partition gets below this size. This particular magic number was chosen to work best on a Sun 4/260. */ @@ -96,6 +147,14 @@ _quicksort (void *const pbase, size_t total_elems, size_t size, /* Avoid lossage with unsigned arithmetic below. */ return; + enum swap_type_t swap_type; + if (is_aligned (pbase, size, 8)) + swap_type = SWAP_WORDS_64; + else if (is_aligned (pbase, size, 4)) + swap_type = SWAP_WORDS_32; + else + swap_type = SWAP_BYTES; + if (total_elems > MAX_THRESH) { char *lo = base_ptr; @@ -119,13 +178,13 @@ _quicksort (void *const pbase, size_t total_elems, size_t size, char *mid = lo + size * ((hi - lo) / size >> 1); if ((*cmp) ((void *) mid, (void *) lo, arg) < 0) - SWAP (mid, lo, size); + do_swap (mid, lo, size, swap_type); if ((*cmp) ((void *) hi, (void *) mid, arg) < 0) - SWAP (mid, hi, size); + do_swap (mid, hi, size, swap_type); else goto jump_over; if ((*cmp) ((void *) mid, (void *) lo, arg) < 0) - SWAP (mid, lo, size); + do_swap (mid, lo, size, swap_type); jump_over:; left_ptr = lo + size; @@ -144,7 +203,7 @@ _quicksort (void *const pbase, size_t total_elems, size_t size, if (left_ptr < right_ptr) { - SWAP (left_ptr, right_ptr, size); + do_swap (left_ptr, right_ptr, size, swap_type); if (mid == left_ptr) mid = right_ptr; else if (mid == right_ptr) @@ -216,7 +275,7 @@ _quicksort (void *const pbase, size_t total_elems, size_t size, tmp_ptr = run_ptr; if (tmp_ptr != base_ptr) - SWAP (tmp_ptr, base_ptr, size); + do_swap (tmp_ptr, base_ptr, size, swap_type); /* Insertion sort, running from left-hand-side up to right-hand-side. */