From patchwork Thu Jan 18 17:53:21 2018
X-Patchwork-Submitter: Adhemerval Zanella Netto
X-Patchwork-Id: 125000
From: Adhemerval Zanella
To: libc-alpha@sourceware.org
Subject: [PATCH 6/7] stdlib: Optimization qsort{_r} swap implementation
Date: Thu, 18 Jan 2018 15:53:21 -0200
Message-Id: <1516298002-4618-7-git-send-email-adhemerval.zanella@linaro.org>
In-Reply-To: <1516298002-4618-1-git-send-email-adhemerval.zanella@linaro.org>
References: <1516298002-4618-1-git-send-email-adhemerval.zanella@linaro.org>

This patch adds an optimized swap operation to qsort, based on the
previous msort one.  Instead of a byte-wise operation, three variants
are provided:

  1. Using uint32_t loads and stores.
  2. Using uint64_t loads and stores.
  3. A generic one using a temporary buffer and memcpy/mempcpy.

The 1. and 2.
variants are selected only if the architecture defines
_STRING_ARCH_unaligned or if the base pointer is aligned to the
required type.  This choice is based on data from bench-qsort:
programs usually call qsort with an element size that is a multiple
of the machine word.  Benchmarking shows improved performance in
most cases:

Results for member size 4
  Sorted
  nmemb   |      base |   patched | diff
        32|      1401 |      1958 | 39.76
      4096|    351333 |    368533 | 4.90
     32768|   3369386 |   3131712 | -7.05
    524288|  63192972 |  59807494 | -5.36

  MostlySorted
  nmemb   |      base |   patched | diff
        32|      2391 |      2061 | -13.80
      4096|   1124074 |    961816 | -14.43
     32768|  11196607 |   9410438 | -15.95
    524288| 215908169 | 185586732 | -14.04

  Unsorted
  nmemb   |      base |   patched | diff
        32|      4993 |      2021 | -59.52
      4096|   1113860 |    963126 | -13.53
     32768|  11251293 |   9518795 | -15.40
    524288| 217252237 | 185072278 | -14.81

Results for member size 8
  Sorted
  nmemb   |      base |   patched | diff
        32|      1296 |      1267 | -2.24
      4096|    359418 |    334852 | -6.83
     32768|   3535229 |   3345157 | -5.38
    524288|  69847251 |  67029358 | -4.03

  MostlySorted
  nmemb   |      base |   patched | diff
        32|      2745 |      2340 | -14.75
      4096|   1222082 |   1014314 | -17.00
     32768|  12244800 |   9924706 | -18.95
    524288| 241557971 | 196898760 | -18.49

  Unsorted
  nmemb   |      base |   patched | diff
        32|      2972 |      2389 | -19.62
      4096|   1314861 |   1024052 | -22.12
     32768|  12397909 |  10120848 | -18.37
    524288| 241789262 | 193414824 | -20.01

Results for member size 32
  Sorted
  nmemb   |      base |   patched | diff
        32|      1305 |      1287 | -1.38
      4096|    346332 |    347979 | 0.48
     32768|   3458244 |   3408058 | -1.45
    524288|  72793445 |  69973719 | -3.87

  MostlySorted
  nmemb   |      base |   patched | diff
        32|      5435 |      4890 | -10.03
      4096|   2032260 |   1688556 | -16.91
     32768|  19909035 |  16419992 | -17.52
    524288| 390339319 | 325921585 | -16.50

  Unsorted
  nmemb   |      base |   patched | diff
        32|      5833 |      5351 | -8.26
      4096|   2022531 |   1724961 | -14.71
     32768|  19842888 |  16588545 | -16.40
    524288| 388838382 | 324102703 | -16.65

Checked on x86_64-linux-gnu.

	[BZ #19305]
	* stdlib/qsort.c (SWAP): Remove.
	(check_alignment, swap_u32, swap_u64, swap_generic)
	(select_swap_func): New functions.
	(__qsort_r): Use the selected swap function.
---
 stdlib/qsort.c | 77 ++++++++++++++++++++++++++++++++++++++++++++--------------
 1 file changed, 59 insertions(+), 18 deletions(-)

-- 
2.7.4

diff --git a/stdlib/qsort.c b/stdlib/qsort.c
index b3a5102..2194003 100644
--- a/stdlib/qsort.c
+++ b/stdlib/qsort.c
@@ -23,20 +23,59 @@
 #include <limits.h>
 #include <stdlib.h>
 #include <string.h>
+#include <stdbool.h>
 
-/* Byte-wise swap two items of size SIZE.  */
-#define SWAP(a, b, size)						      \
-  do									      \
-    {									      \
-      size_t __size = (size);						      \
-      char *__a = (a), *__b = (b);					      \
-      do								      \
-	{								      \
-	  char __tmp = *__a;						      \
-	  *__a++ = *__b;						      \
-	  *__b++ = __tmp;						      \
-	} while (--__size > 0);						      \
-    } while (0)
+/* Swap SIZE bytes between addresses A and B.  Helpers for common type
+   sizes are provided as an optimization.  */
+
+typedef void (*swap_t)(void *, void *, size_t);
+
+static inline bool
+check_alignment (const void *base, size_t align)
+{
+  return _STRING_ARCH_unaligned || ((uintptr_t)base & (align - 1)) == 0;
+}
+
+static void
+swap_u32 (void *a, void *b, size_t size)
+{
+  uint32_t tmp = *(uint32_t*) a;
+  *(uint32_t*) a = *(uint32_t*) b;
+  *(uint32_t*) b = tmp;
+}
+
+static void
+swap_u64 (void *a, void *b, size_t size)
+{
+  uint64_t tmp = *(uint64_t*) a;
+  *(uint64_t*) a = *(uint64_t*) b;
+  *(uint64_t*) b = tmp;
+}
+
+static inline void
+swap_generic (void *a, void *b, size_t size)
+{
+  unsigned char tmp[128];
+  do
+    {
+      size_t s = size > sizeof (tmp) ? sizeof (tmp) : size;
+      memcpy (tmp, a, s);
+      a = __mempcpy (a, b, s);
+      b = __mempcpy (b, tmp, s);
+      size -= s;
+    }
+  while (size > 0);
+}
+
+static inline swap_t
+select_swap_func (const void *base, size_t size)
+{
+  if (size == 4 && check_alignment (base, 4))
+    return swap_u32;
+  else if (size == 8 && check_alignment (base, 8))
+    return swap_u64;
+  return swap_generic;
+}
 
 /* Discontinue quicksort algorithm when partition gets below this size.
    This particular magic number was chosen to work best on a Sun 4/260.
   */
@@ -96,6 +135,8 @@ __qsort_r (void *const pbase, size_t total_elems, size_t size,
     /* Avoid lossage with unsigned arithmetic below.  */
     return;
 
+  swap_t swap = select_swap_func (pbase, size);
+
   if (total_elems > MAX_THRESH)
     {
       char *lo = base_ptr;
@@ -119,13 +160,13 @@ __qsort_r (void *const pbase, size_t total_elems, size_t size,
 	  char *mid = lo + size * ((hi - lo) / size >> 1);
 
 	  if ((*cmp) ((void *) mid, (void *) lo, arg) < 0)
-	    SWAP (mid, lo, size);
+	    swap (mid, lo, size);
 	  if ((*cmp) ((void *) hi, (void *) mid, arg) < 0)
-	    SWAP (mid, hi, size);
+	    swap (mid, hi, size);
 	  else
 	    goto jump_over;
 	  if ((*cmp) ((void *) mid, (void *) lo, arg) < 0)
-	    SWAP (mid, lo, size);
+	    swap (mid, lo, size);
 	jump_over:;
 
 	  left_ptr  = lo + size;
@@ -144,7 +185,7 @@ __qsort_r (void *const pbase, size_t total_elems, size_t size,
 	      if (left_ptr < right_ptr)
 		{
-		  SWAP (left_ptr, right_ptr, size);
+		  swap (left_ptr, right_ptr, size);
 		  if (mid == left_ptr)
 		    mid = right_ptr;
 		  else if (mid == right_ptr)
@@ -216,7 +257,7 @@ __qsort_r (void *const pbase, size_t total_elems, size_t size,
         tmp_ptr = run_ptr;
 
       if (tmp_ptr != base_ptr)
-        SWAP (tmp_ptr, base_ptr, size);
+        swap (tmp_ptr, base_ptr, size);
 
       /* Insertion sort, running from left-hand-side up to right-hand-side.  */