From patchwork Thu Feb 27 16:38:40 2014 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Charles Baylis X-Patchwork-Id: 25467 Return-Path: X-Original-To: linaro@patches.linaro.org Delivered-To: linaro@patches.linaro.org Received: from mail-ob0-f200.google.com (mail-ob0-f200.google.com [209.85.214.200]) by ip-10-151-82-157.ec2.internal (Postfix) with ESMTPS id E139920447 for ; Thu, 27 Feb 2014 16:39:03 +0000 (UTC) Received: by mail-ob0-f200.google.com with SMTP id gq1sf9500623obb.3 for ; Thu, 27 Feb 2014 08:39:03 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:delivered-to:mailing-list:precedence:list-id :list-unsubscribe:list-archive:list-post:list-help:sender :delivered-to:mime-version:in-reply-to:references:date:message-id :subject:from:to:x-original-sender:x-original-authentication-results :content-type; bh=Dq4MJHU3dwxeWTyB4rP/tckAXkkqXbe29hv1xSVIrr0=; b=UQ7vZXwE+630HiG+5CVyI41pVi/Gs+KW9IyXutYnjJ//ZDf3ZECA/7ScVLc106NxBF M8Xn5JI993kGvOVefXfa6HzhChH9YOgjnuXPTdmpiQO0SIRGXRrVM9wm4VqG592osJRj jFEsD31xJGvC6gx0B9gF/kw8wKz+W32Kt2TMrYmwBm1Y6P8ujZPsraEGME9V9+e3OWzR F6Akz/RKamJAYIBWk0WnnH96SlB5o4KcL/deiwwtEmBFYhXJ2uhS+P4Ux3IxRNAA9G2M 9OpjU3+035zcpt+bBJrDzBknpd8jB5mIA9MzuB/x42dKG/Rai6KwBps3ktArHPbqwZaG uxnQ== X-Gm-Message-State: ALoCoQm3j6is6aXQBrKEL5+p2Z7I0NuBNVXiRdMadzKkK3dKR4R2SOPHBSrclrnEoZdslX9mYiQU X-Received: by 10.182.52.136 with SMTP id t8mr1964401obo.41.1393519143336; Thu, 27 Feb 2014 08:39:03 -0800 (PST) X-BeenThere: patchwork-forward@linaro.org Received: by 10.140.84.239 with SMTP id l102ls765732qgd.5.gmail; Thu, 27 Feb 2014 08:39:03 -0800 (PST) X-Received: by 10.58.186.132 with SMTP id fk4mr11363719vec.9.1393519143183; Thu, 27 Feb 2014 08:39:03 -0800 (PST) Received: from mail-vc0-x232.google.com (mail-vc0-x232.google.com [2607:f8b0:400c:c03::232]) by mx.google.com with ESMTPS id wz4si4096vdc.150.2014.02.27.08.39.03 for 
(version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Thu, 27 Feb 2014 08:39:03 -0800 (PST) Received-SPF: neutral (google.com: 2607:f8b0:400c:c03::232 is neither permitted nor denied by best guess record for domain of patch+caf_=patchwork-forward=linaro.org@linaro.org) client-ip=2607:f8b0:400c:c03::232; Received: by mail-vc0-f178.google.com with SMTP id ik5so2734074vcb.37 for ; Thu, 27 Feb 2014 08:39:03 -0800 (PST) X-Received: by 10.58.94.195 with SMTP id de3mr1070074veb.39.1393519142920; Thu, 27 Feb 2014 08:39:02 -0800 (PST) X-Forwarded-To: patchwork-forward@linaro.org X-Forwarded-For: patch@linaro.org patchwork-forward@linaro.org Delivered-To: patch@linaro.org Received: by 10.220.174.196 with SMTP id u4csp29854vcz; Thu, 27 Feb 2014 08:39:02 -0800 (PST) X-Received: by 10.68.197.36 with SMTP id ir4mr14169891pbc.46.1393519141158; Thu, 27 Feb 2014 08:39:01 -0800 (PST) Received: from sourceware.org (server1.sourceware.org. [209.132.180.131]) by mx.google.com with ESMTPS id uc9si5193266pac.65.2014.02.27.08.39.00 for (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Thu, 27 Feb 2014 08:39:01 -0800 (PST) Received-SPF: pass (google.com: domain of gcc-patches-return-362482-patch=linaro.org@gcc.gnu.org designates 209.132.180.131 as permitted sender) client-ip=209.132.180.131; Received: (qmail 29320 invoked by alias); 27 Feb 2014 16:38:47 -0000 Mailing-List: list patchwork-forward@linaro.org; contact patchwork-forward+owners@linaro.org Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: , List-Help: , Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org Received: (qmail 29309 invoked by uid 89); 27 Feb 2014 16:38:46 -0000 X-Virus-Found: No X-Spam-SWARE-Status: No, score=-2.1 required=5.0 tests=AWL, BAYES_00, RCVD_IN_DNSWL_LOW, SPF_PASS autolearn=ham version=3.3.2 X-HELO: mail-lb0-f174.google.com Received: from mail-lb0-f174.google.com (HELO mail-lb0-f174.google.com) (209.85.217.174) by sourceware.org 
(qpsmtpd/0.93/v0.84-503-g423c35a) with (AES128-SHA encrypted) ESMTPS; Thu, 27 Feb 2014 16:38:44 +0000 Received: by mail-lb0-f174.google.com with SMTP id u14so1585182lbd.5 for ; Thu, 27 Feb 2014 08:38:40 -0800 (PST) MIME-Version: 1.0 X-Received: by 10.112.136.227 with SMTP id qd3mr2070416lbb.55.1393519120650; Thu, 27 Feb 2014 08:38:40 -0800 (PST) Received: by 10.112.202.201 with HTTP; Thu, 27 Feb 2014 08:38:40 -0800 (PST) In-Reply-To: References: Date: Thu, 27 Feb 2014 16:38:40 +0000 Message-ID: Subject: Fwd: [PATCH, ARM] Improve 64 bit division performance From: Charles Baylis To: GCC Patches , Ramana Radhakrishnan , Richard Earnshaw X-IsSubscribed: yes X-Original-Sender: charles.baylis@linaro.org X-Original-Authentication-Results: mx.google.com; spf=neutral (google.com: 2607:f8b0:400c:c03::232 is neither permitted nor denied by best guess record for domain of patch+caf_=patchwork-forward=linaro.org@linaro.org) smtp.mail=patch+caf_=patchwork-forward=linaro.org@linaro.org; dkim=pass header.i=@gcc.gnu.org X-Google-Group-Id: 836684582541

[resending as text/plain]

Hi

These patches optimise 64-bit division by removing the use of the
__gnu_[u]ldivmod_helper functions and hence avoiding the redundant
calculation of the remainder in those functions.

Bootstrapped, tested and checked for arm-unknown-linux-gnueabihf.

Benchmarked on Chromebook and Raspberry Pi using the attached divbench3.c.
Loop1 varies the divisor and loop2 varies the dividend.

Chromebook:
before:
loop1 unsigned: 3.474419
loop2 unsigned: 6.564871
loop1 signed: 4.127967
loop2 signed: 6.071490
after:
loop1 unsigned: 2.781364
loop2 unsigned: 6.166478
loop1 signed: 2.800974
loop2 signed: 6.129588

Raspberry Pi:
before:
loop1 unsigned: 28.881753
loop2 unsigned: 19.876385
loop1 signed: 32.074941
loop2 signed: 20.594860
after:
loop1 unsigned: 24.893846
loop2 unsigned: 19.537562
loop1 signed: 25.334509
loop2 signed: 19.615088

Any comments? OK for stage 1?

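The attached divbench3.c is not reproduced in this message. Purely as an illustration of what "loop1 varies the divisor and loop2 varies the dividend" might look like, here is a hypothetical sketch in C. The function names, operand values and timing method are all assumptions and are not taken from the real benchmark.

```c
#include <stdint.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical sketch only; this is NOT the attached divbench3.c. */

/* loop1: fixed dividend, divisor changes every iteration (never zero). */
static uint64_t loop1_unsigned(uint64_t dividend, uint32_t iterations)
{
    uint64_t sum = 0;
    for (uint32_t i = 1; i <= iterations; i++)
        sum += dividend / (uint64_t)i;
    return sum;
}

/* loop2: fixed divisor, dividend changes every iteration. */
static uint64_t loop2_unsigned(uint64_t divisor, uint32_t iterations)
{
    uint64_t sum = 0;
    for (uint32_t i = 1; i <= iterations; i++)
        sum += ((uint64_t)i << 32) / divisor;
    return sum;
}

/* Time one loop with clock() and print the elapsed CPU time. */
static void report(const char *name, uint64_t (*loop)(uint64_t, uint32_t),
                   uint64_t fixed_operand, uint32_t iterations)
{
    clock_t start = clock();
    volatile uint64_t sink = loop(fixed_operand, iterations); /* keep result live */
    (void)sink;
    printf("%s: %f\n", name, (double)(clock() - start) / CLOCKS_PER_SEC);
}
```

Calling, say, `report("loop1 unsigned", loop1_unsigned, UINT64_MAX, 1000000)` from `main` would print one timing line. On a 32-bit ARM EABI target the 64-bit `/` is expected to compile to a call to `__aeabi_uldivmod`; on a 64-bit host it becomes a native divide, so the numbers are only meaningful on the target.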
Patch 1:

2014-02-27 Charles Baylis

	* config/arm/bpabi.S (__aeabi_uldivmod): Perform division using call
	to __udivmoddi4.

Patch 2:

2014-02-27 Charles Baylis

	* config/arm/bpabi.S (__aeabi_ldivmod): Perform signed division via
	call to __udivmoddi4 and fixing up for negative operands.

>From 975d9c624e77ee00476e6866250b0e2e31461fca Mon Sep 17 00:00:00 2001
From: Charles Baylis
Date: Tue, 25 Feb 2014 16:27:59 +0000
Subject: [PATCH 2/2] Optimise __aeabi_ldivmod

2014-02-25 Charles Baylis

	* config/arm/bpabi.S (__aeabi_ldivmod): Perform signed
	division using unsigned division via call to __udivmoddi4 and
	additional logic.
---
 libgcc/config/arm/bpabi.S | 74 +++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 69 insertions(+), 5 deletions(-)

diff --git a/libgcc/config/arm/bpabi.S b/libgcc/config/arm/bpabi.S
index e020af5..8b75a28 100644
--- a/libgcc/config/arm/bpabi.S
+++ b/libgcc/config/arm/bpabi.S
@@ -136,20 +136,84 @@ ARM_FUNC_START aeabi_ldivmod
 	cfi_start __aeabi_ldivmod, LSYM(Lend_aeabi_ldivmod)
 	test_div_by_zero signed
 
-	sub sp, sp, #8
-#if defined(__thumb2__)
-	mov ip, sp
-	push {ip, lr}
+#if defined(__thumb2__) && CAN_USE_LDRD
+	sub ip, sp, #8
+	strd ip,lr, [sp, #-16]!
 #else
+	sub sp, sp, #8
 	do_push {sp, lr}
 #endif
+	cmp xxh, #0
+	blt 1f
+	cmp yyh, #0
+	blt 2f
+
+98:	cfi_push 98b - __aeabi_ldivmod, 0xe, -0xc, 0x10
+	bl SYM(__udivmoddi4) __PLT__
+	ldr lr, [sp, #4]
+#if CAN_USE_LDRD
+	ldrd r2, r3, [sp, #8]
+	add sp, sp, #16
+#else
+	add sp, sp, #8
+	do_pop {r2, r3}
+#endif
+	RET
+1: /* xxh:xxl is negative */
+	rsbs xxl, xxl, #0
+	sbc xxh, xxh, xxh, lsl #1
+	cmp yyh, #0
+	blt 3f
+98:	cfi_push 98b - __aeabi_ldivmod, 0xe, -0xc, 0x10
+	bl SYM(__udivmoddi4) __PLT__
+	ldr lr, [sp, #4]
+#if CAN_USE_LDRD
+	ldrd r2, r3, [sp, #8]
+	add sp, sp, #16
+#else
+	add sp, sp, #8
+	do_pop {r2, r3}
+#endif
+	rsbs xxl, xxl, #0
+	sbc xxh, xxh, xxh, lsl #1
+	rsbs yyl, yyl, #0
+	sbc yyh, yyh, yyh, lsl #1
+	RET
+
+2: /* only yyh:yyl is negative */
+	rsbs yyl, yyl, #0
+	sbc yyh, yyh, yyh, lsl #1
 98:	cfi_push 98b - __aeabi_ldivmod, 0xe, -0xc, 0x10
-	bl SYM(__gnu_ldivmod_helper) __PLT__
+	bl SYM(__udivmoddi4) __PLT__
 	ldr lr, [sp, #4]
+#if CAN_USE_LDRD
+	ldrd r2, r3, [sp, #8]
+	add sp, sp, #16
+#else
 	add sp, sp, #8
 	do_pop {r2, r3}
+#endif
+	rsbs xxl, xxl, #0
+	sbc xxh, xxh, xxh, lsl #1
 	RET
+
+3: /* both xxh:xxl and yyh:yyl are negative */
+	rsbs yyl, yyl, #0
+	sbc yyh, yyh, yyh, lsl #1
 	cfi_end LSYM(Lend_aeabi_ldivmod)
+98:	cfi_push 98b - __aeabi_ldivmod, 0xe, -0xc, 0x10
+	bl SYM(__udivmoddi4) __PLT__
+	ldr lr, [sp, #4]
+#if CAN_USE_LDRD
+	ldrd r2, r3, [sp, #8]
+	add sp, sp, #16
+#else
+	add sp, sp, #8
+	do_pop {r2, r3}
+#endif
+	rsbs yyl, yyl, #0
+	sbc yyh, yyh, yyh, lsl #1
+	RET
 
 #endif /* L_aeabi_ldivmod */
 
-- 
1.8.3.2
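For readers who find the four assembly paths hard to follow, the sign handling can be sketched in C roughly as below. This is an illustration under stated assumptions, not the code in the patch. `udivmod64` is a hypothetical stand-in for libgcc's `__udivmoddi4`, `ldivmod_sketch` is an invented name, and a non-zero divisor is assumed, since the real routine deals with zero beforehand via `test_div_by_zero`.

```c
#include <stdint.h>

/* Hypothetical stand-in for __udivmoddi4: returns the unsigned quotient
   and stores the unsigned remainder through rem. Assumes d != 0. */
static uint64_t udivmod64(uint64_t n, uint64_t d, uint64_t *rem)
{
    *rem = n % d;
    return n / d;
}

/* Signed 64-bit divide with remainder built on the unsigned routine.
   Invented name; mirrors the four cases handled in the patch. */
static int64_t ldivmod_sketch(int64_t n, int64_t d, int64_t *rem)
{
    /* Negate in unsigned arithmetic so that INT64_MIN does not overflow. */
    uint64_t un = n < 0 ? 0 - (uint64_t)n : (uint64_t)n;
    uint64_t ud = d < 0 ? 0 - (uint64_t)d : (uint64_t)d;
    uint64_t ur;
    uint64_t uq = udivmod64(un, ud, &ur);

    /* The quotient is negative when exactly one operand is negative. */
    if ((n < 0) != (d < 0))
        uq = 0 - uq;
    /* The remainder takes the sign of the dividend. */
    if (n < 0)
        ur = 0 - ur;

    /* Out-of-range unsigned-to-signed conversion is implementation-defined
       before C23; GCC wraps modulo 2^64, which is what is wanted here. */
    *rem = (int64_t)ur;
    return (int64_t)uq;
}
```

This corresponds to the labels in the assembly: path `1:` (negative dividend only) negates both results, path `2:` (negative divisor only) negates only the quotient, and path `3:` (both negative) negates only the remainder. The assembly spells the paths out separately, so each one does only the negations it needs.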