From patchwork Mon Apr 28 05:11:34 2014 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: zhichang.yuan@linaro.org X-Patchwork-Id: 29169 Return-Path: X-Original-To: linaro@patches.linaro.org Delivered-To: linaro@patches.linaro.org Received: from mail-pa0-f70.google.com (mail-pa0-f70.google.com [209.85.220.70]) by ip-10-151-82-157.ec2.internal (Postfix) with ESMTPS id 9149820534 for ; Mon, 28 Apr 2014 05:15:30 +0000 (UTC) Received: by mail-pa0-f70.google.com with SMTP id lj1sf20098128pab.1 for ; Sun, 27 Apr 2014 22:15:29 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:delivered-to:from:to:subject:date:message-id :in-reply-to:references:cc:precedence:list-id:list-unsubscribe :list-archive:list-post:list-help:list-subscribe:mime-version:sender :errors-to:x-original-sender:x-original-authentication-results :mailing-list:content-type:content-transfer-encoding; bh=uqBYY6Y4mmPdAB0ciXBduiT3SiQR5bO6puIbBhvMLTg=; b=Qgn2DrTktQ42SVLHX+zF/bKhqBCrwGdUg8Ds73dRsilCzLrRwreBr9wWGwKmiSRbh/ bjPP7O/OpWYeiRBFKNGdgf/X8ToAIVx7KtCxtzIrd0y7uBkNQJPXjc7kO9n9ZfyrOzdJ 0W28CfEUR1I3YLVXAUzNaFRlGrdCrgoxnK1ZbLlO7RRXXVvwf2OR4+YdlU9HycvU1q9V 16v9+0UVwQ+7vWtpaFV9EpQDdneoArq+m0LY13Vw16jj14lwRL/5ZiHv+Ak7ILp318+k E4XoaBr+Z0jnzcf12G00uu9keZiTA5CoFG4najYbrsxx124djlF9ktmttfkPNY4tLVPx 5hfQ== X-Gm-Message-State: ALoCoQkBns7MUF7njbKFWWj1h4GK7XAKT6sT9JzStIWNcGsPb0HOL0Gfb1uxj0NK0DyNp4Q5pEZ3 X-Received: by 10.66.156.66 with SMTP id wc2mr3551765pab.23.1398662129396; Sun, 27 Apr 2014 22:15:29 -0700 (PDT) X-BeenThere: patchwork-forward@linaro.org Received: by 10.140.21.40 with SMTP id 37ls2315044qgk.32.gmail; Sun, 27 Apr 2014 22:15:29 -0700 (PDT) X-Received: by 10.52.95.135 with SMTP id dk7mr860422vdb.32.1398662129236; Sun, 27 Apr 2014 22:15:29 -0700 (PDT) Received: from mail-vc0-f173.google.com (mail-vc0-f173.google.com [209.85.220.173]) by mx.google.com with ESMTPS id z14si3384730vcf.208.2014.04.27.22.15.29 for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Sun, 27 Apr 2014 22:15:29 -0700 (PDT) Received-SPF: none (google.com: patch+caf_=patchwork-forward=linaro.org@linaro.org does not designate permitted sender hosts) client-ip=209.85.220.173; Received: by mail-vc0-f173.google.com with SMTP id ik5so3121936vcb.18 for ; Sun, 27 Apr 2014 22:15:29 -0700 (PDT) X-Received: by 10.58.219.233 with SMTP id pr9mr22520903vec.10.1398662129153; Sun, 27 Apr 2014 22:15:29 -0700 (PDT) X-Forwarded-To: patchwork-forward@linaro.org X-Forwarded-For: patch@linaro.org patchwork-forward@linaro.org Delivered-To: patch@linaro.org Received: by 10.220.221.72 with SMTP id ib8csp91049vcb; Sun, 27 Apr 2014 22:15:28 -0700 (PDT) X-Received: by 10.224.4.5 with SMTP id 5mr30005073qap.85.1398662128142; Sun, 27 Apr 2014 22:15:28 -0700 (PDT) Received: from bombadil.infradead.org (bombadil.infradead.org. [2001:1868:205::9]) by mx.google.com with ESMTPS id i1si7257595qab.44.2014.04.27.22.15.27 for (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Sun, 27 Apr 2014 22:15:28 -0700 (PDT) Received-SPF: none (google.com: linux-arm-kernel-bounces+patch=linaro.org@lists.infradead.org does not designate permitted sender hosts) client-ip=2001:1868:205::9; Received: from localhost ([127.0.0.1] helo=bombadil.infradead.org) by bombadil.infradead.org with esmtp (Exim 4.80.1 #2 (Red Hat Linux)) id 1Wedsm-0004dH-7h; Mon, 28 Apr 2014 05:13:36 +0000 Received: from mail-pa0-f49.google.com ([209.85.220.49]) by bombadil.infradead.org with esmtps (Exim 4.80.1 #2 (Red Hat Linux)) id 1Wedsj-0004Sj-C7 for linux-arm-kernel@lists.infradead.org; Mon, 28 Apr 2014 05:13:34 +0000 Received: by mail-pa0-f49.google.com with SMTP id lj1so5272774pab.8 for ; Sun, 27 Apr 2014 22:13:13 -0700 (PDT) X-Received: by 10.66.253.170 with SMTP id ab10mr23292413pad.53.1398661992984; Sun, 27 Apr 2014 22:13:12 -0700 (PDT) Received: from localhost ([58.251.159.252]) by mx.google.com with ESMTPSA id vg1sm32034057pbc.44.2014.04.27.22.13.09 for (version=TLSv1.2 cipher=RC4-SHA bits=128/128); Sun, 27 Apr 2014 22:13:12 -0700 (PDT) From: zhichang.yuan@linaro.org To: linux-arm-kernel@lists.infradead.org, catalin.marinas@arm.com, will.deacon@arm.com Subject: [PATCHv2 6/6] arm64: lib: Implement optimized string length routines Date: Mon, 28 Apr 2014 13:11:34 +0800 Message-Id: <1398661895-5559-7-git-send-email-zhichang.yuan@linaro.org> X-Mailer: git-send-email 1.7.9.5 In-Reply-To: <1398661895-5559-1-git-send-email-zhichang.yuan@linaro.org> References: <1398661895-5559-1-git-send-email-zhichang.yuan@linaro.org> X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20140427_221333_458392_D253D421 X-CRM114-Status: GOOD ( 19.07 ) X-Spam-Score: -0.7 (/) X-Spam-Report: SpamAssassin version 3.3.2 on bombadil.infradead.org summary: Content analysis details: (-0.7 points) pts rule name description ---- ---------------------- -------------------------------------------------- -0.7 RCVD_IN_DNSWL_LOW RBL: Sender listed at http://www.dnswl.org/, low trust [209.85.220.49 listed in list.dnswl.org] Cc: dsaxena@linaro.org, liguozhu@huawei.com, "zhichang.yuan" X-BeenThere: linux-arm-kernel@lists.infradead.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: , List-Help: , List-Subscribe: , MIME-Version: 1.0 Sender: "linux-arm-kernel" Errors-To: linux-arm-kernel-bounces+patch=linaro.org@lists.infradead.org X-Removed-Original-Auth: Dkim didn't pass. X-Original-Sender: zhichang.yuan@linaro.org X-Original-Authentication-Results: mx.google.com; spf=neutral (google.com: patch+caf_=patchwork-forward=linaro.org@linaro.org does not designate permitted sender hosts) smtp.mail=patch+caf_=patchwork-forward=linaro.org@linaro.org Mailing-list: list patchwork-forward@linaro.org; contact patchwork-forward+owners@linaro.org X-Google-Group-Id: 836684582541 From: "zhichang.yuan" This patch, based on Linaro's Cortex Strings library, adds an assembly optimized strlen() and strnlen() functions. Signed-off-by: Zhichang Yuan Signed-off-by: Deepak Saxena --- arch/arm64/include/asm/string.h | 6 ++ arch/arm64/kernel/arm64ksyms.c | 2 + arch/arm64/lib/Makefile | 3 +- arch/arm64/lib/strlen.S | 126 +++++++++++++++++++++++++++++ arch/arm64/lib/strnlen.S | 171 +++++++++++++++++++++++++++++++++++++++ 5 files changed, 307 insertions(+), 1 deletion(-) create mode 100644 arch/arm64/lib/strlen.S create mode 100644 arch/arm64/lib/strnlen.S diff --git a/arch/arm64/include/asm/string.h b/arch/arm64/include/asm/string.h index 6133f49..64d2d48 100644 --- a/arch/arm64/include/asm/string.h +++ b/arch/arm64/include/asm/string.h @@ -28,6 +28,12 @@ extern int strcmp(const char *, const char *); #define __HAVE_ARCH_STRNCMP extern int strncmp(const char *, const char *, __kernel_size_t); +#define __HAVE_ARCH_STRLEN +extern __kernel_size_t strlen(const char *); + +#define __HAVE_ARCH_STRNLEN +extern __kernel_size_t strnlen(const char *, __kernel_size_t); + #define __HAVE_ARCH_MEMCPY extern void *memcpy(void *, const void *, __kernel_size_t); diff --git a/arch/arm64/kernel/arm64ksyms.c b/arch/arm64/kernel/arm64ksyms.c index 2784a79..1edc792 100644 --- a/arch/arm64/kernel/arm64ksyms.c +++ b/arch/arm64/kernel/arm64ksyms.c @@ -46,6 +46,8 @@ EXPORT_SYMBOL(strchr); EXPORT_SYMBOL(strrchr); EXPORT_SYMBOL(strcmp); EXPORT_SYMBOL(strncmp); +EXPORT_SYMBOL(strlen); +EXPORT_SYMBOL(strnlen); EXPORT_SYMBOL(memset); EXPORT_SYMBOL(memcpy); EXPORT_SYMBOL(memmove); diff --git a/arch/arm64/lib/Makefile b/arch/arm64/lib/Makefile index aaaf618..d98d3e3 100644 --- a/arch/arm64/lib/Makefile +++ b/arch/arm64/lib/Makefile @@ -1,4 +1,5 @@ lib-y := bitops.o clear_user.o delay.o copy_from_user.o \ copy_to_user.o copy_in_user.o copy_page.o \ clear_page.o memchr.o memcpy.o memmove.o memset.o \ - memcmp.o strcmp.o strncmp.o strchr.o strrchr.o + memcmp.o strcmp.o strncmp.o strlen.o strnlen.o \ + strchr.o strrchr.o diff --git a/arch/arm64/lib/strlen.S b/arch/arm64/lib/strlen.S new file mode 100644 index 0000000..987b68b --- /dev/null +++ b/arch/arm64/lib/strlen.S @@ -0,0 +1,126 @@ +/* + * Copyright (C) 2013 ARM Ltd. + * Copyright (C) 2013 Linaro. + * + * This code is based on glibc cortex strings work originally authored by Linaro + * and re-licensed under GPLv2 for the Linux kernel. The original code can + * be found @ + * + * http://bazaar.launchpad.net/~linaro-toolchain-dev/cortex-strings/trunk/ + * files/head:/src/aarch64/ + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program. If not, see . + */ + +#include +#include + +/* + * calculate the length of a string + * + * Parameters: + * x0 - const string pointer + * Returns: + * x0 - the return length of specific string + */ + +/* Arguments and results. */ +srcin .req x0 +len .req x0 + +/* Locals and temporaries. */ +src .req x1 +data1 .req x2 +data2 .req x3 +data2a .req x4 +has_nul1 .req x5 +has_nul2 .req x6 +tmp1 .req x7 +tmp2 .req x8 +tmp3 .req x9 +tmp4 .req x10 +zeroones .req x11 +pos .req x12 + +#define REP8_01 0x0101010101010101 +#define REP8_7f 0x7f7f7f7f7f7f7f7f +#define REP8_80 0x8080808080808080 + +ENTRY(strlen) + mov zeroones, #REP8_01 + bic src, srcin, #15 + ands tmp1, srcin, #15 + b.ne .Lmisaligned + /* + * NUL detection works on the principle that (X - 1) & (~X) & 0x80 + * (=> (X - 1) & ~(X | 0x7f)) is non-zero iff a byte is zero, and + * can be done in parallel across the entire word. + */ + /* + * The inner loop deals with two Dwords at a time. This has a + * slightly higher start-up cost, but we should win quite quickly, + * especially on cores with a high number of issue slots per + * cycle, as we get much better parallelism out of the operations. + */ +.Lloop: + ldp data1, data2, [src], #16 +.Lrealigned: + sub tmp1, data1, zeroones + orr tmp2, data1, #REP8_7f + sub tmp3, data2, zeroones + orr tmp4, data2, #REP8_7f + bic has_nul1, tmp1, tmp2 + bics has_nul2, tmp3, tmp4 + ccmp has_nul1, #0, #0, eq /* NZCV = 0000 */ + b.eq .Lloop + + sub len, src, srcin + cbz has_nul1, .Lnul_in_data2 +CPU_BE( mov data2, data1 ) /*prepare data to re-calculate the syndrome*/ + sub len, len, #8 + mov has_nul2, has_nul1 +.Lnul_in_data2: + /* + * For big-endian, carry propagation (if the final byte in the + * string is 0x01) means we cannot use has_nul directly. The + * easiest way to get the correct byte is to byte-swap the data + * and calculate the syndrome a second time. + */ +CPU_BE( rev data2, data2 ) +CPU_BE( sub tmp1, data2, zeroones ) +CPU_BE( orr tmp2, data2, #REP8_7f ) +CPU_BE( bic has_nul2, tmp1, tmp2 ) + + sub len, len, #8 + rev has_nul2, has_nul2 + clz pos, has_nul2 + add len, len, pos, lsr #3 /* Bits to bytes. */ + ret + +.Lmisaligned: + cmp tmp1, #8 + neg tmp1, tmp1 + ldp data1, data2, [src], #16 + lsl tmp1, tmp1, #3 /* Bytes beyond alignment -> bits. */ + mov tmp2, #~0 + /* Big-endian. Early bytes are at MSB. */ +CPU_BE( lsl tmp2, tmp2, tmp1 ) /* Shift (tmp1 & 63). */ + /* Little-endian. Early bytes are at LSB. */ +CPU_LE( lsr tmp2, tmp2, tmp1 ) /* Shift (tmp1 & 63). */ + + orr data1, data1, tmp2 + orr data2a, data2, tmp2 + csinv data1, data1, xzr, le + csel data2, data2, data2a, le + b .Lrealigned +ENDPROC(strlen) diff --git a/arch/arm64/lib/strnlen.S b/arch/arm64/lib/strnlen.S new file mode 100644 index 0000000..2ca6657 --- /dev/null +++ b/arch/arm64/lib/strnlen.S @@ -0,0 +1,171 @@ +/* + * Copyright (C) 2013 ARM Ltd. + * Copyright (C) 2013 Linaro. + * + * This code is based on glibc cortex strings work originally authored by Linaro + * and re-licensed under GPLv2 for the Linux kernel. The original code can + * be found @ + * + * http://bazaar.launchpad.net/~linaro-toolchain-dev/cortex-strings/trunk/ + * files/head:/src/aarch64/ + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program. If not, see . + */ + +#include +#include + +/* + * determine the length of a fixed-size string + * + * Parameters: + * x0 - const string pointer + * x1 - maximal string length + * Returns: + * x0 - the return length of specific string + */ + +/* Arguments and results. */ +srcin .req x0 +len .req x0 +limit .req x1 + +/* Locals and temporaries. */ +src .req x2 +data1 .req x3 +data2 .req x4 +data2a .req x5 +has_nul1 .req x6 +has_nul2 .req x7 +tmp1 .req x8 +tmp2 .req x9 +tmp3 .req x10 +tmp4 .req x11 +zeroones .req x12 +pos .req x13 +limit_wd .req x14 + +#define REP8_01 0x0101010101010101 +#define REP8_7f 0x7f7f7f7f7f7f7f7f +#define REP8_80 0x8080808080808080 + +ENTRY(strnlen) + cbz limit, .Lhit_limit + mov zeroones, #REP8_01 + bic src, srcin, #15 + ands tmp1, srcin, #15 + b.ne .Lmisaligned + /* Calculate the number of full and partial words -1. */ + sub limit_wd, limit, #1 /* Limit != 0, so no underflow. */ + lsr limit_wd, limit_wd, #4 /* Convert to Qwords. */ + + /* + * NUL detection works on the principle that (X - 1) & (~X) & 0x80 + * (=> (X - 1) & ~(X | 0x7f)) is non-zero iff a byte is zero, and + * can be done in parallel across the entire word. + */ + /* + * The inner loop deals with two Dwords at a time. This has a + * slightly higher start-up cost, but we should win quite quickly, + * especially on cores with a high number of issue slots per + * cycle, as we get much better parallelism out of the operations. + */ +.Lloop: + ldp data1, data2, [src], #16 +.Lrealigned: + sub tmp1, data1, zeroones + orr tmp2, data1, #REP8_7f + sub tmp3, data2, zeroones + orr tmp4, data2, #REP8_7f + bic has_nul1, tmp1, tmp2 + bic has_nul2, tmp3, tmp4 + subs limit_wd, limit_wd, #1 + orr tmp1, has_nul1, has_nul2 + ccmp tmp1, #0, #0, pl /* NZCV = 0000 */ + b.eq .Lloop + + cbz tmp1, .Lhit_limit /* No null in final Qword. */ + + /* + * We know there's a null in the final Qword. The easiest thing + * to do now is work out the length of the string and return + * MIN (len, limit). + */ + sub len, src, srcin + cbz has_nul1, .Lnul_in_data2 +CPU_BE( mov data2, data1 ) /*perpare data to re-calculate the syndrome*/ + + sub len, len, #8 + mov has_nul2, has_nul1 +.Lnul_in_data2: + /* + * For big-endian, carry propagation (if the final byte in the + * string is 0x01) means we cannot use has_nul directly. The + * easiest way to get the correct byte is to byte-swap the data + * and calculate the syndrome a second time. + */ +CPU_BE( rev data2, data2 ) +CPU_BE( sub tmp1, data2, zeroones ) +CPU_BE( orr tmp2, data2, #REP8_7f ) +CPU_BE( bic has_nul2, tmp1, tmp2 ) + + sub len, len, #8 + rev has_nul2, has_nul2 + clz pos, has_nul2 + add len, len, pos, lsr #3 /* Bits to bytes. */ + cmp len, limit + csel len, len, limit, ls /* Return the lower value. */ + ret + +.Lmisaligned: + /* + * Deal with a partial first word. + * We're doing two things in parallel here; + * 1) Calculate the number of words (but avoiding overflow if + * limit is near ULONG_MAX) - to do this we need to work out + * limit + tmp1 - 1 as a 65-bit value before shifting it; + * 2) Load and mask the initial data words - we force the bytes + * before the ones we are interested in to 0xff - this ensures + * early bytes will not hit any zero detection. + */ + ldp data1, data2, [src], #16 + + sub limit_wd, limit, #1 + and tmp3, limit_wd, #15 + lsr limit_wd, limit_wd, #4 + + add tmp3, tmp3, tmp1 + add limit_wd, limit_wd, tmp3, lsr #4 + + neg tmp4, tmp1 + lsl tmp4, tmp4, #3 /* Bytes beyond alignment -> bits. */ + + mov tmp2, #~0 + /* Big-endian. Early bytes are at MSB. */ +CPU_BE( lsl tmp2, tmp2, tmp4 ) /* Shift (tmp1 & 63). */ + /* Little-endian. Early bytes are at LSB. */ +CPU_LE( lsr tmp2, tmp2, tmp4 ) /* Shift (tmp1 & 63). */ + + cmp tmp1, #8 + + orr data1, data1, tmp2 + orr data2a, data2, tmp2 + + csinv data1, data1, xzr, le + csel data2, data2, data2a, le + b .Lrealigned + +.Lhit_limit: + mov len, limit + ret +ENDPROC(strnlen)