From patchwork Mon Nov 7 13:39:53 2016 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Richard Earnshaw \(lists\)" X-Patchwork-Id: 81093 Delivered-To: patch@linaro.org Received: by 10.140.97.165 with SMTP id m34csp1017273qge; Mon, 7 Nov 2016 05:40:19 -0800 (PST) X-Received: by 10.99.168.78 with SMTP id i14mr10892082pgp.162.1478526018976; Mon, 07 Nov 2016 05:40:18 -0800 (PST) Return-Path: Received: from sourceware.org (server1.sourceware.org. [209.132.180.131]) by mx.google.com with ESMTPS id 188si7765613pfg.23.2016.11.07.05.40.18 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Mon, 07 Nov 2016 05:40:18 -0800 (PST) Received-SPF: pass (google.com: domain of gcc-patches-return-440616-patch=linaro.org@gcc.gnu.org designates 209.132.180.131 as permitted sender) client-ip=209.132.180.131; Authentication-Results: mx.google.com; dkim=pass header.i=@gcc.gnu.org; spf=pass (google.com: domain of gcc-patches-return-440616-patch=linaro.org@gcc.gnu.org designates 209.132.180.131 as permitted sender) smtp.mailfrom=gcc-patches-return-440616-patch=linaro.org@gcc.gnu.org DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:to :from:subject:message-id:date:mime-version:content-type; q=dns; s=default; b=G20dYquqbZ9rDXCOghOps/KkvA0kQdfyCqIxxuxohFKeWgJ0BI gLxEVLgNeWIaVdQo27EJw+/frlrx7HQE79/XYbJDg7lCpQs7PW3WTaga2zvjdyEx YLofZDZSnhvoUFvrvsDimEpHzL8aH5Tj8WZ47cWpnOzvr9S7qTb3yIfvA= DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:to :from:subject:message-id:date:mime-version:content-type; s= default; bh=T7JBu+LSAi3Uer7CEgOBqClWooc=; b=PFf8+2ZysCtkfyOIxZXy ykI6fidSiHsIUij3A/g6x2L6u9cvT2RCWCdDDfKMKBMmte7183gnXLY5LfDpuUzk 4jWuhnY+zcgR4VQaM3FoqvNCLjK3jvGLQu91yomiyaR+Icc41V9pjsnmW2N+Bg3p MbpNRUWvcUn1QtiIu0yXQdA= Received: (qmail 24897 invoked by alias); 7 Nov 2016 13:40:00 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org Received: (qmail 24880 invoked by uid 89); 7 Nov 2016 13:39:58 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-4.7 required=5.0 tests=BAYES_00, RP_MATCHES_RCVD, SPF_PASS autolearn=ham version=3.3.2 spammy=armv8, ARMv8, 4096, Hx-languages-length:3916 X-HELO: foss.arm.com Received: from foss.arm.com (HELO foss.arm.com) (217.140.101.70) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Mon, 07 Nov 2016 13:39:57 +0000 Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.72.51.249]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 2476DAD7; Mon, 7 Nov 2016 05:39:55 -0800 (PST) Received: from e105689-lin.cambridge.arm.com (e105689-lin.cambridge.arm.com [10.2.207.32]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id A83733F220; Mon, 7 Nov 2016 05:39:54 -0800 (PST) To: gcc-patches From: "Richard Earnshaw (lists)" Subject: [PATCH][AArch64] Optimized implementation of search_line_fast for the CPP lexer Message-ID: <7b1f910a-628f-089a-eed3-23476c1bda9e@arm.com> Date: Mon, 7 Nov 2016 13:39:53 +0000 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.3.0 MIME-Version: 1.0 This patch contains an implementation of search_line_fast for the CPP lexer. It's based in part on the AArch32 (ARM) code but incorporates new instructions available in AArch64 (reduction add operations) plus some tricks for reducing the realignment overheads. We assume a page size of 4k, but that's a safe assumption -- AArch64 systems can never have a smaller page size than that: on systems with larger pages we will go through the realignment code more often than strictly necessary, but it's still likely to be in the noise (less than 0.5% of the time). Bootstrapped on aarch64-none-linux-gnu. Although this is AArch64 specific and therefore I don't think it requires approval from anyone else, I'll wait 24 hours for comments. * lex.c (search_line_fast): New implementation for AArch64. R. diff --git a/libcpp/lex.c b/libcpp/lex.c index 6f65fa1..cea8848 100644 --- a/libcpp/lex.c +++ b/libcpp/lex.c @@ -752,6 +752,101 @@ search_line_fast (const uchar *s, const uchar *end ATTRIBUTE_UNUSED) } } +#elif defined (__ARM_NEON) && defined (__ARM_64BIT_STATE) +#include "arm_neon.h" + +/* This doesn't have to be the exact page size, but no system may use + a size smaller than this. ARMv8 requires a minimum page size of + 4k. The impact of being conservative here is a small number of + cases will take the slightly slower entry path into the main + loop. */ + +#define AARCH64_MIN_PAGE_SIZE 4096 + +static const uchar * +search_line_fast (const uchar *s, const uchar *end ATTRIBUTE_UNUSED) +{ + const uint8x16_t repl_nl = vdupq_n_u8 ('\n'); + const uint8x16_t repl_cr = vdupq_n_u8 ('\r'); + const uint8x16_t repl_bs = vdupq_n_u8 ('\\'); + const uint8x16_t repl_qm = vdupq_n_u8 ('?'); + const uint8x16_t xmask = (uint8x16_t) vdupq_n_u64 (0x8040201008040201ULL); + +#ifdef __AARCH64EB + const int16x8_t shift = {8, 8, 8, 8, 0, 0, 0, 0}; +#else + const int16x8_t shift = {0, 0, 0, 0, 8, 8, 8, 8}; +#endif + + unsigned int found; + const uint8_t *p; + uint8x16_t data; + uint8x16_t t; + uint16x8_t m; + uint8x16_t u, v, w; + + /* Align the source pointer. */ + p = (const uint8_t *)((uintptr_t)s & -16); + + /* Assuming random string start positions, with a 4k page size we'll take + the slow path about 0.37% of the time. */ + if (__builtin_expect ((AARCH64_MIN_PAGE_SIZE + - (((uintptr_t) s) & (AARCH64_MIN_PAGE_SIZE - 1))) + < 16, 0)) + { + /* Slow path: the string starts near a possible page boundary. */ + uint32_t misalign, mask; + + misalign = (uintptr_t)s & 15; + mask = (-1u << misalign) & 0xffff; + data = vld1q_u8 (p); + t = vceqq_u8 (data, repl_nl); + u = vceqq_u8 (data, repl_cr); + v = vorrq_u8 (t, vceqq_u8 (data, repl_bs)); + w = vorrq_u8 (u, vceqq_u8 (data, repl_qm)); + t = vorrq_u8 (v, w); + t = vandq_u8 (t, xmask); + m = vpaddlq_u8 (t); + m = vshlq_u16 (m, shift); + found = vaddvq_u16 (m); + found &= mask; + if (found) + return (const uchar*)p + __builtin_ctz (found); + } + else + { + data = vld1q_u8 ((const uint8_t *) s); + t = vceqq_u8 (data, repl_nl); + u = vceqq_u8 (data, repl_cr); + v = vorrq_u8 (t, vceqq_u8 (data, repl_bs)); + w = vorrq_u8 (u, vceqq_u8 (data, repl_qm)); + t = vorrq_u8 (v, w); + if (__builtin_expect (vpaddd_u64 ((uint64x2_t)t), 0)) + goto done; + } + + do + { + p += 16; + data = vld1q_u8 (p); + t = vceqq_u8 (data, repl_nl); + u = vceqq_u8 (data, repl_cr); + v = vorrq_u8 (t, vceqq_u8 (data, repl_bs)); + w = vorrq_u8 (u, vceqq_u8 (data, repl_qm)); + t = vorrq_u8 (v, w); + } while (!vpaddd_u64 ((uint64x2_t)t)); + +done: + /* Now that we've found the terminating substring, work out precisely where + we need to stop. */ + t = vandq_u8 (t, xmask); + m = vpaddlq_u8 (t); + m = vshlq_u16 (m, shift); + found = vaddvq_u16 (m); + return (((((uintptr_t) p) < (uintptr_t) s) ? s : (const uchar *)p) + + __builtin_ctz (found)); +} + #elif defined (__ARM_NEON) #include "arm_neon.h"