From patchwork Fri Jan 6 20:20:58 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jonathan Wakely X-Patchwork-Id: 90219 Delivered-To: patch@linaro.org Received: by 10.140.20.101 with SMTP id 92csp9607516qgi; Fri, 6 Jan 2017 12:21:25 -0800 (PST) X-Received: by 10.99.45.134 with SMTP id t128mr144547241pgt.86.1483734085366; Fri, 06 Jan 2017 12:21:25 -0800 (PST) Return-Path: Received: from sourceware.org (server1.sourceware.org. [209.132.180.131]) by mx.google.com with ESMTPS id c22si35651604pli.282.2017.01.06.12.21.25 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Fri, 06 Jan 2017 12:21:25 -0800 (PST) Received-SPF: pass (google.com: domain of gcc-patches-return-445600-patch=linaro.org@gcc.gnu.org designates 209.132.180.131 as permitted sender) client-ip=209.132.180.131; Authentication-Results: mx.google.com; dkim=pass header.i=@gcc.gnu.org; spf=pass (google.com: domain of gcc-patches-return-445600-patch=linaro.org@gcc.gnu.org designates 209.132.180.131 as permitted sender) smtp.mailfrom=gcc-patches-return-445600-patch=linaro.org@gcc.gnu.org DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:date :from:to:cc:subject:message-id:references:mime-version :content-type:in-reply-to; q=dns; s=default; b=u9g8di5xNcUgEBj5c GPpA6whhWKJquKJhyQ3c3ZAjvJDf1AbhJ5XGroYNoYRxp/wUVWnchhef2VzMk2Q0 qsEEB52uac529/V4JTREYdE7N+YiazSQeumJnTiMOe4toxGLpE3/bBovi+nqnbUB 5fhZ5rNlJ+om4PvJpMTZQVS7xg= DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:date :from:to:cc:subject:message-id:references:mime-version :content-type:in-reply-to; s=default; bh=nmibddcy+45QxfN/usIxmoI ho8U=; b=HyhKkMPRUHpuHt3byRylbIZ2M5xBWJKrcf460hl6dYHzeDwHLyHqpay 9zcsMvxueFhBUi4fMRe248bfxzA1WVk49HF4KZawCJNHo7MOUrcmhYdNwlMh91FA z53KfdbE71Zi6SRpppnTnLyNkWKp+M3P8HV7tJ7lT+CWpFDoPq6A= Received: (qmail 2499 invoked by alias); 6 Jan 2017 20:21:12 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org Received: (qmail 2458 invoked by uid 89); 6 Jan 2017 20:21:11 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-5.1 required=5.0 tests=BAYES_00, RP_MATCHES_RCVD, SPF_HELO_PASS autolearn=ham version=3.3.2 spammy=__len X-Spam-User: qpsmtpd, 2 recipients X-HELO: mx1.redhat.com Received: from mx1.redhat.com (HELO mx1.redhat.com) (209.132.183.28) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Fri, 06 Jan 2017 20:21:01 +0000 Received: from int-mx11.intmail.prod.int.phx2.redhat.com (int-mx11.intmail.prod.int.phx2.redhat.com [10.5.11.24]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id C538A7F7A6; Fri, 6 Jan 2017 20:21:00 +0000 (UTC) Received: from localhost (ovpn-116-26.ams2.redhat.com [10.36.116.26]) by int-mx11.intmail.prod.int.phx2.redhat.com (8.14.4/8.14.4) with ESMTP id v06KKxJR011547; Fri, 6 Jan 2017 15:21:00 -0500 Date: Fri, 6 Jan 2017 20:20:58 +0000 From: Jonathan Wakely To: Aditya Kumar Cc: libstdc++@gcc.gnu.org, gcc-patches@gcc.gnu.org, hiraditya@msn.com Subject: Re: [PATCH] improve string find algorithm Message-ID: <20170106202058.GH2966@redhat.com> References: <1481132816-31162-1-git-send-email-aditya.k7@samsung.com> <20170106133502.GB2966@redhat.com> <016101d2682b$136dc890$3a4959b0$@samsung.com> MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: <016101d2682b$136dc890$3a4959b0$@samsung.com> X-Clacks-Overhead: GNU Terry Pratchett User-Agent: Mutt/1.7.1 (2016-10-04) On 06/01/17 08:42 -0600, Aditya Kumar wrote: >Yes, we do. >Sorry for the mistake, it happened because I first wrote this for >libcxx (https://reviews.llvm.org/D27068) and while porting that line >got missed. > >Thanks, >-Aditya > > >diff --git a/libstdc++-v3/include/bits/basic_string.tcc >b/libstdc++-v3/include/bits/basic_string.tcc >index df1e8dd..7942ee6 100644 >--- a/libstdc++-v3/include/bits/basic_string.tcc >+++ b/libstdc++-v3/include/bits/basic_string.tcc >@@ -1194,14 +1194,29 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION > if (__n == 0) > return __pos <= __size ? __pos : npos; > >- if (__n <= __size) >- { >- for (; __pos <= __size - __n; ++__pos) >- if (traits_type::eq(__data[__pos], __s[0]) >- && traits_type::compare(__data + __pos + 1, >- __s + 1, __n - 1) == 0) >- return __pos; >- } >+ if (__n > __size) >+ return npos; >+ >+ const _CharT __elem0 = __s[0]; >+ const _CharT* __first1 = __data; >+ const _CharT* __first2 = __s; >+ const _CharT* __last1 = __data + __size; >+ ptrdiff_t __len2 = __n - __pos; What's this variable for? >+ while (true) { >+ ptrdiff_t __len1 = __last1 - __first1; >+ if (__len1 < __len2) >+ return npos; >+ >+ // Find the first match. >+ __first1 = traits_type::find(__first1, __len1 - __len2 + 1, __elem0); >+ if (__first1 == 0) >+ return npos; >+ // Compare the full string when first match is found. >+ if (traits_type::compare(__first1, __first2, __len2) == 0) >+ return __first1 - __data; >+ >+ ++__first1; >+ } >+ > return npos; > } This is still wrong, consider std::string("abcd").find("ab", 2) This should return npos, but you return 0. The postcondition for the function is that the return value is not less than the pos argument. The attached patch should be a correct version of the improved algorithm. commit 2050cea863ebf5b958199478717ed717a3fb3abe Author: Jonathan Wakely Date: Fri Jan 6 19:50:02 2017 +0000 PR66414 optimize std::string::find 2017-01-06 Jonathan Wakely Aditya Kumar PR libstdc++/66414 * include/bits/basic_string.tcc (basic_string::find(const CharT*, size_type, size_type)): Optimize. diff --git a/libstdc++-v3/include/bits/basic_string.tcc b/libstdc++-v3/include/bits/basic_string.tcc index 8b2895b..41b7fa1 100644 --- a/libstdc++-v3/include/bits/basic_string.tcc +++ b/libstdc++-v3/include/bits/basic_string.tcc @@ -1190,18 +1190,30 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION { __glibcxx_requires_string_len(__s, __n); const size_type __size = this->size(); - const _CharT* __data = _M_data(); if (__n == 0) return __pos <= __size ? __pos : npos; + if (__pos >= __size) + return npos; - if (__n <= __size) + const _CharT __elem0 = __s[0]; + const _CharT* const __data = data(); + const _CharT* __first = __data + __pos; + const _CharT* const __last = __data + __size; + size_type __len = __size - __pos; + + while (__len >= __n) { - for (; __pos <= __size - __n; ++__pos) - if (traits_type::eq(__data[__pos], __s[0]) - && traits_type::compare(__data + __pos + 1, - __s + 1, __n - 1) == 0) - return __pos; + // Find the first occurrence of __elem0: + __first = traits_type::find(__first, __len - __n + 1, __elem0); + if (!__first) + return npos; + // Compare the full strings from the first occurrence of __elem0. + // We already know that __first[0] == __s[0] but compare them again + // anyway because __s is probably aligned, which helps memcmp. + if (traits_type::compare(__first, __s, __n) == 0) + return __first - __data; + __len = __last - ++__first; } return npos; }