From patchwork Wed Feb 22 09:29:36 2012
X-Patchwork-Submitter: Lai Jiangshan
X-Patchwork-Id: 6875
Message-ID: <4F44B580.6040003@cn.fujitsu.com>
Date: Wed, 22 Feb 2012 17:29:36 +0800
From: Lai Jiangshan <laijs@cn.fujitsu.com>
To: paulmck@linux.vnet.ibm.com
CC: linux-kernel@vger.kernel.org, mingo@elte.hu, dipankar@in.ibm.com,
 akpm@linux-foundation.org, mathieu.desnoyers@polymtl.ca,
 josh@joshtriplett.org, niv@us.ibm.com, tglx@linutronix.de,
 peterz@infradead.org, rostedt@goodmis.org, Valdis.Kletnieks@vt.edu,
 dhowells@redhat.com, eric.dumazet@gmail.com, darren@dvhart.com,
 fweisbec@gmail.com, patches@linaro.org
Subject: [PATCH 3/3 RFC paul/rcu/srcu] srcu: flip only once for every grace
 period
References: <20120213020951.GA12138@linux.vnet.ibm.com>
 <4F41F315.1040900@cn.fujitsu.com>
 <20120220174418.GI2470@linux.vnet.ibm.com>
 <4F42EF53.6060400@cn.fujitsu.com>
 <20120221015037.GE2384@linux.vnet.ibm.com>
 <4F435966.9020106@cn.fujitsu.com>
 <20120221172442.GG2375@linux.vnet.ibm.com>
In-Reply-To: <20120221172442.GG2375@linux.vnet.ibm.com>
From 4ddf62aaf2c4ebe6b9d4a1c596e8b43a678f1f0d Mon Sep 17 00:00:00 2001
From: Lai Jiangshan <laijs@cn.fujitsu.com>
Date: Wed, 22 Feb 2012 14:12:02 +0800
Subject: [PATCH 3/3 RFC paul/rcu/srcu] srcu: flip only once for every grace
 period

flip_idx_and_wait() is not changed in behavior: it is split into two
functions, and a short comment is added for smp_mb() E.

__synchronize_srcu() uses a different algorithm to handle "leak"
readers; the details are in the comments of the patch.  (A simplified
model of the resulting algorithm is sketched after the diff.)

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
 kernel/srcu.c |  105 ++++++++++++++++++++++++++++++++++----------------------
 1 files changed, 64 insertions(+), 41 deletions(-)

diff --git a/kernel/srcu.c b/kernel/srcu.c
index a51ac48..346f9d7 100644
--- a/kernel/srcu.c
+++ b/kernel/srcu.c
@@ -249,6 +249,37 @@ EXPORT_SYMBOL_GPL(__srcu_read_unlock);
  */
 #define SYNCHRONIZE_SRCU_READER_DELAY 5
 
+static void wait_idx(struct srcu_struct *sp, int idx, bool expedited)
+{
+	int trycount = 0;
+
+	/*
+	 * SRCU read-side critical sections are normally short, so wait
+	 * a small amount of time before possibly blocking.
+	 */
+	if (!srcu_readers_active_idx_check(sp, idx)) {
+		udelay(SYNCHRONIZE_SRCU_READER_DELAY);
+		while (!srcu_readers_active_idx_check(sp, idx)) {
+			if (expedited && ++ trycount < 10)
+				udelay(SYNCHRONIZE_SRCU_READER_DELAY);
+			else
+				schedule_timeout_interruptible(1);
+		}
+	}
+
+	/*
+	 * The following smp_mb() E pairs with srcu_read_unlock()'s
+	 * smp_mb() C to ensure that if srcu_readers_active_idx_check()
+	 * sees srcu_read_unlock()'s counter decrement, then any
+	 * of the current task's subsequent code will happen after
+	 * that SRCU read-side critical section.
+	 *
+	 * It also ensures the ordering between the above waiting and
+	 * the next flipping.
+	 */
+	smp_mb(); /* E */
+}
+
 /*
  * Flip the readers' index by incrementing ->completed, then wait
  * until there are no more readers using the counters referenced by
@@ -258,12 +289,12 @@ EXPORT_SYMBOL_GPL(__srcu_read_unlock);
  * Of course, it is possible that a reader might be delayed for the
  * full duration of flip_idx_and_wait() between fetching the
  * index and incrementing its counter. This possibility is handled
- * by __synchronize_srcu() invoking flip_idx_and_wait() twice.
+ * by the next __synchronize_srcu() invoking wait_idx() for such readers
+ * before starting a new grace period.
  */
 static void flip_idx_and_wait(struct srcu_struct *sp, bool expedited)
 {
 	int idx;
-	int trycount = 0;
 
 	idx = sp->completed++ & 0x1;
 
@@ -278,28 +309,7 @@ static void flip_idx_and_wait(struct srcu_struct *sp, bool expedited)
 	 */
 	smp_mb(); /* D */
 
-	/*
-	 * SRCU read-side critical sections are normally short, so wait
-	 * a small amount of time before possibly blocking.
-	 */
-	if (!srcu_readers_active_idx_check(sp, idx)) {
-		udelay(SYNCHRONIZE_SRCU_READER_DELAY);
-		while (!srcu_readers_active_idx_check(sp, idx)) {
-			if (expedited && ++ trycount < 10)
-				udelay(SYNCHRONIZE_SRCU_READER_DELAY);
-			else
-				schedule_timeout_interruptible(1);
-		}
-	}
-
-	/*
-	 * The following smp_mb() E pairs with srcu_read_unlock()'s
-	 * smp_mb C to ensure that if srcu_readers_active_idx_check()
-	 * sees srcu_read_unlock()'s counter decrement, then any
-	 * of the current task's subsequent code will happen after
-	 * that SRCU read-side critical section.
-	 */
-	smp_mb(); /* E */
+	wait_idx(sp, idx, expedited);
 }
 
 /*
@@ -307,8 +317,6 @@ static void flip_idx_and_wait(struct srcu_struct *sp, bool expedited)
  */
 static void __synchronize_srcu(struct srcu_struct *sp, bool expedited)
 {
-	int idx = 0;
-
 	rcu_lockdep_assert(!lock_is_held(&sp->dep_map) &&
 			   !lock_is_held(&rcu_bh_lock_map) &&
 			   !lock_is_held(&rcu_lock_map) &&
@@ -318,27 +326,42 @@ static void __synchronize_srcu(struct srcu_struct *sp, bool expedited)
 	mutex_lock(&sp->mutex);
 
 	/*
-	 * If there were no helpers, then we need to do two flips of
-	 * the index. The first flip is required if there are any
-	 * outstanding SRCU readers even if there are no new readers
-	 * running concurrently with the first counter flip.
-	 *
-	 * The second flip is required when a new reader picks up
+	 * In the previous grace period, if a reader picks up
 	 * the old value of the index, but does not increment its
 	 * counter until after its counters is summed/rechecked by
-	 * srcu_readers_active_idx_check(). In this case, the current SRCU
+	 * srcu_readers_active_idx_check(). In this case, the previous SRCU
 	 * grace period would be OK because the SRCU read-side critical
-	 * section started after this SRCU grace period started, so the
+	 * section started after the SRCU grace period started, so the
 	 * grace period is not required to wait for the reader.
 	 *
-	 * However, the next SRCU grace period would be waiting for the
-	 * other set of counters to go to zero, and therefore would not
-	 * wait for the reader, which would be very bad.  To avoid this
-	 * bad scenario, we flip and wait twice, clearing out both sets
-	 * of counters.
+	 * However, such leftover readers affect this new SRCU grace period.
+	 * So we have to wait for such readers. This wait_idx() should be
+	 * considered as the wait_idx() in the flip_idx_and_wait() of
+	 * the previous grace period, except that it is for leftover readers
+	 * started before this synchronize_srcu(). So when it returns,
+	 * there are no leftover readers that started before this grace period.
+	 *
+	 * If there are some leftover readers that do not increment their
+	 * counter until after it is summed/rechecked by
+	 * srcu_readers_active_idx_check(), then this SRCU
+	 * grace period is still OK, as the comments above say. We define
+	 * such readers as leftover-leftover readers, and consider that they
+	 * fetched the index of (sp->completed + 1); that is, they
+	 * are treated exactly the same as readers that start after this
+	 * grace period.
+	 *
+	 * wait_idx() is expected to be very fast, because leftover readers
+	 * are unlikely to be produced.
 	 */
-	for (; idx < 2; idx++)
-		flip_idx_and_wait(sp, expedited);
+	wait_idx(sp, (sp->completed - 1) & 0x1, expedited);
+
+	/*
+	 * Start a new grace period; this flip is required if there are
+	 * any outstanding SRCU readers even if there are no new readers
+	 * running concurrently with the counter flip.
+	 */
+	flip_idx_and_wait(sp, expedited);
+
 	mutex_unlock(&sp->mutex);
 }
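
---

For readers trying to follow the control flow, below is a minimal
user-space sketch of the scheme this patch implements. Everything in it
(toy_srcu, toy_read_lock(), toy_wait_idx(), and so on) is invented for
illustration and is not the kernel API; the per-CPU counters, the
expedited path, sp->mutex, and the memory barriers A through E are all
elided, so treat it as a model of the idea rather than the
implementation.

/*
 * toy_srcu.c: NOT kernel code. A single-updater toy model of the
 * one-flip-per-grace-period algorithm (C11, user space).
 */
#include <stdatomic.h>
#include <sched.h>

struct toy_srcu {
	atomic_long completed;	/* grace-period count; low bit = active index */
	atomic_long ctr[2];	/* readers currently using each counter set */
};

static int toy_read_lock(struct toy_srcu *sp)
{
	/*
	 * The index fetch and the increment are not one atomic step: a
	 * reader can stall between them holding a stale idx. This is
	 * exactly the "leftover" reader discussed in the patch.
	 */
	int idx = atomic_load(&sp->completed) & 0x1;

	atomic_fetch_add(&sp->ctr[idx], 1);
	return idx;
}

static void toy_read_unlock(struct toy_srcu *sp, int idx)
{
	atomic_fetch_sub(&sp->ctr[idx], 1);
}

/* Spin (politely) until no reader still references counter set idx. */
static void toy_wait_idx(struct toy_srcu *sp, int idx)
{
	while (atomic_load(&sp->ctr[idx]) != 0)
		sched_yield();
}

/* Assumes a single updater; the real code serializes via sp->mutex. */
static void toy_synchronize(struct toy_srcu *sp)
{
	long c = atomic_load(&sp->completed);

	/*
	 * Step 1, the patch's initial wait_idx() call: drain leftover
	 * readers that fetched the previous grace period's index but
	 * incremented their counter only after that period's recheck.
	 * This replaces the second flip of the old flip-twice algorithm.
	 */
	toy_wait_idx(sp, (c - 1) & 0x1);

	/*
	 * Step 2, flip_idx_and_wait(): start the new grace period with
	 * a single flip, then wait for readers on the now-old index.
	 */
	atomic_store(&sp->completed, c + 1);
	toy_wait_idx(sp, c & 0x1);
}

Once step 1 returns, any reader still carrying a stale index is
indistinguishable from one that started after this grace period (the
"leftover-leftover" case in the patch's comments), which is why a
single flip per grace period suffices in this model.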