From patchwork Wed Feb 22 09:29:36 2012
X-Patchwork-Submitter: Lai Jiangshan
X-Patchwork-Id: 6875
Message-ID: <4F44B580.6040003@cn.fujitsu.com>
Date: Wed, 22 Feb 2012 17:29:36 +0800
From: Lai Jiangshan <laijs@cn.fujitsu.com>
To: paulmck@linux.vnet.ibm.com
CC: linux-kernel@vger.kernel.org, mingo@elte.hu, dipankar@in.ibm.com,
 akpm@linux-foundation.org, mathieu.desnoyers@polymtl.ca,
 josh@joshtriplett.org, niv@us.ibm.com, tglx@linutronix.de,
 peterz@infradead.org, rostedt@goodmis.org, Valdis.Kletnieks@vt.edu,
 dhowells@redhat.com, eric.dumazet@gmail.com, darren@dvhart.com,
 fweisbec@gmail.com, patches@linaro.org
Subject: [PATCH 3/3 RFC paul/rcu/srcu] srcu: flip only once for every grace
 period
References: <20120213020951.GA12138@linux.vnet.ibm.com>
 <4F41F315.1040900@cn.fujitsu.com>
 <20120220174418.GI2470@linux.vnet.ibm.com>
 <4F42EF53.6060400@cn.fujitsu.com>
 <20120221015037.GE2384@linux.vnet.ibm.com>
 <4F435966.9020106@cn.fujitsu.com>
 <20120221172442.GG2375@linux.vnet.ibm.com>
In-Reply-To: <20120221172442.GG2375@linux.vnet.ibm.com>
From 4ddf62aaf2c4ebe6b9d4a1c596e8b43a678f1f0d Mon Sep 17 00:00:00 2001
From: Lai Jiangshan <laijs@cn.fujitsu.com>
Date: Wed, 22 Feb 2012 14:12:02 +0800
Subject: [PATCH 3/3 RFC paul/rcu/srcu] srcu: flip only once for every grace
 period

flip_idx_and_wait() is not changed in behavior: it is split into two
functions, and a short comment is added for smp_mb() E.

__synchronize_srcu() uses a different algorithm to handle "leak"
readers; the details are in the comments of the patch.  (A simplified
model of the resulting algorithm is sketched after the diff.)

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
 kernel/srcu.c |  105 ++++++++++++++++++++++++++++++++++----------------------
 1 files changed, 64 insertions(+), 41 deletions(-)

diff --git a/kernel/srcu.c b/kernel/srcu.c
index a51ac48..346f9d7 100644
--- a/kernel/srcu.c
+++ b/kernel/srcu.c
@@ -249,6 +249,37 @@ EXPORT_SYMBOL_GPL(__srcu_read_unlock);
  */
 #define SYNCHRONIZE_SRCU_READER_DELAY 5
 
+static void wait_idx(struct srcu_struct *sp, int idx, bool expedited)
+{
+	int trycount = 0;
+
+	/*
+	 * SRCU read-side critical sections are normally short, so wait
+	 * a small amount of time before possibly blocking.
+	 */
+	if (!srcu_readers_active_idx_check(sp, idx)) {
+		udelay(SYNCHRONIZE_SRCU_READER_DELAY);
+		while (!srcu_readers_active_idx_check(sp, idx)) {
+			if (expedited && ++ trycount < 10)
+				udelay(SYNCHRONIZE_SRCU_READER_DELAY);
+			else
+				schedule_timeout_interruptible(1);
+		}
+	}
+
+	/*
+	 * The following smp_mb() E pairs with srcu_read_unlock()'s
+	 * smp_mb() C to ensure that if srcu_readers_active_idx_check()
+	 * sees srcu_read_unlock()'s counter decrement, then any
+	 * of the current task's subsequent code will happen after
+	 * that SRCU read-side critical section.
+	 *
+	 * It also ensures the ordering between the above waiting and
+	 * the next flipping.
+	 */
+	smp_mb(); /* E */
+}
+
 /*
  * Flip the readers' index by incrementing ->completed, then wait
  * until there are no more readers using the counters referenced by
@@ -258,12 +289,12 @@ EXPORT_SYMBOL_GPL(__srcu_read_unlock);
  * Of course, it is possible that a reader might be delayed for the
  * full duration of flip_idx_and_wait() between fetching the
  * index and incrementing its counter. This possibility is handled
- * by __synchronize_srcu() invoking flip_idx_and_wait() twice.
+ * by the next __synchronize_srcu() invoking wait_idx() for such readers
+ * before starting a new grace period.
  */
 static void flip_idx_and_wait(struct srcu_struct *sp, bool expedited)
 {
 	int idx;
-	int trycount = 0;
 
 	idx = sp->completed++ & 0x1;
 
@@ -278,28 +309,7 @@ static void flip_idx_and_wait(struct srcu_struct *sp, bool expedited)
 	 */
 	smp_mb(); /* D */
 
-	/*
-	 * SRCU read-side critical sections are normally short, so wait
-	 * a small amount of time before possibly blocking.
-	 */
-	if (!srcu_readers_active_idx_check(sp, idx)) {
-		udelay(SYNCHRONIZE_SRCU_READER_DELAY);
-		while (!srcu_readers_active_idx_check(sp, idx)) {
-			if (expedited && ++ trycount < 10)
-				udelay(SYNCHRONIZE_SRCU_READER_DELAY);
-			else
-				schedule_timeout_interruptible(1);
-		}
-	}
-
-	/*
-	 * The following smp_mb() E pairs with srcu_read_unlock()'s
-	 * smp_mb C to ensure that if srcu_readers_active_idx_check()
-	 * sees srcu_read_unlock()'s counter decrement, then any
-	 * of the current task's subsequent code will happen after
-	 * that SRCU read-side critical section.
-	 */
-	smp_mb(); /* E */
+	wait_idx(sp, idx, expedited);
 }
 
 /*
@@ -307,8 +317,6 @@ static void flip_idx_and_wait(struct srcu_struct *sp, bool expedited)
  */
 static void __synchronize_srcu(struct srcu_struct *sp, bool expedited)
 {
-	int idx = 0;
-
 	rcu_lockdep_assert(!lock_is_held(&sp->dep_map) &&
 			   !lock_is_held(&rcu_bh_lock_map) &&
 			   !lock_is_held(&rcu_lock_map) &&
@@ -318,27 +326,42 @@ static void __synchronize_srcu(struct srcu_struct *sp, bool expedited)
 	mutex_lock(&sp->mutex);
 
 	/*
-	 * If there were no helpers, then we need to do two flips of
-	 * the index. The first flip is required if there are any
-	 * outstanding SRCU readers even if there are no new readers
-	 * running concurrently with the first counter flip.
-	 *
-	 * The second flip is required when a new reader picks up
+	 * In the previous grace period, if a reader picks up
 	 * the old value of the index, but does not increment its
 	 * counter until after its counters is summed/rechecked by
-	 * srcu_readers_active_idx_check(). In this case, the current SRCU
+	 * srcu_readers_active_idx_check(). In this case, the previous SRCU
 	 * grace period would be OK because the SRCU read-side critical
-	 * section started after this SRCU grace period started, so the
+	 * section started after the SRCU grace period started, so the
 	 * grace period is not required to wait for the reader.
 	 *
-	 * However, the next SRCU grace period would be waiting for the
-	 * other set of counters to go to zero, and therefore would not
-	 * wait for the reader, which would be very bad.  To avoid this
-	 * bad scenario, we flip and wait twice, clearing out both sets
-	 * of counters.
+	 * However, such leftover readers affect this new SRCU grace period.
+	 * So we have to wait for such readers. This wait_idx() should be
+	 * considered as the wait_idx() in the flip_idx_and_wait() of
+	 * the previous grace period, except that it is for leftover readers
+	 * started before this synchronize_srcu(). So when it returns,
+	 * there are no leftover readers that started before this grace period.
+	 *
+	 * If there are some leftover readers that do not increment their
+	 * counter until after it is summed/rechecked by
+	 * srcu_readers_active_idx_check(), then this SRCU
+	 * grace period is still OK, as the comments above say. We define
+	 * such readers as leftover-leftover readers, and consider that they
+	 * fetched the index of (sp->completed + 1); that is, they
+	 * are treated exactly the same as readers that start after this
+	 * grace period.
+	 *
+	 * wait_idx() is expected to be very fast, because leftover readers
+	 * are unlikely to be produced.
 	 */
-	for (; idx < 2; idx++)
-		flip_idx_and_wait(sp, expedited);
+	wait_idx(sp, (sp->completed - 1) & 0x1, expedited);
+
+	/*
+	 * Start a new grace period; this flip is required if there are
+	 * any outstanding SRCU readers even if there are no new readers
+	 * running concurrently with the counter flip.
+	 */
+	flip_idx_and_wait(sp, expedited);
+
 	mutex_unlock(&sp->mutex);
 }
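
---

For readers trying to follow the control flow, below is a minimal
user-space sketch of the scheme this patch implements. Everything in it
(toy_srcu, toy_read_lock(), toy_wait_idx(), and so on) is invented for
illustration and is not the kernel API; the per-CPU counters, the
expedited path, sp->mutex, and the memory barriers A through E are all
elided, so treat it as a model of the idea rather than the
implementation.

/*
 * toy_srcu.c: NOT kernel code. A single-updater toy model of the
 * one-flip-per-grace-period algorithm (C11, user space).
 */
#include <stdatomic.h>
#include <sched.h>

struct toy_srcu {
	atomic_long completed;	/* grace-period count; low bit = active index */
	atomic_long ctr[2];	/* readers currently using each counter set */
};

static int toy_read_lock(struct toy_srcu *sp)
{
	/*
	 * The index fetch and the increment are not one atomic step: a
	 * reader can stall between them holding a stale idx. This is
	 * exactly the "leftover" reader discussed in the patch.
	 */
	int idx = atomic_load(&sp->completed) & 0x1;

	atomic_fetch_add(&sp->ctr[idx], 1);
	return idx;
}

static void toy_read_unlock(struct toy_srcu *sp, int idx)
{
	atomic_fetch_sub(&sp->ctr[idx], 1);
}

/* Spin (politely) until no reader still references counter set idx. */
static void toy_wait_idx(struct toy_srcu *sp, int idx)
{
	while (atomic_load(&sp->ctr[idx]) != 0)
		sched_yield();
}

/* Assumes a single updater; the real code serializes via sp->mutex. */
static void toy_synchronize(struct toy_srcu *sp)
{
	long c = atomic_load(&sp->completed);

	/*
	 * Step 1, the patch's initial wait_idx() call: drain leftover
	 * readers that fetched the previous grace period's index but
	 * incremented their counter only after that period's recheck.
	 * This replaces the second flip of the old flip-twice algorithm.
	 */
	toy_wait_idx(sp, (c - 1) & 0x1);

	/*
	 * Step 2, flip_idx_and_wait(): start the new grace period with
	 * a single flip, then wait for readers on the now-old index.
	 */
	atomic_store(&sp->completed, c + 1);
	toy_wait_idx(sp, c & 0x1);
}

Once step 1 returns, any reader still carrying a stale index is
indistinguishable from one that started after this grace period (the
"leftover-leftover" case in the patch's comments), which is why a
single flip per grace period suffices in this model.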