From patchwork Tue Oct 30 16:54:34 2012
X-Patchwork-Submitter: "Paul E. McKenney"
X-Patchwork-Id: 12603
From: "Paul E. McKenney"
To: linux-kernel@vger.kernel.org
Cc: mingo@elte.hu, laijs@cn.fujitsu.com, dipankar@in.ibm.com,
	akpm@linux-foundation.org, mathieu.desnoyers@polymtl.ca,
	josh@joshtriplett.org, niv@us.ibm.com, tglx@linutronix.de,
	peterz@infradead.org, rostedt@goodmis.org, Valdis.Kletnieks@vt.edu,
	dhowells@redhat.com, edumazet@google.com, darren@dvhart.com,
	fweisbec@gmail.com, sbw@mit.edu, patches@linaro.org,
	"Paul E. McKenney", "Paul E. McKenney"
Subject: [PATCH tip/core/rcu 5/7] rcu: Avoid counter wrap in synchronize_sched_expedited()
Date: Tue, 30 Oct 2012 09:54:34 -0700
Message-Id: <1351616076-25617-5-git-send-email-paulmck@linux.vnet.ibm.com>
X-Mailer: git-send-email 1.7.8
In-Reply-To: <1351616076-25617-1-git-send-email-paulmck@linux.vnet.ibm.com>
References: <20121030165415.GA25438@linux.vnet.ibm.com>
	<1351616076-25617-1-git-send-email-paulmck@linux.vnet.ibm.com>

From: "Paul E. McKenney"

synchronize_sched_expedited() uses a counter scheme similar to ticket
locking to service multiple concurrent callers with the same expedited
grace period.  Upon entry, the sync_sched_expedited_started variable is
atomically incremented, and upon completion of an expedited grace period
a separate sync_sched_expedited_done variable is atomically incremented.

However, if a synchronize_sched_expedited() caller is delayed while in
try_stop_cpus(), concurrent invocations will keep incrementing the
sync_sched_expedited_started counter, which will eventually overflow.
If the original synchronize_sched_expedited() resumes execution just as
the counter overflows, a concurrent invocation could incorrectly conclude
that an expedited grace period elapsed in zero time, which would be bad.
One could rely on counter size to prevent this from happening in
practice, but the goal is to formally validate this code, so it needs to
be fixed anyway.

This commit therefore checks the gap between the two counters before
incrementing sync_sched_expedited_started, and if the gap is too large,
does a normal grace period instead.  Overflow is thus only possible if
there are more than about 3.5 billion threads on 32-bit systems, which
can be excluded until such time as task_struct fits into a single byte
and 4G/4G patches are accepted into mainline.  It is also easy to encode
this limitation into mechanical theorem provers.

Signed-off-by: Paul E. McKenney
Signed-off-by: Paul E. McKenney
---
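For readers who want to play with the ticket-counter idea outside the
kernel, below is a rough user-space sketch of the scheme described above:
hand out a ticket from a "started" counter, refuse to take a ticket when
"started" has run more than ULONG_MAX/8 ahead of "done", and let later
callers piggyback via a compare-and-swap on "done".  It uses C11 atomics
and made-up helper names (do_expedited(), force_grace_period(),
do_slow_grace_period()); it is an illustration of the pattern, not
kernel code.

/* sketch.c -- user-space illustration of the ticket-counter scheme. */
#include <limits.h>
#include <stdatomic.h>
#include <stdio.h>

static atomic_ulong started;	/* tickets handed out so far */
static atomic_ulong done;	/* highest ticket known to be complete */

/* Stand-ins for synchronize_sched() and the try_stop_cpus() loop. */
static void do_slow_grace_period(void) { }
static void force_grace_period(void) { }

/* Modular ">=", same shape as the kernel's ULONG_CMP_GE(). */
static int cmp_ge(unsigned long a, unsigned long b)
{
	return ULONG_MAX / 2 >= a - b;
}

static void do_expedited(void)
{
	unsigned long s, snap;

	/*
	 * Wrap guard: if "started" has run more than ULONG_MAX/8 ahead
	 * of "done", fall back to the slow path instead of taking a
	 * ticket, so the counters can never be confused by wrap.
	 */
	if (cmp_ge(atomic_load(&started), atomic_load(&done) + ULONG_MAX / 8)) {
		do_slow_grace_period();
		return;
	}

	/* Take a ticket (the patch uses atomic_long_inc_return()). */
	snap = atomic_fetch_add(&started, 1) + 1;

	force_grace_period();

	/*
	 * Advance "done" to our ticket unless a later caller already
	 * advanced it past us; C11 seq_cst atomics provide the ordering
	 * that the patch gets from explicit smp_mb() calls.
	 */
	s = atomic_load(&done);
	do {
		if (cmp_ge(s, snap))
			break;	/* a later caller's update covers us */
	} while (!atomic_compare_exchange_weak(&done, &s, snap));
}

int main(void)
{
	do_expedited();
	printf("started=%lu done=%lu\n",
	       atomic_load(&started), atomic_load(&done));
	return 0;
}

Something like "gcc -std=c11 sketch.c" should be enough to build it.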
 kernel/rcutree.c |   62 ++++++++++++++++++++++++++++++++++++++---------------
 1 files changed, 44 insertions(+), 18 deletions(-)

diff --git a/kernel/rcutree.c b/kernel/rcutree.c
index 8914886..6789055 100644
--- a/kernel/rcutree.c
+++ b/kernel/rcutree.c
@@ -2249,8 +2249,8 @@ void synchronize_rcu_bh(void)
 }
 EXPORT_SYMBOL_GPL(synchronize_rcu_bh);
 
-static atomic_t sync_sched_expedited_started = ATOMIC_INIT(0);
-static atomic_t sync_sched_expedited_done = ATOMIC_INIT(0);
+static atomic_long_t sync_sched_expedited_started = ATOMIC_LONG_INIT(0);
+static atomic_long_t sync_sched_expedited_done = ATOMIC_LONG_INIT(0);
 
 static int synchronize_sched_expedited_cpu_stop(void *data)
 {
@@ -2308,10 +2308,30 @@ static int synchronize_sched_expedited_cpu_stop(void *data)
  */
 void synchronize_sched_expedited(void)
 {
-	int firstsnap, s, snap, trycount = 0;
+	long firstsnap, s, snap;
+	int trycount = 0;
 
-	/* Note that atomic_inc_return() implies full memory barrier. */
-	firstsnap = snap = atomic_inc_return(&sync_sched_expedited_started);
+	/*
+	 * If we are in danger of counter wrap, just do synchronize_sched().
+	 * By allowing sync_sched_expedited_started to advance no more than
+	 * ULONG_MAX/8 ahead of sync_sched_expedited_done, we are ensuring
+	 * that more than 3.5 billion CPUs would be required to force a
+	 * counter wrap on a 32-bit system.  Quite a few more CPUs would of
+	 * course be required on a 64-bit system.
+	 */
+	if (ULONG_CMP_GE((ulong)atomic_read(&sync_sched_expedited_started),
+			 (ulong)atomic_read(&sync_sched_expedited_done) +
+			 ULONG_MAX / 8)) {
+		synchronize_sched();
+		return;
+	}
+
+	/*
+	 * Take a ticket.  Note that atomic_inc_return() implies a
+	 * full memory barrier.
+	 */
+	snap = atomic_long_inc_return(&sync_sched_expedited_started);
+	firstsnap = snap;
 	get_online_cpus();
 	WARN_ON_ONCE(cpu_is_offline(raw_smp_processor_id()));
 
@@ -2324,6 +2344,13 @@ void synchronize_sched_expedited(void)
 			     NULL) == -EAGAIN) {
 		put_online_cpus();
 
+		/* Check to see if someone else did our work for us. */
+		s = atomic_long_read(&sync_sched_expedited_done);
+		if (ULONG_CMP_GE((ulong)s, (ulong)firstsnap)) {
+			smp_mb(); /* ensure test happens before caller kfree */
+			return;
+		}
+
 		/* No joy, try again later.  Or just synchronize_sched(). */
 		if (trycount++ < 10) {
 			udelay(trycount * num_online_cpus());
@@ -2332,23 +2359,22 @@ void synchronize_sched_expedited(void)
 			return;
 		}
 
-		/* Check to see if someone else did our work for us. */
-		s = atomic_read(&sync_sched_expedited_done);
-		if (UINT_CMP_GE((unsigned)s, (unsigned)firstsnap)) {
+		/* Recheck to see if someone else did our work for us. */
+		s = atomic_long_read(&sync_sched_expedited_done);
+		if (ULONG_CMP_GE((ulong)s, (ulong)firstsnap)) {
 			smp_mb(); /* ensure test happens before caller kfree */
 			return;
 		}
 
 		/*
 		 * Refetching sync_sched_expedited_started allows later
-		 * callers to piggyback on our grace period.  We subtract
-		 * 1 to get the same token that the last incrementer got.
-		 * We retry after they started, so our grace period works
-		 * for them, and they started after our first try, so their
-		 * grace period works for us.
+		 * callers to piggyback on our grace period.  We retry
+		 * after they started, so our grace period works for them,
+		 * and they started after our first try, so their grace
+		 * period works for us.
 		 */
 		get_online_cpus();
-		snap = atomic_read(&sync_sched_expedited_started);
+		snap = atomic_long_read(&sync_sched_expedited_started);
 		smp_mb(); /* ensure read is before try_stop_cpus(). */
 	}
 
@@ -2356,15 +2382,15 @@ void synchronize_sched_expedited(void)
 	 * Everyone up to our most recent fetch is covered by our grace
 	 * period.  Update the counter, but only if our work is still
 	 * relevant -- which it won't be if someone who started later
-	 * than we did beat us to the punch.
+	 * than we did already did their update.
 	 */
 	do {
-		s = atomic_read(&sync_sched_expedited_done);
-		if (UINT_CMP_GE((unsigned)s, (unsigned)snap)) {
+		s = atomic_long_read(&sync_sched_expedited_done);
+		if (ULONG_CMP_GE((ulong)s, (ulong)snap)) {
 			smp_mb(); /* ensure test happens before caller kfree */
 			break;
 		}
-	} while (atomic_cmpxchg(&sync_sched_expedited_done, s, snap) != s);
+	} while (atomic_long_cmpxchg(&sync_sched_expedited_done, s, snap) != s);
 
 	put_online_cpus();
 }
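
One closing note on the comparisons used above: ULONG_CMP_GE() (at the
time of this patch defined in include/linux/rcupdate.h, essentially as
reproduced below) compares unsigned longs modulo wraparound, which is
what lets the snapshot tests keep working even after the counters wrap.
A small standalone demonstration follows; the file name and printf
output are mine, not the kernel's.

/* ulong_cmp_ge_demo.c -- illustration of modular counter comparison. */
#include <limits.h>
#include <stdio.h>

/* Same shape as the kernel's ULONG_CMP_GE(). */
#define ULONG_CMP_GE(a, b)	(ULONG_MAX / 2 >= (a) - (b))

int main(void)
{
	unsigned long done = ULONG_MAX - 1;	/* about to wrap */
	unsigned long snap = done + 3;		/* wraps around to 1 */

	/* Plain ">=" gets the answer backwards across the wrap... */
	printf("done >= snap:             %d\n", done >= snap);			/* 1 (wrong) */
	/* ...while the modular comparison still sees snap as "ahead". */
	printf("ULONG_CMP_GE(done, snap): %d\n", ULONG_CMP_GE(done, snap));	/* 0 */
	printf("ULONG_CMP_GE(snap, done): %d\n", ULONG_CMP_GE(snap, done));	/* 1 */
	return 0;
}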