From patchwork Wed Jan 21 16:53:56 2015
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Daniel Thompson <daniel.thompson@linaro.org>
X-Patchwork-Id: 43471
From: Daniel Thompson <daniel.thompson@linaro.org>
To: John Stultz
Cc: Daniel Thompson, linux-kernel@vger.kernel.org, patches@linaro.org,
    linaro-kernel@lists.linaro.org, Sumit Semwal, Thomas Gleixner,
    Stephen Boyd, Steven Rostedt
Subject: [RFC PATCH] sched_clock: Avoid tearing during read from NMI
Date: Wed, 21 Jan 2015 16:53:56 +0000
Message-Id: <1421859236-19782-1-git-send-email-daniel.thompson@linaro.org>
X-Mailer: git-send-email 1.9.3

Currently it is possible for an NMI (or FIQ on ARM) to come in and read
sched_clock() whilst update_sched_clock() has half updated the state.
This results in a bad time value being observed.

This patch fixes that problem in a similar manner to Thomas Gleixner's
commit 4396e058c52e ("timekeeping: Provide fast and NMI safe access to
CLOCK_MONOTONIC").

Note that ripping out the seqcount lock from sched_clock_register() and
replacing it with a large comment is not nearly as bad as it looks! The
locking there is actually pretty useless, since most of the variables
modified within the write lock are not covered by the read lock. As a
result, a big comment plus the sequence bump implicit in the call to
update_epoch() should work pretty much the same.

Suggested-by: Stephen Boyd
Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
---

Notes:
    This patch has only had fairly light testing at this point, but it
    survives basic tests. In particular, I am running perf from FIQ/NMI
    and have instrumented it with some monotonicity tests, none of which
    have reported any problem.

 kernel/time/sched_clock.c | 63 +++++++++++++++++++++++++++++++++++++----------
 1 file changed, 50 insertions(+), 13 deletions(-)

--
1.9.3

diff --git a/kernel/time/sched_clock.c b/kernel/time/sched_clock.c
index 01d2d15aa662..485d5070259c 100644
--- a/kernel/time/sched_clock.c
+++ b/kernel/time/sched_clock.c
@@ -27,6 +27,10 @@ struct clock_data {
 	u32 mult;
 	u32 shift;
 	bool suspended;
+
+	/* Used only temporarily whilst we are updating the primary copy */
+	u64 old_epoch_ns;
+	u64 old_epoch_cyc;
 };
 
 static struct hrtimer sched_clock_timer;
@@ -67,9 +71,14 @@ unsigned long long notrace sched_clock(void)
 		return cd.epoch_ns;
 
 	do {
-		seq = raw_read_seqcount_begin(&cd.seq);
-		epoch_cyc = cd.epoch_cyc;
-		epoch_ns = cd.epoch_ns;
+		seq = raw_read_seqcount(&cd.seq);
+		if (likely(0 == (seq & 1))) {
+			epoch_cyc = cd.epoch_cyc;
+			epoch_ns = cd.epoch_ns;
+		} else {
+			epoch_cyc = cd.old_epoch_cyc;
+			epoch_ns = cd.old_epoch_ns;
+		}
 	} while (read_seqcount_retry(&cd.seq, seq));
 
 	cyc = read_sched_clock();
@@ -78,6 +87,35 @@ unsigned long long notrace sched_clock(void)
 }
 
 /*
+ * Update the epoch without allowing sched_clock to observe
+ * a mismatched epoch pair even if called from NMI.
+ *
+ * We do this by maintaining an odd/even copy of the epoch data and
+ * steering sched_clock to one or the other using a sequence counter.
+ * In order to preserve the (average case) data cache profile of
+ * sched_clock the system reverts back to the even copy as soon as
+ * possible; the odd copy is used *only* during an update.
+ *
+ * The caller is responsible for avoiding simultaneous updates.
+ */
+static void notrace update_epoch(u64 cyc, u64 ns)
+{
+	/* Update the backup copy */
+	cd.old_epoch_cyc = cd.epoch_cyc;
+	cd.old_epoch_ns = cd.epoch_ns;
+
+	/* Force readers to use the backup (odd) copy */
+	raw_write_seqcount_latch(&cd.seq);
+
+	/* Update the primary copy */
+	cd.epoch_cyc = cyc;
+	cd.epoch_ns = ns;
+
+	/* Steer readers back to the primary (even) copy */
+	raw_write_seqcount_latch(&cd.seq);
+}
+
+/*
  * Atomically update the sched_clock epoch.
  */
 static void notrace update_sched_clock(void)
@@ -91,12 +129,7 @@ static void notrace update_sched_clock(void)
 		cyc_to_ns((cyc - cd.epoch_cyc) & sched_clock_mask,
 			  cd.mult, cd.shift);
 
-	raw_local_irq_save(flags);
-	raw_write_seqcount_begin(&cd.seq);
-	cd.epoch_ns = ns;
-	cd.epoch_cyc = cyc;
-	raw_write_seqcount_end(&cd.seq);
-	raw_local_irq_restore(flags);
+	update_epoch(cyc, ns);
 }
 
 static enum hrtimer_restart sched_clock_poll(struct hrtimer *hrt)
@@ -135,16 +168,20 @@ void __init sched_clock_register(u64 (*read)(void), int bits,
 	ns = cd.epoch_ns +
 		cyc_to_ns((cyc - cd.epoch_cyc) & sched_clock_mask,
 			  cd.mult, cd.shift);
-	raw_write_seqcount_begin(&cd.seq);
+	/*
+	 * sched_clock will report a bad value if it executes
+	 * concurrently with the following code. No locking exists to
+	 * prevent this; we rely mostly on this function being called
+	 * early during kernel boot up before we have lots of other
+	 * stuff going on.
+	 */
 	read_sched_clock = read;
 	sched_clock_mask = new_mask;
 	cd.rate = rate;
 	cd.wrap_kt = new_wrap_kt;
 	cd.mult = new_mult;
 	cd.shift = new_shift;
-	cd.epoch_cyc = new_epoch;
-	cd.epoch_ns = ns;
-	raw_write_seqcount_end(&cd.seq);
+	update_epoch(new_epoch, ns);
 
 	r = rate;
 	if (r >= 4000000) {
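
[Editor's note, not part of the patch] For anyone who wants to experiment
with the odd/even latch idea outside the kernel, below is a minimal
userspace sketch of the same scheme. It is an illustration under stated
assumptions, not the kernel implementation: latch_update()/latch_read(),
struct epoch and the C11 atomic counter are invented stand-ins for the
clock_data fields and for the raw_write_seqcount_latch() /
raw_read_seqcount() / read_seqcount_retry() primitives the patch actually
uses.

/*
 * Standalone userspace sketch of the odd/even "latch" epoch update.
 * Single writer assumed; readers may interrupt the writer at any point
 * (think of a signal handler standing in for an NMI/FIQ).
 */
#include <inttypes.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>

struct epoch {
	uint64_t cyc;
	uint64_t ns;
};

static struct epoch copies[2];	/* [0] = primary/even, [1] = backup/odd */
static atomic_uint seq;		/* LSB selects which copy readers use */

/* Writer: readers never observe a mismatched cyc/ns pair. */
static void latch_update(uint64_t cyc, uint64_t ns)
{
	/* Publish the current primary values into the backup (odd) copy. */
	copies[1] = copies[0];
	atomic_fetch_add_explicit(&seq, 1, memory_order_release); /* odd */

	/* Update the primary (even) copy while readers are steered away. */
	copies[0].cyc = cyc;
	copies[0].ns = ns;
	atomic_fetch_add_explicit(&seq, 1, memory_order_release); /* even */
}

/* Reader: safe to call at any point, even mid-update. */
static struct epoch latch_read(void)
{
	struct epoch e;
	unsigned int s;

	do {
		s = atomic_load_explicit(&seq, memory_order_acquire);
		e = copies[s & 1];	/* even -> primary, odd -> backup */
		/* If seq moved, the copy may be torn: discard and retry,
		 * mirroring read_seqcount_retry() in the patch. */
	} while (s != atomic_load_explicit(&seq, memory_order_acquire));

	return e;
}

int main(void)
{
	latch_update(1000, 2000);
	struct epoch e = latch_read();
	printf("cyc=%" PRIu64 " ns=%" PRIu64 "\n", e.cyc, e.ns);
	return 0;
}

As in the patch, readers are steered back to the even (primary) copy as
soon as the update completes, so the common-case cache footprint stays on
one pair of fields; any torn value read while the counter is moving is
simply thrown away by the retry loop.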