From patchwork Sat May 17 00:56:42 2014
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: John Stultz <john.stultz@linaro.org>
X-Patchwork-Id: 30347
From: John Stultz <john.stultz@linaro.org>
To: LKML <linux-kernel@vger.kernel.org>
Cc: John Stultz <john.stultz@linaro.org>,
 Miroslav Lichvar <mlichvar@redhat.com>,
 Richard Cochran <richardcochran@gmail.com>,
 Prarit Bhargava <prarit@redhat.com>
Subject: [PATCH 1/3] [RFC] timekeeping: Rework frequency adjustments to work
 better w/ nohz
Date: Fri, 16 May 2014 17:56:42 -0700
Message-Id: <1400288204-414-2-git-send-email-john.stultz@linaro.org>
X-Mailer: git-send-email 1.9.1
In-Reply-To: <1400288204-414-1-git-send-email-john.stultz@linaro.org>
References: <1400288204-414-1-git-send-email-john.stultz@linaro.org>

The existing timekeeping_adjust logic has always been difficult
to understand. Further, since it was developed before NOHZ became
common, it's not surprising that it performs poorly when NOHZ is enabled.

Since Miroslav pointed out the problematic nature of the existing code
in the NOHZ case, I've tried to refactor the code to perform better.

The problem with the previous approach was that it tried to adjust
for the total cumulative error using a scaled dampening factor. This
resulted in large errors being corrected slowly, while small errors
were corrected quickly. With NOHZ the timekeeping code doesn't know
how far out the next tick will be, so this results in bad
over-correction of small errors, and insufficient correction of large
errors.

Inspired by Miroslav's patch, I've refactored the code to try to
address the correction in two steps.

1) Check the future frequency error for the next tick, and if that
error is large, correct it right away so it doesn't cause much
accumulated error.

2) Then make a small single-unit adjustment to correct any cumulative
error that has collected over time.
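
As a rough illustration of the two steps above, here is a simplified
userspace sketch. It is not the patch itself: struct tk_sketch, the
helper names, and the ntp_tick parameter are hypothetical stand-ins, and
the offset/xtime_nsec/ntp_error bookkeeping that the real code performs
is left out. It only shows how the power-of-two scale is chosen from the
per-tick error and how the single-unit correction is toggled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, stripped-down stand-in for struct timekeeper. */
struct tk_sketch {
	int64_t cycle_interval;   /* clocksource cycles per tick */
	int64_t xtime_interval;   /* shifted nsecs accumulated per tick */
	int64_t ntp_error;        /* accumulated error relative to NTP */
	uint32_t mult;            /* current multiplier */
	uint32_t ntp_err_mult;    /* 1 while the single-unit correction is active */
};

/* Adjust mult by +/- (1 << scale) and scale xtime_interval to match. */
static void apply_adjustment(struct tk_sketch *tk, bool negative, int scale)
{
	int64_t mult_adj = (int64_t)1 << scale;
	int64_t interval = tk->cycle_interval << scale;

	if (negative) {
		mult_adj = -mult_adj;
		interval = -interval;
	}
	tk->mult += mult_adj;
	tk->xtime_interval += interval;
}

/*
 * Step 1: compare the wanted tick length (ntp_tick, assumed to be in the
 * same units as xtime_interval) with what one tick currently accumulates.
 */
static void freq_adjust(struct tk_sketch *tk, int64_t ntp_tick)
{
	int64_t xinterval = tk->xtime_interval;
	int64_t tick_error;
	bool negative;
	int scale = 0;

	/* Leave the single-unit error correction out of the frequency check */
	if (tk->ntp_err_mult)
		xinterval -= tk->cycle_interval;

	tick_error = ntp_tick - xinterval;

	/* Small enough: less than one multiplier unit of error per tick */
	if (tick_error >= 0 && tick_error <= tk->cycle_interval)
		return;

	negative = tick_error < 0;
	if (negative)
		tick_error = -tick_error;

	/* Halve the error until it fits within one multiplier unit */
	while (tick_error > tk->cycle_interval) {
		tick_error >>= 1;
		scale++;
	}
	apply_adjustment(tk, negative, scale);
}

/* Step 2: switch a single-unit correction on or off based on ntp_error. */
static void error_adjust(struct tk_sketch *tk)
{
	if (!tk->ntp_err_mult && tk->ntp_error > 0) {
		tk->ntp_err_mult = 1;
		apply_adjustment(tk, false, 0);
	} else if (tk->ntp_err_mult && tk->ntp_error <= 0) {
		apply_adjustment(tk, true, 0);
		tk->ntp_err_mult = 0;
	}
}
```

Note that in this sketch the scale rounds up to a power of two, so a
large frequency error can overshoot on the first call and be pulled back
on the next one. Calling freq_adjust() and then error_adjust() once per
update mirrors the ordering described in the steps above.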

This method performs fairly well in the simulator Miroslav created.

Major credit to Miroslav for pointing out the issue, providing the
original patch to resolve this, a simulator for testing, as well as
helping debug and resolve issues in my implementation so that it
performed closer to his original implementation.

I'd be very interested in feedback, thoughts, and testing.

Cc: Miroslav Lichvar <mlichvar@redhat.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Reported-by: Miroslav Lichvar <mlichvar@redhat.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
---
 include/linux/timekeeper_internal.h |   3 +
 kernel/time/timekeeping.c           | 193 ++++++++++++++++--------------------
 2 files changed, 86 insertions(+), 110 deletions(-)
diff --git a/include/linux/timekeeper_internal.h b/include/linux/timekeeper_internal.h
index c1825eb..4d398f1 100644
--- a/include/linux/timekeeper_internal.h
+++ b/include/linux/timekeeper_internal.h
@@ -41,6 +41,9 @@ struct timekeeper {
 	 * ntp shifted nano seconds. */
 	u32			ntp_error_shift;
 
+	/* Mult adjustment being applied to correct ntp_error */
+	u32			ntp_err_mult;
+
 	/*
 	 * wall_to_monotonic is what we need to add to xtime (or xtime corrected
 	 * for sub jiffie times) to get to monotonic time.  Monotonic is pegged
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 63f9ed7..d46bda2 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -148,6 +148,7 @@ static void tk_setup_internals(struct timekeeper *tk, struct clocksource *clock)
 	 * to counteract clock drifting.
 	 */
 	tk->mult = clock->mult;
+	tk->ntp_err_mult = 0;
 }
 
 /* Timekeeper helper functions. */
@@ -1049,125 +1050,34 @@ static int __init timekeeping_init_ops(void)
 	register_syscore_ops(&timekeeping_syscore_ops);
 	return 0;
 }
-
 device_initcall(timekeeping_init_ops);
 
 /*
- * If the error is already larger, we look ahead even further
- * to compensate for late or lost adjustments.
+ * Apply a multiplier adjustment to the timekeeper
  */
-static __always_inline int timekeeping_bigadjust(struct timekeeper *tk,
-						 s64 error, s64 *interval,
-						 s64 *offset)
+static __always_inline void timekeeping_apply_adjustment(struct timekeeper *tk,
+							 s64 offset,
+							 bool negative,
+							 int adj_scale)
 {
-	s64 tick_error, i;
-	u32 look_ahead, adj;
-	s32 error2, mult;
-
-	/*
-	 * Use the current error value to determine how much to look ahead.
-	 * The larger the error the slower we adjust for it to avoid problems
-	 * with losing too many ticks, otherwise we would overadjust and
-	 * produce an even larger error.  The smaller the adjustment the
-	 * faster we try to adjust for it, as lost ticks can do less harm
-	 * here.  This is tuned so that an error of about 1 msec is adjusted
-	 * within about 1 sec (or 2^20 nsec in 2^SHIFT_HZ ticks).
-	 */
-	error2 = tk->ntp_error >> (NTP_SCALE_SHIFT + 22 - 2 * SHIFT_HZ);
-	error2 = abs(error2);
-	for (look_ahead = 0; error2 > 0; look_ahead++)
-		error2 >>= 2;
+	s64 interval = tk->cycle_interval;
+	s32 mult_adj = 1;
 
-	/*
-	 * Now calculate the error in (1 << look_ahead) ticks, but first
-	 * remove the single look ahead already included in the error.
-	 */
-	tick_error = ntp_tick_length() >> (tk->ntp_error_shift + 1);
-	tick_error -= tk->xtime_interval >> 1;
-	error = ((error - tick_error) >> look_ahead) + tick_error;
-
-	/* Finally calculate the adjustment shift value.  */
-	i = *interval;
-	mult = 1;
-	if (error < 0) {
-		error = -error;
-		*interval = -*interval;
-		*offset = -*offset;
-		mult = -1;
+	if (negative) {
+		mult_adj = -mult_adj;
+		interval = -interval;
+		offset  = -offset;
 	}
-	for (adj = 0; error > i; adj++)
-		error >>= 1;
-
-	*interval <<= adj;
-	*offset <<= adj;
-	return mult << adj;
-}
+	mult_adj <<= adj_scale;
+	interval <<= adj_scale;
+	offset <<= adj_scale;
 
-/*
- * Adjust the multiplier to reduce the error value,
- * this is optimized for the most common adjustments of -1,0,1,
- * for other values we can do a bit more work.
- */
-static void timekeeping_adjust(struct timekeeper *tk, s64 offset)
-{
-	s64 error, interval = tk->cycle_interval;
-	int adj;
-
-	/*
-	 * The point of this is to check if the error is greater than half
-	 * an interval.
-	 *
-	 * First we shift it down from NTP_SHIFT to clocksource->shifted nsecs.
-	 *
-	 * Note we subtract one in the shift, so that error is really error*2.
-	 * This "saves" dividing(shifting) interval twice, but keeps the
-	 * (error > interval) comparison as still measuring if error is
-	 * larger than half an interval.
-	 *
-	 * Note: It does not "save" on aggravation when reading the code.
-	 */
-	error = tk->ntp_error >> (tk->ntp_error_shift - 1);
-	if (error > interval) {
-		/*
-		 * We now divide error by 4(via shift), which checks if
-		 * the error is greater than twice the interval.
-		 * If it is greater, we need a bigadjust, if its smaller,
-		 * we can adjust by 1.
-		 */
-		error >>= 2;
-		if (likely(error <= interval))
-			adj = 1;
-		else
-			adj = timekeeping_bigadjust(tk, error, &interval, &offset);
-	} else {
-		if (error < -interval) {
-			/* See comment above, this is just switched for the negative */
-			error >>= 2;
-			if (likely(error >= -interval)) {
-				adj = -1;
-				interval = -interval;
-				offset = -offset;
-			} else {
-				adj = timekeeping_bigadjust(tk, error, &interval, &offset);
-			}
-		} else {
-			goto out_adjust;
-		}
-	}
-
-	if (unlikely(tk->clock->maxadj &&
-		(tk->mult + adj > tk->clock->mult + tk->clock->maxadj))) {
-		printk_once(KERN_WARNING
-			"Adjusting %s more than 11%% (%ld vs %ld)\n",
-			tk->clock->name, (long)tk->mult + adj,
-			(long)tk->clock->mult + tk->clock->maxadj);
-	}
 	/*
 	 * So the following can be confusing.
 	 *
-	 * To keep things simple, lets assume adj == 1 for now.
+	 * To keep things simple, lets assume mult_adj == 1 for now.
 	 *
-	 * When adj != 1, remember that the interval and offset values
+	 * When mult_adj != 1, remember that the interval and offset values
 	 * have been appropriately scaled so the math is the same.
 	 *
 	 * The basic idea here is that we're increasing the multiplier
@@ -1211,12 +1121,76 @@ static void timekeeping_adjust(struct timekeeper *tk, s64 offset)
 	 *
 	 * XXX - TODO: Doc ntp_error calculation.
 	 */
-	tk->mult += adj;
+	tk->mult += mult_adj;
 	tk->xtime_interval += interval;
 	tk->xtime_nsec -= offset;
 	tk->ntp_error -= (interval - offset) << tk->ntp_error_shift;
+}
+
+/*
+ * Calculate the multiplier adjustment needed to match the frequency
+ * specified by NTP
+ */
+static __always_inline void timekeeping_freqadjust(struct timekeeper *tk,
+							s64 offset)
+{
+	s64 interval = tk->cycle_interval;
+	s64 xinterval = tk->xtime_interval;
+	s64 tick_error;
+	bool negative;
+	u32 adj;
+
+	/* Remove any current error adj from freq calculation */
+	if (tk->ntp_err_mult)
+		xinterval -= tk->cycle_interval;
+
+	/* Calculate current error per tick */
+	tick_error = ntp_tick_length() >> tk->ntp_error_shift;
+	tick_error -= (xinterval + tk->xtime_remainder);
+
+	/* Don't worry about correcting it if it's small */
+	if (likely((tick_error >= 0) && (tick_error <= interval)))
+		return;
+
+	/* preserve the direction of correction */
+	negative = (tick_error < 0);
+
+	/* Sort out the magnitude of the correction */
+	tick_error = abs(tick_error);
+	for (adj = 0; tick_error > interval; adj++)
+		tick_error >>= 1;
+
+	/* scale the corrections */
+	timekeeping_apply_adjustment(tk, offset, negative, adj);
+}
+
+/*
+ * Adjust the timekeeper's multiplier to the correct frequency
+ * and also to reduce the accumulated error value.
+ */
+static void timekeeping_adjust(struct timekeeper *tk, s64 offset)
+{
+	/* Correct for the current frequency error */
+	timekeeping_freqadjust(tk, offset);
+
+	/* Next make a small adjustment to fix any cumulative error */
+	if (!tk->ntp_err_mult && (tk->ntp_error > 0)) {
+		tk->ntp_err_mult = 1;
+		timekeeping_apply_adjustment(tk, offset, 0, 0);
+	} else if (tk->ntp_err_mult && (tk->ntp_error <= 0)) {
+		/* Undo any existing error adjustment */
+		timekeeping_apply_adjustment(tk, offset, 1, 0);
+		tk->ntp_err_mult = 0;
+	}
+
+	if (unlikely(tk->clock->maxadj &&
+		(tk->mult > tk->clock->mult + tk->clock->maxadj))) {
+		printk_once(KERN_WARNING
+			"Adjusting %s more than 11%% (%ld vs %ld)\n",
+			tk->clock->name, (long)tk->mult,
+			(long)tk->clock->mult + tk->clock->maxadj);
+	}
 
-out_adjust:
 	/*
 	 * It may be possible that when we entered this function, xtime_nsec
 	 * was very small.  Further, if we're slightly speeding the clocksource
@@ -1236,7 +1210,6 @@ out_adjust:
 		tk->xtime_nsec = 0;
 		tk->ntp_error += neg << tk->ntp_error_shift;
 	}
-
 }
 
 /**