From patchwork Wed Jan 21 16:53:56 2015
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Daniel Thompson <daniel.thompson@linaro.org>
X-Patchwork-Id: 43471
From: Daniel Thompson <daniel.thompson@linaro.org>
To: John Stultz
Cc: Daniel Thompson, linux-kernel@vger.kernel.org, patches@linaro.org,
    linaro-kernel@lists.linaro.org, Sumit Semwal, Thomas Gleixner,
    Stephen Boyd, Steven Rostedt
Subject: [RFC PATCH] sched_clock: Avoid tearing during read from NMI
Date: Wed, 21 Jan 2015 16:53:56 +0000
Message-Id: <1421859236-19782-1-git-send-email-daniel.thompson@linaro.org>
X-Mailer: git-send-email 1.9.3

Currently it is possible for an NMI (or FIQ on ARM) to come in and read
sched_clock() whilst update_sched_clock() has half updated the state.
This results in a bad time value being observed.

This patch fixes that problem in a similar manner to Thomas Gleixner's
commit 4396e058c52e ("timekeeping: Provide fast and NMI safe access to
CLOCK_MONOTONIC").

Note that ripping out the seqcount lock from sched_clock_register() and
replacing it with a large comment is not nearly as bad as it looks! The
locking there is actually pretty useless, since most of the variables
modified within the write lock are not covered by the read lock. As a
result, a big comment plus the sequence bump implicit in the call to
update_epoch() should work pretty much the same.

Suggested-by: Stephen Boyd
Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
---

Notes:
    This patch has only had fairly light testing at this point, but it
    survives basic tests. In particular, I am running perf from FIQ/NMI
    and have instrumented it with some monotonicity tests, none of which
    have reported any problem.

 kernel/time/sched_clock.c | 63 +++++++++++++++++++++++++++++++++++++----------
 1 file changed, 50 insertions(+), 13 deletions(-)

--
1.9.3

diff --git a/kernel/time/sched_clock.c b/kernel/time/sched_clock.c
index 01d2d15aa662..485d5070259c 100644
--- a/kernel/time/sched_clock.c
+++ b/kernel/time/sched_clock.c
@@ -27,6 +27,10 @@ struct clock_data {
 	u32 mult;
 	u32 shift;
 	bool suspended;
+
+	/* Used only temporarily whilst we are updating the primary copy */
+	u64 old_epoch_ns;
+	u64 old_epoch_cyc;
 };
 
 static struct hrtimer sched_clock_timer;
@@ -67,9 +71,14 @@ unsigned long long notrace sched_clock(void)
 		return cd.epoch_ns;
 
 	do {
-		seq = raw_read_seqcount_begin(&cd.seq);
-		epoch_cyc = cd.epoch_cyc;
-		epoch_ns = cd.epoch_ns;
+		seq = raw_read_seqcount(&cd.seq);
+		if (likely(0 == (seq & 1))) {
+			epoch_cyc = cd.epoch_cyc;
+			epoch_ns = cd.epoch_ns;
+		} else {
+			epoch_cyc = cd.old_epoch_cyc;
+			epoch_ns = cd.old_epoch_ns;
+		}
 	} while (read_seqcount_retry(&cd.seq, seq));
 
 	cyc = read_sched_clock();
@@ -78,6 +87,35 @@ unsigned long long notrace sched_clock(void)
 }
 
 /*
+ * Update the epoch without allowing sched_clock to observe
+ * a mismatched epoch pair even if called from NMI.
+ *
+ * We do this by maintaining an odd/even copy of the epoch data and
+ * steering sched_clock to one or the other using a sequence counter.
+ * In order to preserve the (average case) data cache profile of
+ * sched_clock the system reverts back to the even copy as soon as
+ * possible; the odd copy is used *only* during an update.
+ *
+ * The caller is responsible for avoiding simultaneous updates.
+ */
+static void notrace update_epoch(u64 cyc, u64 ns)
+{
+	/* Update the backup copy */
+	cd.old_epoch_cyc = cd.epoch_cyc;
+	cd.old_epoch_ns = cd.epoch_ns;
+
+	/* Force readers to use the backup (odd) copy */
+	raw_write_seqcount_latch(&cd.seq);
+
+	/* Update the primary copy */
+	cd.epoch_cyc = cyc;
+	cd.epoch_ns = ns;
+
+	/* Steer readers back to the primary (even) copy */
+	raw_write_seqcount_latch(&cd.seq);
+}
+
+/*
  * Atomically update the sched_clock epoch.
  */
 static void notrace update_sched_clock(void)
@@ -91,12 +129,7 @@ static void notrace update_sched_clock(void)
 		cyc_to_ns((cyc - cd.epoch_cyc) & sched_clock_mask,
 			  cd.mult, cd.shift);
 
-	raw_local_irq_save(flags);
-	raw_write_seqcount_begin(&cd.seq);
-	cd.epoch_ns = ns;
-	cd.epoch_cyc = cyc;
-	raw_write_seqcount_end(&cd.seq);
-	raw_local_irq_restore(flags);
+	update_epoch(cyc, ns);
 }
 
 static enum hrtimer_restart sched_clock_poll(struct hrtimer *hrt)
@@ -135,16 +168,20 @@ void __init sched_clock_register(u64 (*read)(void), int bits,
 	ns = cd.epoch_ns +
 		cyc_to_ns((cyc - cd.epoch_cyc) & sched_clock_mask,
 			  cd.mult, cd.shift);
-	raw_write_seqcount_begin(&cd.seq);
+	/*
+	 * sched_clock will report a bad value if it executes
+	 * concurrently with the following code. No locking exists to
+	 * prevent this; we rely mostly on this function being called
+	 * early during kernel boot up before we have lots of other
+	 * stuff going on.
+	 */
 	read_sched_clock = read;
 	sched_clock_mask = new_mask;
 	cd.rate = rate;
 	cd.wrap_kt = new_wrap_kt;
 	cd.mult = new_mult;
 	cd.shift = new_shift;
-	cd.epoch_cyc = new_epoch;
-	cd.epoch_ns = ns;
-	raw_write_seqcount_end(&cd.seq);
+	update_epoch(new_epoch, ns);
 
 	r = rate;
 	if (r >= 4000000) {
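
[Editor's note, not part of the patch] For anyone who wants to experiment
with the odd/even latch idea outside the kernel, below is a minimal
userspace sketch of the same scheme. It is an illustration under stated
assumptions, not the kernel implementation: latch_update()/latch_read(),
struct epoch and the C11 atomic counter are invented stand-ins for the
clock_data fields and for the raw_write_seqcount_latch() /
raw_read_seqcount() / read_seqcount_retry() primitives the patch actually
uses.

/*
 * Standalone userspace sketch of the odd/even "latch" epoch update.
 * Single writer assumed; readers may interrupt the writer at any point
 * (think of a signal handler standing in for an NMI/FIQ).
 */
#include <inttypes.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>

struct epoch {
	uint64_t cyc;
	uint64_t ns;
};

static struct epoch copies[2];	/* [0] = primary/even, [1] = backup/odd */
static atomic_uint seq;		/* LSB selects which copy readers use */

/* Writer: readers never observe a mismatched cyc/ns pair. */
static void latch_update(uint64_t cyc, uint64_t ns)
{
	/* Publish the current primary values into the backup (odd) copy. */
	copies[1] = copies[0];
	atomic_fetch_add_explicit(&seq, 1, memory_order_release); /* odd */

	/* Update the primary (even) copy while readers are steered away. */
	copies[0].cyc = cyc;
	copies[0].ns = ns;
	atomic_fetch_add_explicit(&seq, 1, memory_order_release); /* even */
}

/* Reader: safe to call at any point, even mid-update. */
static struct epoch latch_read(void)
{
	struct epoch e;
	unsigned int s;

	do {
		s = atomic_load_explicit(&seq, memory_order_acquire);
		e = copies[s & 1];	/* even -> primary, odd -> backup */
		/* If seq moved, the copy may be torn: discard and retry,
		 * mirroring read_seqcount_retry() in the patch. */
	} while (s != atomic_load_explicit(&seq, memory_order_acquire));

	return e;
}

int main(void)
{
	latch_update(1000, 2000);
	struct epoch e = latch_read();
	printf("cyc=%" PRIu64 " ns=%" PRIu64 "\n", e.cyc, e.ns);
	return 0;
}

As in the patch, readers are steered back to the even (primary) copy as
soon as the update completes, so the common-case cache footprint stays on
one pair of fields; any torn value read while the counter is moving is
simply thrown away by the retry loop.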