From patchwork Thu Jun 8 23:44:20 2017
X-Patchwork-Submitter: John Stultz
X-Patchwork-Id: 103433
From: John Stultz
To: lkml
Cc: John Stultz, Thomas Gleixner, Ingo Molnar, Miroslav Lichvar,
 Richard Cochran, Prarit Bhargava, Stephen Boyd, Daniel Mentz, stable
Subject: [PATCH 1/3 v3] time: Fix clock->read(clock) race around clocksource changes
Date: Thu, 8 Jun 2017 16:44:20 -0700
Message-Id: <1496965462-20003-2-git-send-email-john.stultz@linaro.org>
X-Mailer: git-send-email 2.7.4
In-Reply-To: <1496965462-20003-1-git-send-email-john.stultz@linaro.org>
References: <1496965462-20003-1-git-send-email-john.stultz@linaro.org>

In tests which exercise switching of clocksources, a NULL pointer
dereference can be observed on ARM64 platforms in the clocksource
read() function:

u64 clocksource_mmio_readl_down(struct clocksource *c)
{
	return ~(u64)readl_relaxed(to_mmio_clksrc(c)->reg) & c->mask;
}

This is called from the core timekeeping code via:

	cycle_now = tkr->read(tkr->clock);

tkr->read is the cached tkr->clock->read() function pointer. When the
clocksource is changed then tkr->clock and tkr->read are updated
sequentially. The code above results in a sequential load operation of
tkr->read and tkr->clock as well.

If the store to tkr->clock hits between the loads of tkr->read and
tkr->clock, then the old read() function is called with the new clock
pointer. As a consequence the read() function dereferences a different
data structure and the resulting 'reg' pointer can point anywhere
including NULL.

This problem was introduced when the timekeeping code was switched over
to use struct tk_read_base. Before that, it was theoretically possible
as well when the compiler decided to reload clock in the code sequence:

	now = tk->clock->read(tk->clock);

Add a helper function which avoids the issue by reading
tk_read_base->clock once into a local variable clk and then issue the
read function via clk->read(clk). This guarantees that the read()
function always gets the proper clocksource pointer handed in.
Since there is now no use for the tkr.read pointer, this patch also
removes it, and to address stopping the fast timekeeper during
suspend/resume, it introduces a dummy clocksource to use rather than
just a dummy read function.

Cc: Thomas Gleixner
Cc: Ingo Molnar
Cc: Miroslav Lichvar
Cc: Richard Cochran
Cc: Prarit Bhargava
Cc: Stephen Boyd
Cc: Daniel Mentz
Cc: stable
Acked-by: Ingo Molnar
Signed-off-by: John Stultz
---
v2: Addressed Ingo's feedback on wording
v3: Rework to address similar issue w/ fasttimekeepers, use tglx's
    suggested commit description.
---
 include/linux/timekeeper_internal.h |  1 -
 kernel/time/timekeeping.c           | 52 +++++++++++++++++++++++++------------
 2 files changed, 36 insertions(+), 17 deletions(-)

-- 
2.7.4

diff --git a/include/linux/timekeeper_internal.h b/include/linux/timekeeper_internal.h
index 110f453..e9834ad 100644
--- a/include/linux/timekeeper_internal.h
+++ b/include/linux/timekeeper_internal.h
@@ -29,7 +29,6 @@
  */
 struct tk_read_base {
 	struct clocksource	*clock;
-	u64			(*read)(struct clocksource *cs);
 	u64			mask;
 	u64			cycle_last;
 	u32			mult;
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 9652bc5..eff94cb 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -118,6 +118,26 @@ static inline void tk_update_sleep_time(struct timekeeper *tk, ktime_t delta)
 	tk->offs_boot = ktime_add(tk->offs_boot, delta);
 }
 
+/*
+ * tk_clock_read - atomic clocksource read() helper
+ *
+ * This helper is necessary to use in the read paths because, while the
+ * seqlock ensures we don't return a bad value while structures are updated,
+ * it doesn't protect from potential crashes. There is the possibility that
+ * the tkr's clocksource may change between the read reference, and the
+ * clock reference passed to the read function. This can cause crashes if
+ * the wrong clocksource is passed to the wrong read function.
+ * This isn't necessary to use when holding the timekeeper_lock or doing
+ * a read of the fast-timekeeper tkrs (which is protected by its own locking
+ * and update logic).
+ */
+static inline u64 tk_clock_read(struct tk_read_base *tkr)
+{
+	struct clocksource *clock = READ_ONCE(tkr->clock);
+
+	return clock->read(clock);
+}
+
 #ifdef CONFIG_DEBUG_TIMEKEEPING
 #define WARNING_FREQ (HZ*300) /* 5 minute rate-limiting */
 
@@ -175,7 +195,7 @@ static inline u64 timekeeping_get_delta(struct tk_read_base *tkr)
 	 */
 	do {
 		seq = read_seqcount_begin(&tk_core.seq);
-		now = tkr->read(tkr->clock);
+		now = tk_clock_read(tkr);
 		last = tkr->cycle_last;
 		mask = tkr->mask;
 		max = tkr->clock->max_cycles;
@@ -209,7 +229,7 @@ static inline u64 timekeeping_get_delta(struct tk_read_base *tkr)
 	u64 cycle_now, delta;
 
 	/* read clocksource */
-	cycle_now = tkr->read(tkr->clock);
+	cycle_now = tk_clock_read(tkr);
 
 	/* calculate the delta since the last update_wall_time */
 	delta = clocksource_delta(cycle_now, tkr->cycle_last, tkr->mask);
@@ -238,12 +258,10 @@ static void tk_setup_internals(struct timekeeper *tk, struct clocksource *clock)
 	++tk->cs_was_changed_seq;
 	old_clock = tk->tkr_mono.clock;
 	tk->tkr_mono.clock = clock;
-	tk->tkr_mono.read = clock->read;
 	tk->tkr_mono.mask = clock->mask;
-	tk->tkr_mono.cycle_last = tk->tkr_mono.read(clock);
+	tk->tkr_mono.cycle_last = tk_clock_read(&tk->tkr_mono);
 
 	tk->tkr_raw.clock = clock;
-	tk->tkr_raw.read = clock->read;
 	tk->tkr_raw.mask = clock->mask;
 	tk->tkr_raw.cycle_last = tk->tkr_mono.cycle_last;
@@ -404,7 +422,7 @@ static __always_inline u64 __ktime_get_fast_ns(struct tk_fast *tkf)
 
 		now += timekeeping_delta_to_ns(tkr,
 				clocksource_delta(
-					tkr->read(tkr->clock),
+					tk_clock_read(tkr),
 					tkr->cycle_last,
 					tkr->mask));
 	} while (read_seqcount_retry(&tkf->seq, seq));
@@ -461,6 +479,10 @@ static u64 dummy_clock_read(struct clocksource *cs)
 	return cycles_at_suspend;
 }
 
+static struct clocksource dummy_clock = {
+	.read = dummy_clock_read,
+};
+
 /**
  * halt_fast_timekeeper - Prevent fast timekeeper from accessing clocksource.
  * @tk: Timekeeper to snapshot.
@@ -477,13 +499,13 @@ static void halt_fast_timekeeper(struct timekeeper *tk)
 	struct tk_read_base *tkr = &tk->tkr_mono;
 
 	memcpy(&tkr_dummy, tkr, sizeof(tkr_dummy));
-	cycles_at_suspend = tkr->read(tkr->clock);
-	tkr_dummy.read = dummy_clock_read;
+	cycles_at_suspend = tk_clock_read(tkr);
+	tkr_dummy.clock = &dummy_clock;
 	update_fast_timekeeper(&tkr_dummy, &tk_fast_mono);
 
 	tkr = &tk->tkr_raw;
 	memcpy(&tkr_dummy, tkr, sizeof(tkr_dummy));
-	tkr_dummy.read = dummy_clock_read;
+	tkr_dummy.clock = &dummy_clock;
 	update_fast_timekeeper(&tkr_dummy, &tk_fast_raw);
 }
@@ -649,11 +671,10 @@ static void timekeeping_update(struct timekeeper *tk, unsigned int action)
  */
 static void timekeeping_forward_now(struct timekeeper *tk)
 {
-	struct clocksource *clock = tk->tkr_mono.clock;
 	u64 cycle_now, delta;
 	u64 nsec;
 
-	cycle_now = tk->tkr_mono.read(clock);
+	cycle_now = tk_clock_read(&tk->tkr_mono);
 	delta = clocksource_delta(cycle_now, tk->tkr_mono.cycle_last, tk->tkr_mono.mask);
 	tk->tkr_mono.cycle_last = cycle_now;
 	tk->tkr_raw.cycle_last = cycle_now;
@@ -929,8 +950,7 @@ void ktime_get_snapshot(struct system_time_snapshot *systime_snapshot)
 
 	do {
 		seq = read_seqcount_begin(&tk_core.seq);
-
-		now = tk->tkr_mono.read(tk->tkr_mono.clock);
+		now = tk_clock_read(&tk->tkr_mono);
 		systime_snapshot->cs_was_changed_seq = tk->cs_was_changed_seq;
 		systime_snapshot->clock_was_set_seq = tk->clock_was_set_seq;
 		base_real = ktime_add(tk->tkr_mono.base,
@@ -1108,7 +1128,7 @@ int get_device_system_crosststamp(int (*get_time_fn)
 		 * Check whether the system counter value provided by the
 		 * device driver is on the current timekeeping interval.
		 */
-		now = tk->tkr_mono.read(tk->tkr_mono.clock);
+		now = tk_clock_read(&tk->tkr_mono);
 		interval_start = tk->tkr_mono.cycle_last;
 		if (!cycle_between(interval_start, cycles, now)) {
 			clock_was_set_seq = tk->clock_was_set_seq;
@@ -1629,7 +1649,7 @@ void timekeeping_resume(void)
 	 * The less preferred source will only be tried if there is no better
 	 * usable source. The rtc part is handled separately in rtc core code.
 	 */
-	cycle_now = tk->tkr_mono.read(clock);
+	cycle_now = tk_clock_read(&tk->tkr_mono);
 	if ((clock->flags & CLOCK_SOURCE_SUSPEND_NONSTOP) &&
 		cycle_now > tk->tkr_mono.cycle_last) {
 		u64 nsec, cyc_delta;
@@ -2030,7 +2050,7 @@ void update_wall_time(void)
 #ifdef CONFIG_ARCH_USES_GETTIMEOFFSET
 	offset = real_tk->cycle_interval;
 #else
-	offset = clocksource_delta(tk->tkr_mono.read(tk->tkr_mono.clock),
+	offset = clocksource_delta(tk_clock_read(&tk->tkr_mono),
 				   tk->tkr_mono.cycle_last, tk->tkr_mono.mask);
 #endif