From patchwork Mon Aug 17 20:41:02 2015
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: John Stultz
X-Patchwork-Id: 52484
From: John Stultz
To: lkml
Cc: Shaohua Li, Prarit Bhargava, Richard Cochran, Daniel Lezcano,
 Thomas Gleixner, Ingo Molnar, John Stultz
Subject: [PATCH 8/9] clocksource: Improve unstable clocksource detection
Date: Mon, 17 Aug 2015 13:41:02 -0700
Message-Id: <1439844063-7957-9-git-send-email-john.stultz@linaro.org>
X-Mailer: git-send-email 1.9.1
In-Reply-To: <1439844063-7957-1-git-send-email-john.stultz@linaro.org>
References: <1439844063-7957-1-git-send-email-john.stultz@linaro.org>

From: Shaohua Li

From time to time we saw the TSC marked as unstable on our systems,
even though the CPUs declare a stable TSC. Looking at the clocksource
unstable detection, there are two problems:

- Watchdog clocksource wrap. HPET is the most common watchdog
  clocksource. It is 32-bit and runs at 14.3 MHz, which means the HPET
  counter can wrap in about 5 minutes.
- The threshold isn't scaled against the interval. The threshold is
  0.0625s for a 0.5s interval. What if the actual interval is longer
  than 0.5s?

The watchdog runs in a timer bh, so hard/soft irqs can defer its
running. Heavy network stack softirq load can hog a CPU. The IPMI
driver can disable interrupts for a very long time.

The first problem is mostly what we are suffering from, I think.

Here is a simple patch to fix the issues.
If the watchdog doesn't run for a long time, we ignore the detection.
This should work for both problems. For the second one, we probably
don't need to scale if the interval isn't very long.

Cc: Prarit Bhargava
Cc: Richard Cochran
Cc: Daniel Lezcano
Cc: Thomas Gleixner
Cc: Ingo Molnar
Signed-off-by: Shaohua Li
Signed-off-by: John Stultz
---
 kernel/time/clocksource.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index 841b72f..8417c83 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -122,9 +122,10 @@ static int clocksource_watchdog_kthread(void *data);
 static void __clocksource_change_rating(struct clocksource *cs, int rating);
 
 /*
- * Interval: 0.5sec Threshold: 0.0625s
+ * Interval: 0.5sec MaxInterval: 1s Threshold: 0.0625s
  */
 #define WATCHDOG_INTERVAL (HZ >> 1)
+#define WATCHDOG_MAX_INTERVAL_NS (NSEC_PER_SEC)
 #define WATCHDOG_THRESHOLD (NSEC_PER_SEC >> 4)
 
 static void clocksource_watchdog_work(struct work_struct *work)
@@ -217,7 +218,9 @@ static void clocksource_watchdog(unsigned long data)
 			continue;
 
 		/* Check the deviation from the watchdog clocksource. */
-		if ((abs(cs_nsec - wd_nsec) > WATCHDOG_THRESHOLD)) {
+		if ((abs(cs_nsec - wd_nsec) > WATCHDOG_THRESHOLD) &&
+		    cs_nsec < WATCHDOG_MAX_INTERVAL_NS &&
+		    wd_nsec < WATCHDOG_MAX_INTERVAL_NS) {
 			pr_warn("timekeeping watchdog: Marking clocksource '%s' as unstable because the skew is too large:\n",
 				cs->name);
 			pr_warn(" '%s' wd_now: %llx wd_last: %llx mask: %llx\n",
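
For anyone who wants to check the numbers in the changelog, here is a
minimal userspace sketch, not kernel code. It assumes the HPET runs at
14.31818 MHz (the changelog only says 14.3 MHz), and the helper names
counter_wrap_seconds() and skew_check_fires() are hypothetical, made up
for this illustration. The three macros are redefined locally with the
values the patch uses.

```c
#include <stdint.h>
#include <stdlib.h>

/* Local stand-ins for the kernel macros, with the values from the patch. */
#define NSEC_PER_SEC			1000000000LL
#define WATCHDOG_MAX_INTERVAL_NS	(NSEC_PER_SEC)		/* 1s */
#define WATCHDOG_THRESHOLD		(NSEC_PER_SEC >> 4)	/* 0.0625s */

/*
 * Hypothetical helper: seconds until a free-running counter of the
 * given width (bits < 64) wraps at the given frequency.
 */
static double counter_wrap_seconds(unsigned int bits, double freq_hz)
{
	return (double)(1ULL << bits) / freq_hz;
}

/*
 * Hypothetical helper mirroring the patched condition: the skew only
 * counts when both measured intervals are shorter than
 * WATCHDOG_MAX_INTERVAL_NS. A longer interval means the watchdog was
 * delayed (and its counter may have wrapped), so the sample is ignored.
 */
static int skew_check_fires(int64_t cs_nsec, int64_t wd_nsec)
{
	return llabs(cs_nsec - wd_nsec) > WATCHDOG_THRESHOLD &&
	       cs_nsec < WATCHDOG_MAX_INTERVAL_NS &&
	       wd_nsec < WATCHDOG_MAX_INTERVAL_NS;
}
```

Under that frequency assumption, counter_wrap_seconds(32, 14318180.0)
comes out just under 300 seconds, which matches the "about 5 minutes"
above. skew_check_fires() stays quiet for a sample where the watchdog
was delayed past one second, no matter how large the apparent skew is,
and still fires for a 100ms skew measured over a normal 0.5s interval.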