Date: Mon, 15 Dec 2014 15:02:17 +0530
Subject: Re: [nohz] 2a16fc93d2c: kernel lockup on idle injection
From: Viresh Kumar
To: Preeti U Murthy, Thomas Gleixner, Fengguang Wu
Cc: Frederic Weisbecker, "Pan, Jacob jun", LKML, LKP

On 15 December 2014 at 12:55, Preeti U Murthy wrote:
> Hi Viresh,
>
> Let me explain why I think this is happening.
>
> 1. tick_nohz_irq_enter/exit() both get called *only if the cpu is idle*
> and receives an interrupt.

Bang on target. Yeah, that's the part we missed while writing this patch :)

> 2. Commit 2a16fc93d2c9568e1 cancels programming of the tick_sched timer
> in its handler, assuming that tick_nohz_irq_exit() will take care of
> programming the clock event device appropriately, and hence it would
> requeue or cancel the tick_sched timer.

Correct.

> 3. But the intel_powerclamp driver only injects an idle period.
> *The CPU, however, is not idle*. It has work on its runqueue and
> rq->curr != idle. This means that *tick_nohz_irq_enter()/exit() will not
> get called on any interrupt*.

Still good.

> 4. As a consequence, when we get a hrtimer interrupt during the period
> that the powerclamp driver is mimicking idle, the exit path of the
> interrupt never calls tick_nohz_irq_exit(). Hence the tick_sched timer,
> which would have been removed due to the above commit, will not get
> enqueued back for any pending timers there might be. Besides this,
> *jiffies never gets updated*.

Jiffies can be updated by any CPU, and the powerclamp driver has something
called a control CPU. BUT we may have been interrupted before the
powerclamp timer expired, and so we are stuck forever in the
while (time_before(jiffies, target_jiffies)) loop.

> Hope the above explanation makes sense.

Mostly good. Thanks for helping out.

Now, what's the right solution going forward?
- Revert the offending commit, or
- Still try to avoid reprogramming the tick if we can.

This is what I could come up with to still avoid reprogramming the tick:

diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index cc0a5b6f741b..49f4278f69e2 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -1100,7 +1100,7 @@ static enum hrtimer_restart tick_sched_timer(struct hrtimer *timer)
 	tick_sched_handle(ts, regs);
 
 	/* No need to reprogram if we are in idle or full dynticks mode */
-	if (unlikely(ts->tick_stopped))
+	if (unlikely(ts->tick_stopped && (is_idle_task(current) || !ts->inidle)))
 		return HRTIMER_NORESTART;
 
 	hrtimer_forward(timer, now, tick_period);

The above change checks why we have stopped the tick:
- The CPU has really gone idle: is_idle_task(current)
- The CPU isn't in idle mode, i.e. it's in nohz-full mode: !ts->inidle

This fixed the issues with powerclamp in my case.

@Fengguang: Can you please check if this fixes it for you as well?

@Thomas: Please let me know whether you want me to send this fix or would
rather revert the original commit itself.

Thanks.

--
Viresh