From patchwork Sat Feb 4 01:45:08 2012 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Paul E. McKenney" X-Patchwork-Id: 6622 Return-Path: X-Original-To: patchwork@peony.canonical.com Delivered-To: patchwork@peony.canonical.com Received: from fiordland.canonical.com (fiordland.canonical.com [91.189.94.145]) by peony.canonical.com (Postfix) with ESMTP id A58B123E92 for ; Sat, 4 Feb 2012 01:45:53 +0000 (UTC) Received: from mail-iy0-f180.google.com (mail-iy0-f180.google.com [209.85.210.180]) by fiordland.canonical.com (Postfix) with ESMTP id 67D23A180C7 for ; Sat, 4 Feb 2012 01:45:53 +0000 (UTC) Received: by mail-iy0-f180.google.com with SMTP id z7so7596471iab.11 for ; Fri, 03 Feb 2012 17:45:53 -0800 (PST) Received: by 10.50.42.199 with SMTP id q7mr11212411igl.9.1328319953020; Fri, 03 Feb 2012 17:45:53 -0800 (PST) X-Forwarded-To: linaro-patchwork@canonical.com X-Forwarded-For: patch@linaro.org linaro-patchwork@canonical.com Delivered-To: patches@linaro.org Received: by 10.231.169.210 with SMTP id a18cs33807ibz; Fri, 3 Feb 2012 17:45:52 -0800 (PST) Received: by 10.68.74.138 with SMTP id t10mr22240465pbv.126.1328319952323; Fri, 03 Feb 2012 17:45:52 -0800 (PST) Received: from e34.co.us.ibm.com (e34.co.us.ibm.com. [32.97.110.152]) by mx.google.com with ESMTPS id o4si10772492pbq.131.2012.02.03.17.45.51 (version=TLSv1/SSLv3 cipher=OTHER); Fri, 03 Feb 2012 17:45:52 -0800 (PST) Received-SPF: pass (google.com: domain of paulmck@linux.vnet.ibm.com designates 32.97.110.152 as permitted sender) client-ip=32.97.110.152; Authentication-Results: mx.google.com; spf=pass (google.com: domain of paulmck@linux.vnet.ibm.com designates 32.97.110.152 as permitted sender) smtp.mail=paulmck@linux.vnet.ibm.com Received: from /spool/local by e34.co.us.ibm.com with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted for from ; Fri, 3 Feb 2012 18:45:51 -0700 Received: from d03dlp02.boulder.ibm.com (9.17.202.178) by e34.co.us.ibm.com (192.168.1.134) with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted; Fri, 3 Feb 2012 18:45:48 -0700 Received: from d03relay02.boulder.ibm.com (d03relay02.boulder.ibm.com [9.17.195.227]) by d03dlp02.boulder.ibm.com (Postfix) with ESMTP id 83B193E4005B; Fri, 3 Feb 2012 18:45:47 -0700 (MST) Received: from d03av01.boulder.ibm.com (d03av01.boulder.ibm.com [9.17.195.167]) by d03relay02.boulder.ibm.com (8.13.8/8.13.8/NCO v10.0) with ESMTP id q141jkJ5170232; Fri, 3 Feb 2012 18:45:46 -0700 Received: from d03av01.boulder.ibm.com (loopback [127.0.0.1]) by d03av01.boulder.ibm.com (8.14.4/8.13.1/NCO v10.0 AVout) with ESMTP id q141jaZF014908; Fri, 3 Feb 2012 18:45:46 -0700 Received: from paulmck-ThinkPad-W500 ([9.47.24.98]) by d03av01.boulder.ibm.com (8.14.4/8.13.1/NCO v10.0 AVin) with ESMTP id q141jQGZ013138; Fri, 3 Feb 2012 18:45:28 -0700 Received: by paulmck-ThinkPad-W500 (Postfix, from userid 1000) id CAC30E5206; Fri, 3 Feb 2012 17:45:26 -0800 (PST) From: "Paul E. McKenney" To: linux-kernel@vger.kernel.org Cc: mingo@elte.hu, laijs@cn.fujitsu.com, dipankar@in.ibm.com, akpm@linux-foundation.org, mathieu.desnoyers@polymtl.ca, josh@joshtriplett.org, niv@us.ibm.com, tglx@linutronix.de, peterz@infradead.org, rostedt@goodmis.org, Valdis.Kletnieks@vt.edu, dhowells@redhat.com, eric.dumazet@gmail.com, darren@dvhart.com, fweisbec@gmail.com, patches@linaro.org, "Paul E. McKenney" , "Paul E. McKenney" Subject: [PATCH tip/core/rcu 33/47] rcu: Update stall-warning documentation Date: Fri, 3 Feb 2012 17:45:08 -0800 Message-Id: <1328319922-30828-33-git-send-email-paulmck@linux.vnet.ibm.com> X-Mailer: git-send-email 1.7.8 In-Reply-To: <1328319922-30828-1-git-send-email-paulmck@linux.vnet.ibm.com> References: <20120204014452.GA29811@linux.vnet.ibm.com> <1328319922-30828-1-git-send-email-paulmck@linux.vnet.ibm.com> X-Content-Scanned: Fidelis XPS MAILER x-cbid: 12020401-1780-0000-0000-000002DC2FDA X-IBM-ISS-SpamDetectors: X-IBM-ISS-DetailInfo: BY=3.00000245; HX=3.00000181; KW=3.00000007; PH=3.00000001; SC=3.00000001; SDB=6.00110708; UDB=6.00027489; UTC=2012-02-04 01:45:49 From: "Paul E. McKenney" Add documentation of CONFIG_RCU_CPU_STALL_VERBOSE, CONFIG_RCU_CPU_STALL_INFO, and RCU_STALL_DELAY_DELTA. Describe multiple stall-warning messages from a single stall, and the timing of the subsequent messages. Add headings. Remove RCU_SECONDS_TILL_STALL_RECHECK because this value is now computed at runtime from RCU_CPU_STALL_TIMEOUT, so that sysfs changes to the timeout value now directly affect the RCU_SECONDS_TILL_STALL_RECHECK value. Signed-off-by: Paul E. McKenney Signed-off-by: Paul E. McKenney --- Documentation/RCU/stallwarn.txt | 87 +++++++++++++++++++++++++++++++++++--- 1 files changed, 80 insertions(+), 7 deletions(-) diff --git a/Documentation/RCU/stallwarn.txt b/Documentation/RCU/stallwarn.txt index 083d88c..523364e 100644 --- a/Documentation/RCU/stallwarn.txt +++ b/Documentation/RCU/stallwarn.txt @@ -12,14 +12,38 @@ CONFIG_RCU_CPU_STALL_TIMEOUT This kernel configuration parameter defines the period of time that RCU will wait from the beginning of a grace period until it issues an RCU CPU stall warning. This time period is normally - ten seconds. + sixty seconds. -RCU_SECONDS_TILL_STALL_RECHECK + This configuration parameter may be changed at runtime via the + /sys/module/rcutree/parameters/rcu_cpu_stall_timeout, however + this parameter is checked only at the beginning of a cycle. + So if you are 30 seconds into a 70-second stall, setting this + sysfs parameter to (say) five will shorten the timeout for the + -next- stall, or the following warning for the current stall + (assuming the stall lasts long enough). It will not affect the + timing of the next warning for the current stall. - This macro defines the period of time that RCU will wait after - issuing a stall warning until it issues another stall warning - for the same stall. This time period is normally set to three - times the check interval plus thirty seconds. + Stall-warning messages may be enabled and disabled completely via + /sys/module/rcutree/parameters/rcu_cpu_stall_suppress. + +CONFIG_RCU_CPU_STALL_VERBOSE + + This kernel configuration parameter causes the stall warning to + also dump the stacks of any tasks that are blocking the current + RCU-preempt grace period. + +RCU_CPU_STALL_INFO + + This kernel configuration parameter causes the stall warning to + print out additional per-CPU diagnostic information, including + information on scheduling-clock ticks and RCU's idle-CPU tracking. + +RCU_STALL_DELAY_DELTA + + Although the lockdep facility is extremely useful, it does add + some overhead. Therefore, under CONFIG_PROVE_RCU, the + RCU_STALL_DELAY_DELTA macro allows five extra seconds before + giving an RCU CPU stall warning message. RCU_STALL_RAT_DELAY @@ -64,6 +88,54 @@ INFO: rcu_bh_state detected stalls on CPUs/tasks: { } (detected by 4, 2502 jiffi This is rare, but does happen from time to time in real life. +If the CONFIG_RCU_CPU_STALL_INFO kernel configuration parameter is set, +more information is printed with the stall-warning message, for example: + + INFO: rcu_preempt detected stall on CPU + 0: (63959 ticks this GP) idle=241/3fffffffffffffff/0 + (t=65000 jiffies) + +In kernels with CONFIG_RCU_FAST_NO_HZ, even more information is +printed: + + INFO: rcu_preempt detected stall on CPU + 0: (64628 ticks this GP) idle=dd5/3fffffffffffffff/0 drain=0 . timer=-1 + (t=65000 jiffies) + +The "(64628 ticks this GP)" indicates that this CPU has taken more +than 64,000 scheduling-clock interrupts during the current stalled +grace period. If the CPU was not yet aware of the current grace +period (for example, if it was offline), then this part of the message +indicates how many grace periods behind the CPU is. + +The "idle=" portion of the message prints the dyntick-idle state. +The hex number before the first "/" is the low-order 12 bits of the +dynticks counter, which will have an even-numbered value if the CPU is +in dyntick-idle mode and an odd-numbered value otherwise. The hex +number between the two "/"s is the value of the nesting, which will +be a small positive number if in the idle loop and a very large positive +number (as shown above) otherwise. + +For CONFIG_RCU_FAST_NO_HZ kernels, the "drain=0" indicates that the +CPU is not in the process of trying to force itself into dyntick-idle +state, the "." indicates that the CPU has not given up forcing RCU +into dyntick-idle mode (it would be "H" otherwise), and the "timer=-1" +indicates that the CPU has not recented forced RCU into dyntick-idle +mode (it would otherwise indicate the number of microseconds remaining +in this forced state). + + +Multiple Warnings From One Stall + +If a stall lasts long enough, multiple stall-warning messages will be +printed for it. The second and subsequent messages are printed at +longer intervals, so that the time between (say) the first and second +message will be about three times the interval between the beginning +of the stall and the first message. + + +What Causes RCU CPU Stall Warnings? + So your kernel printed an RCU CPU stall warning. The next question is "What caused it?" The following problems can result in RCU CPU stall warnings: @@ -128,4 +200,5 @@ is occurring, which will usually be in the function nearest the top of that portion of the stack which remains the same from trace to trace. If you can reliably trigger the stall, ftrace can be quite helpful. -RCU bugs can often be debugged with the help of CONFIG_RCU_TRACE. +RCU bugs can often be debugged with the help of CONFIG_RCU_TRACE +and with RCU's event tracing.