[RFC,tip/core/rcu,09/28] rcu: Document failing tick as cause of RCU CPU stall warning

Message ID 1320265849-5744-9-git-send-email-paulmck@linux.vnet.ibm.com
State Accepted
Commit 2c01531f08f8ba663a20d8472a3032f6df133b6e

Commit Message

Paul E. McKenney Nov. 2, 2011, 8:30 p.m. UTC
One of lclaudio's systems was seeing RCU CPU stall warnings from idle.
These turned out to be caused by a bug that stopped scheduling-clock
tick interrupts from being sent to a given CPU for several hundred seconds.
This commit therefore updates the documentation to call this out as a
possible cause for RCU CPU stall warnings.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 Documentation/RCU/stallwarn.txt |    5 +++++
 1 files changed, 5 insertions(+), 0 deletions(-)
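
A stopped scheduling-clock tick of the kind described above can sometimes be spotted from user space by comparing two snapshots of /proc/interrupts. The following is a minimal sketch, not part of the patch. It assumes the x86 layout, in which the "LOC:" row holds per-CPU local timer interrupt counts, and the function names are hypothetical. A CPU in dyntick-idle mode legitimately receives no ticks, so an unchanged count is suspicious only on a CPU known to be busy.

```python
def parse_loc_counts(interrupts_text):
    """Map CPU labels to local-timer counts from /proc/interrupts text.

    Assumes the x86 layout: a header row of CPU labels and a "LOC:" row.
    """
    lines = interrupts_text.splitlines()
    cpus = lines[0].split()  # header row, e.g. ["CPU0", "CPU1"]
    for line in lines[1:]:
        fields = line.split()
        if fields and fields[0] == "LOC:":
            counts = [int(f) for f in fields[1:1 + len(cpus)]]
            return dict(zip(cpus, counts))
    raise ValueError("no LOC: row found (layout is architecture-specific)")


def cpus_without_ticks(before_text, after_text):
    """Return labels of CPUs whose timer count did not advance."""
    before = parse_loc_counts(before_text)
    after = parse_loc_counts(after_text)
    return [cpu for cpu in before if after.get(cpu) == before[cpu]]
```

To use it, read /proc/interrupts, wait a few seconds, read it again, and pass both strings to cpus_without_ticks(). Any busy CPU in the result deserves a closer look.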

Comments

Josh Triplett Nov. 3, 2011, 3:07 a.m. UTC | #1
On Wed, Nov 02, 2011 at 01:30:30PM -0700, Paul E. McKenney wrote:
> One of lclaudio's systems was seeing RCU CPU stall warnings from idle.
> These turned out to be caused by a bug that stopped scheduling-clock
> tick interrupts from being sent to a given CPU for several hundred seconds.

Out of curiosity, what caused this bug?

- Josh Triplett

Paul E. McKenney Nov. 3, 2011, 1:25 p.m. UTC | #2
On Wed, Nov 02, 2011 at 08:07:50PM -0700, Josh Triplett wrote:
> On Wed, Nov 02, 2011 at 01:30:30PM -0700, Paul E. McKenney wrote:
> > One of lclaudio's systems was seeing RCU CPU stall warnings from idle.
> > These turned out to be caused by a bug that stopped scheduling-clock
> > tick interrupts from being sent to a given CPU for several hundred seconds.
> 
> Out of curiosity, what caused this bug?

If I remember correctly, software/configuration bugs in the clock code.

							Thanx, Paul

Patch

diff --git a/Documentation/RCU/stallwarn.txt b/Documentation/RCU/stallwarn.txt
index 4e95920..f3e0625 100644
--- a/Documentation/RCU/stallwarn.txt
+++ b/Documentation/RCU/stallwarn.txt
@@ -101,6 +101,11 @@  o	A CPU-bound real-time task in a CONFIG_PREEMPT_RT kernel that
 	CONFIG_TREE_PREEMPT_RCU case, you might see stall-warning
 	messages.
 
+o	A hardware or software issue shuts off the scheduler-clock
+	interrupt on a CPU that is not in dyntick-idle mode.  This
+	problem really has happened, and seems to be most likely to
+	result in RCU CPU stall warnings for CONFIG_NO_HZ=n kernels.
+
 o	A bug in the RCU implementation.
 
 o	A hardware failure.  This is quite unlikely, but has occurred