From patchwork Wed Jul 20 00:18:18 2011 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Paul E. McKenney" X-Patchwork-Id: 2778 Return-Path: X-Original-To: patchwork@peony.canonical.com Delivered-To: patchwork@peony.canonical.com Received: from fiordland.canonical.com (fiordland.canonical.com [91.189.94.145]) by peony.canonical.com (Postfix) with ESMTP id 3C3432405D for ; Wed, 20 Jul 2011 00:19:33 +0000 (UTC) Received: from mail-qw0-f52.google.com (mail-qw0-f52.google.com [209.85.216.52]) by fiordland.canonical.com (Postfix) with ESMTP id E2EC3A1817D for ; Wed, 20 Jul 2011 00:19:32 +0000 (UTC) Received: by qwb8 with SMTP id 8so3319294qwb.11 for ; Tue, 19 Jul 2011 17:19:32 -0700 (PDT) Received: by 10.229.217.3 with SMTP id hk3mr7228322qcb.38.1311121172379; Tue, 19 Jul 2011 17:19:32 -0700 (PDT) X-Forwarded-To: linaro-patchwork@canonical.com X-Forwarded-For: patch@linaro.org linaro-patchwork@canonical.com Delivered-To: patches@linaro.org Received: by 10.229.217.78 with SMTP id hl14cs98690qcb; Tue, 19 Jul 2011 17:19:32 -0700 (PDT) Received: by 10.142.217.2 with SMTP id p2mr3685873wfg.111.1311121171413; Tue, 19 Jul 2011 17:19:31 -0700 (PDT) Received: from e5.ny.us.ibm.com (e5.ny.us.ibm.com [32.97.182.145]) by mx.google.com with ESMTPS id o6si16357566wfg.50.2011.07.19.17.19.30 (version=TLSv1/SSLv3 cipher=OTHER); Tue, 19 Jul 2011 17:19:31 -0700 (PDT) Received-SPF: pass (google.com: domain of paulmck@linux.vnet.ibm.com designates 32.97.182.145 as permitted sender) client-ip=32.97.182.145; Authentication-Results: mx.google.com; spf=pass (google.com: domain of paulmck@linux.vnet.ibm.com designates 32.97.182.145 as permitted sender) smtp.mail=paulmck@linux.vnet.ibm.com Received: from d01relay02.pok.ibm.com (d01relay02.pok.ibm.com [9.56.227.234]) by e5.ny.us.ibm.com (8.14.4/8.13.1) with ESMTP id p6JNoDLW019424; Tue, 19 Jul 2011 19:50:13 -0400 Received: from d01av01.pok.ibm.com (d01av01.pok.ibm.com [9.56.224.215]) by d01relay02.pok.ibm.com (8.13.8/8.13.8/NCO v10.0) with ESMTP id p6K0IVIg452098; Tue, 19 Jul 2011 20:18:32 -0400 Received: from d01av01.pok.ibm.com (loopback [127.0.0.1]) by d01av01.pok.ibm.com (8.14.4/8.13.1/NCO v10.0 AVout) with ESMTP id p6K0IRUH015917; Tue, 19 Jul 2011 20:18:30 -0400 Received: from paulmck-ThinkPad-W500 (paulmck-ThinkPad-W500.beaverton.ibm.com [9.47.24.65]) by d01av01.pok.ibm.com (8.14.4/8.13.1/NCO v10.0 AVin) with ESMTP id p6K0IQGQ015836; Tue, 19 Jul 2011 20:18:26 -0400 Received: by paulmck-ThinkPad-W500 (Postfix, from userid 1000) id A3E2A13F804; Tue, 19 Jul 2011 17:18:25 -0700 (PDT) From: "Paul E. McKenney" To: linux-kernel@vger.kernel.org Cc: mingo@elte.hu, laijs@cn.fujitsu.com, dipankar@in.ibm.com, akpm@linux-foundation.org, mathieu.desnoyers@polymtl.ca, josh@joshtriplett.org, niv@us.ibm.com, tglx@linutronix.de, peterz@infradead.org, rostedt@goodmis.org, Valdis.Kletnieks@vt.edu, dhowells@redhat.com, eric.dumazet@gmail.com, darren@dvhart.com, patches@linaro.org, greearb@candelatech.com, edt@aei.ca, "Paul E. McKenney" , "Paul E. McKenney" Subject: [PATCH tip/core/urgent 2/7] rcu: Fix RCU_BOOST race handling current->rcu_read_unlock_special Date: Tue, 19 Jul 2011 17:18:18 -0700 Message-Id: <1311121103-16978-2-git-send-email-paulmck@linux.vnet.ibm.com> X-Mailer: git-send-email 1.7.3.2 In-Reply-To: <20110720001738.GA16369@linux.vnet.ibm.com> References: <20110720001738.GA16369@linux.vnet.ibm.com> From: Paul E. McKenney The RCU_BOOST commits for TREE_PREEMPT_RCU introduced an other-task write to a new RCU_READ_UNLOCK_BOOSTED bit in the task_struct structure's ->rcu_read_unlock_special field, but, as noted by Steven Rostedt, without correctly synchronizing all accesses to ->rcu_read_unlock_special. This could result in bits in ->rcu_read_unlock_special being spuriously set and cleared due to conflicting accesses, which in turn could result in deadlocks between the rcu_node structure's ->lock and the scheduler's rq and pi locks. These deadlocks would result from RCU incorrectly believing that the just-ended RCU read-side critical section had been preempted and/or boosted. If that RCU read-side critical section was executed with either rq or pi locks held, RCU's ensuing (incorrect) calls to the scheduler would cause the scheduler to attempt to once again acquire the rq and pi locks, resulting in deadlock. More complex deadlock cycles are also possible, involving multiple rq and pi locks as well as locks from multiple rcu_node structures. This commit fixes synchronization by creating ->rcu_boosted field in task_struct that is accessed and modified only when holding the ->lock in the rcu_node structure on which the task is queued (on that rcu_node structure's ->blkd_tasks list). This results in tasks accessing only their own current->rcu_read_unlock_special fields, making unsynchronized access once again legal, and keeping the rcu_read_unlock() fastpath free of atomic instructions and memory barriers. The reason that the rcu_read_unlock() fastpath does not need to access the new current->rcu_boosted field is that this new field cannot be non-zero unless the RCU_READ_UNLOCK_BLOCKED bit is set in the current->rcu_read_unlock_special field. Therefore, rcu_read_unlock() need only test current->rcu_read_unlock_special: if that is zero, then current->rcu_boosted must also be zero. This bug does not affect TINY_PREEMPT_RCU because this implementation of RCU accesses current->rcu_read_unlock_special with irqs disabled, thus preventing races on the !SMP systems that TINY_PREEMPT_RCU runs on. Maybe-reported-by: Dave Jones Maybe-reported-by: Sergey Senozhatsky Reported-by: Steven Rostedt Signed-off-by: Paul E. McKenney Signed-off-by: Paul E. McKenney Reviewed-by: Steven Rostedt --- include/linux/sched.h | 3 +++ kernel/rcutree_plugin.h | 8 ++++++-- 2 files changed, 9 insertions(+), 2 deletions(-) diff --git a/include/linux/sched.h b/include/linux/sched.h index 496770a..76676a4 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -1254,6 +1254,9 @@ struct task_struct { #ifdef CONFIG_PREEMPT_RCU int rcu_read_lock_nesting; char rcu_read_unlock_special; +#if defined(CONFIG_RCU_BOOST) && defined(CONFIG_TREE_PREEMPT_RCU) + int rcu_boosted; +#endif /* #if defined(CONFIG_RCU_BOOST) && defined(CONFIG_TREE_PREEMPT_RCU) */ struct list_head rcu_node_entry; #endif /* #ifdef CONFIG_PREEMPT_RCU */ #ifdef CONFIG_TREE_PREEMPT_RCU diff --git a/kernel/rcutree_plugin.h b/kernel/rcutree_plugin.h index 94a674e..82b3c58 100644 --- a/kernel/rcutree_plugin.h +++ b/kernel/rcutree_plugin.h @@ -342,6 +342,11 @@ static void rcu_read_unlock_special(struct task_struct *t) #ifdef CONFIG_RCU_BOOST if (&t->rcu_node_entry == rnp->boost_tasks) rnp->boost_tasks = np; + /* Snapshot and clear ->rcu_boosted with rcu_node lock held. */ + if (t->rcu_boosted) { + special |= RCU_READ_UNLOCK_BOOSTED; + t->rcu_boosted = 0; + } #endif /* #ifdef CONFIG_RCU_BOOST */ t->rcu_blocked_node = NULL; @@ -358,7 +363,6 @@ static void rcu_read_unlock_special(struct task_struct *t) #ifdef CONFIG_RCU_BOOST /* Unboost if we were boosted. */ if (special & RCU_READ_UNLOCK_BOOSTED) { - t->rcu_read_unlock_special &= ~RCU_READ_UNLOCK_BOOSTED; rt_mutex_unlock(t->rcu_boost_mutex); t->rcu_boost_mutex = NULL; } @@ -1175,7 +1179,7 @@ static int rcu_boost(struct rcu_node *rnp) t = container_of(tb, struct task_struct, rcu_node_entry); rt_mutex_init_proxy_locked(&mtx, t); t->rcu_boost_mutex = &mtx; - t->rcu_read_unlock_special |= RCU_READ_UNLOCK_BOOSTED; + t->rcu_boosted = 1; raw_spin_unlock_irqrestore(&rnp->lock, flags); rt_mutex_lock(&mtx); /* Side effect: boosts task t's priority. */ rt_mutex_unlock(&mtx); /* Keep lockdep happy. */