Message ID | 1328125319-5205-17-git-send-email-paulmck@linux.vnet.ibm.com |
---|---|
State | Superseded |
On Wed, Feb 01, 2012 at 11:41:35AM -0800, Paul E. McKenney wrote:
> The grace-period initialization sequence in rcu_start_gp() has a special
> case for systems where the rcu_node tree is a single rcu_node structure.
> This made sense some years ago when systems were smaller and up to 64
> CPUs could share a single rcu_node structure, but now that large systems
> are common and a given leaf rcu_node structure can support only 16 CPUs
> (due to lock contention on the rcu_node's ->lock field), this optimization
> is almost never taken. And even the small mobile platforms that might
> make use of it might rather have the kernel text reduction.
>
> Therefore, this commit removes the check for single-rcu_node trees.

This optimization would continue to work on laptops for a while longer.
:)

That said, I do agree that reducing code size and complexity seems
preferable. If someone wants an optimization like this, they'd probably
do better to compile RCU with a low compile-time limit on the number of
CPUs, which would at least theoretically allow the compiler to get
similar results through optimization. (I don't know if that works in
practice with the current code structure and the current intelligence of
GCC.)

Reviewed-by: Josh Triplett <josh@joshtriplett.org>

On Wed, Feb 01, 2012 at 06:13:14PM -0800, Josh Triplett wrote:
> On Wed, Feb 01, 2012 at 11:41:35AM -0800, Paul E. McKenney wrote:
> > The grace-period initialization sequence in rcu_start_gp() has a special
> > case for systems where the rcu_node tree is a single rcu_node structure.
> > This made sense some years ago when systems were smaller and up to 64
> > CPUs could share a single rcu_node structure, but now that large systems
> > are common and a given leaf rcu_node structure can support only 16 CPUs
> > (due to lock contention on the rcu_node's ->lock field), this optimization
> > is almost never taken. And even the small mobile platforms that might
> > make use of it might rather have the kernel text reduction.
> >
> > Therefore, this commit removes the check for single-rcu_node trees.
>
> This optimization would continue to work on laptops for a while longer.
> :)

How many more months? ;-)

> That said, I do agree that reducing code size and complexity seems
> preferable. If someone wants an optimization like this, they'd probably
> do better to compile RCU with a low compile-time limit on the number of
> CPUs, which would at least theoretically allow the compiler to get
> similar results through optimization. (I don't know if that works in
> practice with the current code structure and the current intelligence of
> GCC.)
>
> Reviewed-by: Josh Triplett <josh@joshtriplett.org>

Thank you for all your reviews -- as always, very helpful!!!

Thanx, Paul

diff --git a/kernel/rcutree.c b/kernel/rcutree.c
index ee2009d..38d143b 100644
--- a/kernel/rcutree.c
+++ b/kernel/rcutree.c
@@ -984,26 +984,8 @@ rcu_start_gp(struct rcu_state *rsp, unsigned long flags)
 	rsp->fqs_state = RCU_GP_INIT; /* Hold off force_quiescent_state. */
 	rsp->jiffies_force_qs = jiffies + RCU_JIFFIES_TILL_FORCE_QS;
 	record_gp_stall_check_time(rsp);
-
-	/* Special-case the common single-level case. */
-	if (NUM_RCU_NODES == 1) {
-		rcu_preempt_check_blocked_tasks(rnp);
-		rnp->qsmask = rnp->qsmaskinit;
-		rnp->gpnum = rsp->gpnum;
-		rnp->completed = rsp->completed;
-		rsp->fqs_state = RCU_SIGNAL_INIT; /* force_quiescent_state OK */
-		rcu_start_gp_per_cpu(rsp, rnp, rdp);
-		rcu_preempt_boost_start_gp(rnp);
-		trace_rcu_grace_period_init(rsp->name, rnp->gpnum,
-					    rnp->level, rnp->grplo,
-					    rnp->grphi, rnp->qsmask);
-		raw_spin_unlock_irqrestore(&rnp->lock, flags);
-		return;
-	}
-
 	raw_spin_unlock(&rnp->lock); /* leave irqs disabled. */
 
-
 	/* Exclude any concurrent CPU-hotplug operations. */
 	raw_spin_lock(&rsp->onofflock); /* irqs already disabled. */
 
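
The review comments above suggest that a node count fixed at compile time lets the compiler recover most of what the removed branch provided. The following is a minimal sketch of that idea, not kernel code: `SKETCH_NUM_NODES`, `struct sketch_node`, `struct sketch_state`, and `sketch_start_gp()` are hypothetical stand-ins, and the sketch assumes the remaining general path (not shown in the hunk) visits every rcu_node structure in turn. Locking, hotplug exclusion, and tracing are deliberately left out.

```c
#include <stddef.h>

/* Hypothetical compile-time node count; 1 models a single-node tree. */
#define SKETCH_NUM_NODES 1

/* Simplified stand-in for an rcu_node (illustrative fields only). */
struct sketch_node {
	unsigned long qsmask;     /* CPUs still owing a quiescent state */
	unsigned long qsmaskinit; /* CPUs that must report each grace period */
	unsigned long gpnum;      /* last grace period started on this node */
};

/* Simplified stand-in for rcu_state. */
struct sketch_state {
	struct sketch_node node[SKETCH_NUM_NODES];
	unsigned long gpnum;      /* current grace-period number */
};

/*
 * General path only: initialize every node for the new grace period.
 * With SKETCH_NUM_NODES == 1 the loop body runs exactly once, so no
 * separate single-node branch is needed, and the constant trip count
 * gives the compiler the chance to unroll the loop.
 * Returns the number of nodes visited.
 */
static int sketch_start_gp(struct sketch_state *sp)
{
	int visited = 0;
	size_t i;

	sp->gpnum++;
	for (i = 0; i < SKETCH_NUM_NODES; i++) {
		struct sketch_node *np = &sp->node[i];

		np->qsmask = np->qsmaskinit;
		np->gpnum = sp->gpnum;
		visited++;
	}
	return visited;
}
```

Whether GCC actually produces code comparable to the removed special case is left open in the thread, and this sketch does not settle it.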