
[03/10] locking/qspinlock: Kill cmpxchg loop when claiming lock from head of queue

Message ID 1522947547-24081-4-git-send-email-will.deacon@arm.com
State Superseded
Series kernel/locking: qspinlock improvements

Commit Message

Will Deacon April 5, 2018, 4:59 p.m. UTC
When a queued locker reaches the head of the queue, it claims the lock
by setting _Q_LOCKED_VAL in the lockword. If there isn't contention, it
must also clear the tail as part of this operation so that subsequent
lockers can avoid taking the slowpath altogether.

Currently this is expressed as a cmpxchg loop that practically only
runs up to two iterations. This is confusing to the reader and unhelpful
to the compiler. Rewrite the cmpxchg loop without the loop, so that a
failed cmpxchg implies that there is contention and we just need to
write to _Q_LOCKED_VAL without considering the rest of the lockword.

Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>

---
 kernel/locking/qspinlock.c | 19 ++++++++-----------
 1 file changed, 8 insertions(+), 11 deletions(-)

-- 
2.1.4
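
For readers who want to see the shape of the rewrite in isolation, here is a minimal userspace sketch of the lock-claim step described in the commit message, using C11 atomics rather than the kernel's atomic_t/cmpxchg API. The constants, the tail encoding and the claim_from_head() helper are simplified stand-ins invented for this sketch, not the kernel's definitions.

#include <stdatomic.h>
#include <stdio.h>

#define Q_LOCKED_VAL	1U		/* stand-in for _Q_LOCKED_VAL */
#define Q_TAIL_MASK	0xffff0000U	/* stand-in for _Q_TAIL_MASK */

static void claim_from_head(atomic_uint *lockword, unsigned int val, unsigned int tail)
{
	/*
	 * If our tail code is the only thing left in the lockword, try to
	 * swap the whole word for Q_LOCKED_VAL, clearing the tail in the
	 * same operation.  A failed cmpxchg means another CPU queued (or
	 * set pending) behind us in the meantime.
	 */
	if ((val & Q_TAIL_MASK) == tail) {
		if (atomic_compare_exchange_strong(lockword, &val, Q_LOCKED_VAL))
			return;	/* no contention: tail cleared, lock held */
	}

	/* Somebody is behind us: only the locked byte needs to be set. */
	atomic_fetch_or(lockword, Q_LOCKED_VAL);
}

int main(void)
{
	unsigned int tail = 0x00010000;		/* our (fake) tail encoding */
	atomic_uint lockword = tail;		/* we are the only queued CPU */

	claim_from_head(&lockword, atomic_load(&lockword), tail);
	printf("lockword = %#x\n", atomic_load(&lockword));	/* prints 0x1 */
	return 0;
}

A failed compare-and-swap can only mean another waiter has appeared behind us, which is why the fallback path only needs to set the locked byte and never has to loop.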

Comments

Peter Zijlstra April 5, 2018, 5:19 p.m. UTC | #1
On Thu, Apr 05, 2018 at 05:59:00PM +0100, Will Deacon wrote:
> +
> +	/* In the PV case we might already have _Q_LOCKED_VAL set */
> +	if ((val & _Q_TAIL_MASK) == tail) {
>  		/*
>  		 * The smp_cond_load_acquire() call above has provided the
> +		 * necessary acquire semantics required for locking.
>  		 */
>  		old = atomic_cmpxchg_relaxed(&lock->val, val, _Q_LOCKED_VAL);
>  		if (old == val)
> +			goto release; /* No contention */
>  	}

--- a/kernel/locking/qspinlock.c
+++ b/kernel/locking/qspinlock.c
@@ -464,8 +464,7 @@ void queued_spin_lock_slowpath(struct qs
 		 * The smp_cond_load_acquire() call above has provided the
 		 * necessary acquire semantics required for locking.
 		 */
-		old = atomic_cmpxchg_relaxed(&lock->val, val, _Q_LOCKED_VAL);
-		if (old == val)
+		if (atomic_try_cmpxchg_release(&lock->val, &val, _Q_LOCKED_VAL))
 			goto release; /* No contention */
 	}

Does that also work for you? It would generate slightly better code for
x86 (not that it would matter much on this path).
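
For context, atomic_try_cmpxchg() differs from atomic_cmpxchg() in that it returns a bool and, on failure, writes the observed value back through its expected-value pointer; on x86 this lets the compiler branch on the ZF flag that CMPXCHG already sets instead of re-comparing the returned old value. Below is a rough userspace illustration of the two idioms using C11 atomics; the cmpxchg_old() helper is made up for this sketch and is not a kernel API.

#include <stdatomic.h>
#include <stdio.h>

#define Q_LOCKED_VAL 1U

/* cmpxchg-style helper: returns whatever value was actually found in *v */
static unsigned int cmpxchg_old(atomic_uint *v, unsigned int old, unsigned int new)
{
	atomic_compare_exchange_strong(v, &old, new);
	return old;	/* unchanged on success, the observed value on failure */
}

int main(void)
{
	atomic_uint lockword = 0x00010000U;
	unsigned int val = 0x00010000U;

	/* Old idiom: fetch the previous value and compare it by hand. */
	if (cmpxchg_old(&lockword, val, Q_LOCKED_VAL) == val)
		puts("claimed (cmpxchg idiom)");

	atomic_store(&lockword, 0x00010000U);
	val = 0x00010000U;

	/* try_cmpxchg idiom: bool result, 'val' is refreshed on failure. */
	if (atomic_compare_exchange_strong(&lockword, &val, Q_LOCKED_VAL))
		puts("claimed (try_cmpxchg idiom)");

	return 0;
}
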
Will Deacon April 6, 2018, 10:54 a.m. UTC | #2
On Thu, Apr 05, 2018 at 07:19:12PM +0200, Peter Zijlstra wrote:
On Thu, Apr 05, 2018 at 05:59:00PM +0100, Will Deacon wrote:
> > +
> > +	/* In the PV case we might already have _Q_LOCKED_VAL set */
> > +	if ((val & _Q_TAIL_MASK) == tail) {
> >  		/*
> >  		 * The smp_cond_load_acquire() call above has provided the
> > +		 * necessary acquire semantics required for locking.
> >  		 */
> >  		old = atomic_cmpxchg_relaxed(&lock->val, val, _Q_LOCKED_VAL);
> >  		if (old == val)
> > +			goto release; /* No contention */
> >  	}
> 
> --- a/kernel/locking/qspinlock.c
> +++ b/kernel/locking/qspinlock.c
> @@ -464,8 +464,7 @@ void queued_spin_lock_slowpath(struct qs
>  		 * The smp_cond_load_acquire() call above has provided the
>  		 * necessary acquire semantics required for locking.
>  		 */
> -		old = atomic_cmpxchg_relaxed(&lock->val, val, _Q_LOCKED_VAL);
> -		if (old == val)
> +		if (atomic_try_cmpxchg_release(&lock->val, &val, _Q_LOCKED_VAL))
>  			goto release; /* No contention */
>  	}
> 
> Does that also work for you? It would generate slightly better code for
> x86 (not that it would matter much on this path).

Assuming you meant to use atomic_try_cmpxchg_relaxed, then that works for
me too.

Will
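
In other words, the acquire ordering needed to take the lock was already provided by the earlier smp_cond_load_acquire(), so the swap itself can be fully relaxed rather than release. In C11 terms (a userspace sketch, not the kernel's atomic_try_cmpxchg_relaxed()), the relaxed variant corresponds to passing memory_order_relaxed for both the success and failure orderings:

#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

int main(void)
{
	atomic_uint lockword = 0x00010000U;
	unsigned int val = 0x00010000U;

	/*
	 * Both the success and failure orderings are relaxed: the acquire
	 * barrier for taking the lock was already provided by the earlier
	 * smp_cond_load_acquire() in the slowpath.
	 */
	bool ok = atomic_compare_exchange_strong_explicit(&lockword, &val, 1U,
							  memory_order_relaxed,
							  memory_order_relaxed);
	printf("claimed: %d, lockword = %#x\n", ok, atomic_load(&lockword));
	return 0;
}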

Patch

diff --git a/kernel/locking/qspinlock.c b/kernel/locking/qspinlock.c
index b75361d23ea5..cdfa7b7328a8 100644
--- a/kernel/locking/qspinlock.c
+++ b/kernel/locking/qspinlock.c
@@ -457,24 +457,21 @@  void queued_spin_lock_slowpath(struct qspinlock *lock, u32 val)
 	 * and nobody is pending, clear the tail code and grab the lock.
 	 * Otherwise, we only need to grab the lock.
 	 */
-	for (;;) {
-		/* In the PV case we might already have _Q_LOCKED_VAL set */
-		if ((val & _Q_TAIL_MASK) != tail || (val & _Q_PENDING_MASK)) {
-			set_locked(lock);
-			break;
-		}
+
+	/* In the PV case we might already have _Q_LOCKED_VAL set */
+	if ((val & _Q_TAIL_MASK) == tail) {
 		/*
 		 * The smp_cond_load_acquire() call above has provided the
-		 * necessary acquire semantics required for locking. At most
-		 * two iterations of this loop may be ran.
+		 * necessary acquire semantics required for locking.
 		 */
 		old = atomic_cmpxchg_relaxed(&lock->val, val, _Q_LOCKED_VAL);
 		if (old == val)
-			goto release;	/* No contention */
-
-		val = old;
+			goto release; /* No contention */
 	}
 
+	/* Either somebody is queued behind us or _Q_PENDING_VAL is set */
+	set_locked(lock);
+
 	/*
 	 * contended path; wait for next if not observed yet, release.
 	 */