
[RFC,0/3] arm64: queued spinlocks and rw-locks

Message ID 1491860104-4103-1-git-send-email-ynorov@caviumnetworks.com

Message

Yury Norov April 10, 2017, 9:35 p.m. UTC
Jan Glauber's patch enables queued spinlocks on arm64. I rebased it on the
latest kernel sources and added a couple of header fixes so that it applies
cleanly.

However, the locktorture test shows a significant performance degradation in
read acquisition of the rw-lock under qemu:

                          Before           After        Delta (%)
spin_lock-torture:      38957034        37076367         -4.83
rw_lock-torture W:       5369471        18971957        253.33
rw_lock-torture R:       6413179         3668160        -42.80

I'm not very experienced with locking, so I wonder how simply switching to
the generic queued rw-lock can cause such a significant performance
degradation, when in theory it should be an improvement. Moreover, x86
apparently does not have this problem.

I also think that patches 1 and 2 are correct and useful in their own right,
and should be applied in any case.

Any comments appreciated.

Yury.

Jan Glauber (1):
  arm64/locking: qspinlocks and qrwlocks support

Yury Norov (2):
  kernel/locking: #include <asm/spinlock.h> in qrwlock.c
  asm-generic: don't #include <linux/atomic.h> in qspinlock_types.h

 arch/arm64/Kconfig                      |  2 ++
 arch/arm64/include/asm/qrwlock.h        |  7 +++++++
 arch/arm64/include/asm/qspinlock.h      | 20 ++++++++++++++++++++
 arch/arm64/include/asm/spinlock.h       | 12 ++++++++++++
 arch/arm64/include/asm/spinlock_types.h | 14 +++++++++++---
 include/asm-generic/qspinlock.h         |  1 +
 include/asm-generic/qspinlock_types.h   |  8 --------
 kernel/locking/qrwlock.c                |  1 +
 8 files changed, 54 insertions(+), 11 deletions(-)
 create mode 100644 arch/arm64/include/asm/qrwlock.h
 create mode 100644 arch/arm64/include/asm/qspinlock.h
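For reference, the two-line arch/arm64/Kconfig change in the diffstat above
presumably selects the generic queued-lock infrastructure; the exact symbol
names below are an assumption based on the asm-generic qspinlock/qrwlock code,
not copied from the patch:

```kconfig
# Hypothetical excerpt of arch/arm64/Kconfig after the patch:
config ARM64
	def_bool y
	select ARCH_USE_QUEUED_RWLOCKS
	select ARCH_USE_QUEUED_SPINLOCKS
```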

-- 
2.7.4

Comments

Adam Wallis April 12, 2017, 5:04 p.m. UTC | #1
On 4/10/2017 5:35 PM, Yury Norov wrote:
> Jan Glauber's patch enables queued spinlocks on arm64. I rebased it on the
> latest kernel sources and added a couple of header fixes so that it applies
> cleanly.
>
> However, the locktorture test shows a significant performance degradation in
> read acquisition of the rw-lock under qemu:
>
>                           Before           After        Delta (%)
> spin_lock-torture:      38957034        37076367         -4.83
> rw_lock-torture W:       5369471        18971957        253.33
> rw_lock-torture R:       6413179         3668160        -42.80

On our 48-core QDF2400 part, I am seeing huge improvements with these patches
on the torture tests. The improvements go up even further when I apply Jason
Low's MCS spinlock patch: https://lkml.org/lkml/2016/4/20/725

> I'm not very experienced with locking, so I wonder how simply switching to
> the generic queued rw-lock can cause such a significant performance
> degradation, when in theory it should be an improvement. Moreover, x86
> apparently does not have this problem.
>
> I also think that patches 1 and 2 are correct and useful in their own right,
> and should be applied in any case.
>
> Any comments appreciated.
>
> Yury.

I will be happy to test these patches more thoroughly once you get some
additional comments/feedback.

Thanks

-- 
Adam Wallis
Qualcomm Datacenter Technologies as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project.
Yury Norov April 13, 2017, 10:33 a.m. UTC | #2
On Wed, Apr 12, 2017 at 01:04:55PM -0400, Adam Wallis wrote:
> On 4/10/2017 5:35 PM, Yury Norov wrote:
> > However, the locktorture test shows a significant performance degradation
> > in read acquisition of the rw-lock under qemu:
> >
> >                           Before           After        Delta (%)
> > spin_lock-torture:      38957034        37076367         -4.83
> > rw_lock-torture W:       5369471        18971957        253.33
> > rw_lock-torture R:       6413179         3668160        -42.80
>
> On our 48-core QDF2400 part, I am seeing huge improvements with these patches
> on the torture tests. The improvements go up even further when I apply Jason
> Low's MCS spinlock patch: https://lkml.org/lkml/2016/4/20/725

That sounds great. So the performance issue looks like a local problem on my
side, most probably because I ran the tests in a qemu VM.

I don't see any problems with this series other than performance, and if that
now looks fine, I think it's good enough for upstream.

Yury.
Will Deacon April 24, 2017, 1:36 p.m. UTC | #3
On Wed, Apr 12, 2017 at 01:04:55PM -0400, Adam Wallis wrote:
> On 4/10/2017 5:35 PM, Yury Norov wrote:
> > However, the locktorture test shows a significant performance degradation
> > in read acquisition of the rw-lock under qemu:
> >
> >                           Before           After        Delta (%)
> > spin_lock-torture:      38957034        37076367         -4.83
> > rw_lock-torture W:       5369471        18971957        253.33
> > rw_lock-torture R:       6413179         3668160        -42.80
>
> On our 48-core QDF2400 part, I am seeing huge improvements with these patches
> on the torture tests. The improvements go up even further when I apply Jason
> Low's MCS spinlock patch: https://lkml.org/lkml/2016/4/20/725

Does the QDF2400 implement the large system extensions? If so, how do the
queued lock implementations compare to the LSE-based ticket locks?

Will
Will Deacon April 28, 2017, 3:37 p.m. UTC | #4
On Thu, Apr 13, 2017 at 01:33:09PM +0300, Yury Norov wrote:
> On Wed, Apr 12, 2017 at 01:04:55PM -0400, Adam Wallis wrote:
> > On our 48-core QDF2400 part, I am seeing huge improvements with these
> > patches on the torture tests. The improvements go up even further when I
> > apply Jason Low's MCS spinlock patch: https://lkml.org/lkml/2016/4/20/725
>
> That sounds great. So the performance issue looks like a local problem on my
> side, most probably because I ran the tests in a qemu VM.
>
> I don't see any problems with this series other than performance, and if that
> now looks fine, I think it's good enough for upstream.

I would still like to understand why you see such a significant performance
degradation, and whether or not you also see that on native hardware (i.e.
without Qemu involved).

Will