mbox series

[PATCHv3,0/2] arm64: fix hotplug rwsem boot fallout

Message ID 1493377266-2205-1-git-send-email-mark.rutland@arm.com
Headers show
Series arm64: fix hotplug rwsem boot fallout | expand

Message

Mark Rutland April 28, 2017, 11:01 a.m. UTC
Hi,

These patches fix a boot issue seen on some arm64 platforms as a result of the
hotplug rwsem rework.

Thomas, would you be able to take these into the tip smp/hotplug branch?

Will has acked the arm64 part, and is happy for this to go via tip [1].

I've tested this atop of the tip smp/hotplug branch, and with the arm64
for-next/core branch merged in, which git handles automatically. In both cases,
it builds cleanly and boots fine on Juno R1.

Thanks,
Mark.

Since v1 [2]:
* Fix update_cpu_capabilities() by splitting boot/secondary

Since v2 [3]:
* Align on _cpuslocked naming
* Avoid duplicating workaround string
* Add Will's Acked-by

[1] https://lkml.kernel.org/r/20170427180104.GO1890@arm.
[2] https://lkml.kernel.org/r/20170427154806.GA6646@leverpostej
[3] https://lkml.kernel.org/r/1493315077-19496-3-git-send-email-mark.rutland@arm.com

Mark Rutland (1):
  arm64: cpufeature: use static_branch_enable_cpuslocked()

Sebastian Andrzej Siewior (1):
  jump_label: Provide static_key_[enable|/slow_inc]_cpuslocked()

 arch/arm64/include/asm/cpufeature.h |  3 ++-
 arch/arm64/kernel/cpu_errata.c      |  9 ++++++++-
 arch/arm64/kernel/cpufeature.c      |  5 ++++-
 include/linux/jump_label.h          |  7 +++++++
 kernel/jump_label.c                 | 10 ++++++++++
 5 files changed, 31 insertions(+), 3 deletions(-)

-- 
1.9.1

Comments

Mark Rutland May 10, 2017, 6:10 p.m. UTC | #1
On Fri, Apr 28, 2017 at 12:01:04PM +0100, Mark Rutland wrote:
> Hi,

> 

> These patches fix a boot issue seen on some arm64 platforms as a result of the

> hotplug rwsem rework.

> 

> Thomas, would you be able to take these into the tip smp/hotplug branch?

> 

> Will has acked the arm64 part, and is happy for this to go via tip [1].

> 

> I've tested this atop of the tip smp/hotplug branch, and with the arm64

> for-next/core branch merged in, which git handles automatically. In both cases,

> it builds cleanly and boots fine on Juno R1.


As a heads-up, with next-20170510 I'm seeing a warning here that I don't
recall seeing when I tested the above, and I'm not sure how to fix it:

[    0.180998] CPU features: enabling workaround for ARM erratum 832075
[    0.181032] ------------[ cut here ]------------
[    0.181074] WARNING: CPU: 1 PID: 0 at kernel/cpu.c:234 lockdep_assert_hotplug_held+0x78/0x98
[    0.181084] Modules linked in:
[    0.181118] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.11.0-next-20170510 #1
[    0.181132] Hardware name: ARM Juno development board (r1) (DT)
[    0.181149] task: ffff80093590af00 task.stack: ffff800935938000
[    0.181172] PC is at lockdep_assert_hotplug_held+0x78/0x98
[    0.181194] LR is at lockdep_assert_hotplug_held+0x78/0x98
[    0.181213] pc : [<ffff200008155218>] lr : [<ffff200008155218>] pstate: 800001c5
[    0.181225] sp : ffff80093593bd80
[    0.181239] x29: ffff80093593bd80 x28: 0000000000000006 
[    0.181274] x27: 0000000000000000 x26: 0000000000000008 
[    0.181309] x25: 1fffe40001aa0d96 x24: 0000000000000000 
[    0.181344] x23: ffff20000abf74a0 x22: 0000000000000000 
[    0.181379] x21: 1ffff00126b277be x20: ffff20000b054000 
[    0.181414] x19: ffff20000ba69000 x18: 0000000000000000 
[    0.181448] x17: 0000000000000000 x16: 0000000000000002 
[    0.181482] x15: 0000000000000000 x14: ffff20000d71af80 
[    0.181517] x13: 1ffff00126b216dd x12: ffff80093590b6c0 
[    0.181552] x11: 1ffff00126b216d8 x10: ffff80093590b6c8 
[    0.181587] x9 : ffff20000ae3b000 x8 : ffff20000d71a000 
[    0.181622] x7 : 1ffff00126b216dc x6 : 0000000041b58ab3 
[    0.181656] x5 : 0000000041b58ab3 x4 : 0000000000000000 
[    0.181690] x3 : 1ffff00126b216d8 x2 : 0000000000000000 
[    0.181724] x1 : 0000000000000001 x0 : 0000000000000000 
[    0.181801] ---[ end trace eee4d9a3d314f895 ]---
[    0.181812] Call trace:
[    0.182133] [<ffff200008155218>] lockdep_assert_hotplug_held+0x78/0x98
[    0.182161] [<ffff20000840a36c>] __static_key_slow_inc+0x174/0x2e0
[    0.182188] [<ffff20000840a654>] static_key_enable_cpuslocked+0x64/0xb0
[    0.182215] [<ffff2000080a1120>] update_cpu_capabilities+0x178/0x2d8
[    0.182243] [<ffff20000809e72c>] update_cpu_errata_workarounds_cpuslocked+0x1c/0x28
[    0.182270] [<ffff2000080a1420>] check_local_cpu_capabilities+0x1a0/0x248
[    0.182295] [<ffff2000080a2d18>] secondary_start_kernel+0x1e8/0x478
[    0.182317] [<000000008219a1b4>] 0x8219a1b4
[    0.182337] CPU features: enabling workaround for ARM erratum 834220
[    0.182362] ------------[ cut here ]------------

The problem is that the secondary CPU doesn't hold the rwsem when it
calls __static_key_slow_inc() in its boot path. It cannot take the
rwsem, since the primaary CPU holds this for the duration of onlining
the secondary CPU.

So far, I haven't figured out how we can avoid this. Any ideas?

Thanks,
Mark.
Thomas Gleixner May 10, 2017, 8:09 p.m. UTC | #2
On Wed, 10 May 2017, Mark Rutland wrote:
> As a heads-up, with next-20170510 I'm seeing a warning here that I don't

> recall seeing when I tested the above, and I'm not sure how to fix it:

> 

> [    0.180998] CPU features: enabling workaround for ARM erratum 832075

> [    0.181032] ------------[ cut here ]------------

> [    0.181074] WARNING: CPU: 1 PID: 0 at kernel/cpu.c:234 lockdep_assert_hotplug_held+0x78/0x98

> [    0.181084] Modules linked in:

> [    0.181118] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.11.0-next-20170510 #1

> [    0.181132] Hardware name: ARM Juno development board (r1) (DT)

> [    0.181149] task: ffff80093590af00 task.stack: ffff800935938000

> [    0.181172] PC is at lockdep_assert_hotplug_held+0x78/0x98

> [    0.181194] LR is at lockdep_assert_hotplug_held+0x78/0x98

> [    0.181213] pc : [<ffff200008155218>] lr : [<ffff200008155218>] pstate: 800001c5

> [    0.181225] sp : ffff80093593bd80

> [    0.181239] x29: ffff80093593bd80 x28: 0000000000000006 

> [    0.181274] x27: 0000000000000000 x26: 0000000000000008 

> [    0.181309] x25: 1fffe40001aa0d96 x24: 0000000000000000 

> [    0.181344] x23: ffff20000abf74a0 x22: 0000000000000000 

> [    0.181379] x21: 1ffff00126b277be x20: ffff20000b054000 

> [    0.181414] x19: ffff20000ba69000 x18: 0000000000000000 

> [    0.181448] x17: 0000000000000000 x16: 0000000000000002 

> [    0.181482] x15: 0000000000000000 x14: ffff20000d71af80 

> [    0.181517] x13: 1ffff00126b216dd x12: ffff80093590b6c0 

> [    0.181552] x11: 1ffff00126b216d8 x10: ffff80093590b6c8 

> [    0.181587] x9 : ffff20000ae3b000 x8 : ffff20000d71a000 

> [    0.181622] x7 : 1ffff00126b216dc x6 : 0000000041b58ab3 

> [    0.181656] x5 : 0000000041b58ab3 x4 : 0000000000000000 

> [    0.181690] x3 : 1ffff00126b216d8 x2 : 0000000000000000 

> [    0.181724] x1 : 0000000000000001 x0 : 0000000000000000 

> [    0.181801] ---[ end trace eee4d9a3d314f895 ]---

> [    0.181812] Call trace:

> [    0.182133] [<ffff200008155218>] lockdep_assert_hotplug_held+0x78/0x98

> [    0.182161] [<ffff20000840a36c>] __static_key_slow_inc+0x174/0x2e0

> [    0.182188] [<ffff20000840a654>] static_key_enable_cpuslocked+0x64/0xb0

> [    0.182215] [<ffff2000080a1120>] update_cpu_capabilities+0x178/0x2d8

> [    0.182243] [<ffff20000809e72c>] update_cpu_errata_workarounds_cpuslocked+0x1c/0x28

> [    0.182270] [<ffff2000080a1420>] check_local_cpu_capabilities+0x1a0/0x248

> [    0.182295] [<ffff2000080a2d18>] secondary_start_kernel+0x1e8/0x478

> [    0.182317] [<000000008219a1b4>] 0x8219a1b4

> [    0.182337] CPU features: enabling workaround for ARM erratum 834220

> [    0.182362] ------------[ cut here ]------------

> 

> The problem is that the secondary CPU doesn't hold the rwsem when it

> calls __static_key_slow_inc() in its boot path. It cannot take the

> rwsem, since the primaary CPU holds this for the duration of onlining

> the secondary CPU.

> 

> So far, I haven't figured out how we can avoid this. Any ideas?


Not immediately, but I put it on the list of crap to look at....

Thanks,

	tglx
Thomas Gleixner May 11, 2017, 8:30 a.m. UTC | #3
On Wed, 10 May 2017, Thomas Gleixner wrote:
> On Wed, 10 May 2017, Mark Rutland wrote:

> > [    0.182133] [<ffff200008155218>] lockdep_assert_hotplug_held+0x78/0x98

> > [    0.182161] [<ffff20000840a36c>] __static_key_slow_inc+0x174/0x2e0

> > [    0.182188] [<ffff20000840a654>] static_key_enable_cpuslocked+0x64/0xb0

> > [    0.182215] [<ffff2000080a1120>] update_cpu_capabilities+0x178/0x2d8

> > [    0.182243] [<ffff20000809e72c>] update_cpu_errata_workarounds_cpuslocked+0x1c/0x28

> > [    0.182270] [<ffff2000080a1420>] check_local_cpu_capabilities+0x1a0/0x248

> > [    0.182295] [<ffff2000080a2d18>] secondary_start_kernel+0x1e8/0x478

> > [    0.182317] [<000000008219a1b4>] 0x8219a1b4

> > [    0.182337] CPU features: enabling workaround for ARM erratum 834220

> > [    0.182362] ------------[ cut here ]------------

> > 

> > The problem is that the secondary CPU doesn't hold the rwsem when it

> > calls __static_key_slow_inc() in its boot path. It cannot take the

> > rwsem, since the primaary CPU holds this for the duration of onlining

> > the secondary CPU.


Looking deeper into that:

secondary_start_kernel()
  check_local_cpu_capabilities()
    update_cpu_errata_workarounds()
      update_cpu_capabilities()
        static_key_enable()
	  __static_key_slow_inc()
	    jump_label_lock()
       	      mutex_lock(&jump_label_mutex);

How is that supposed to work?

That call path is the low level CPU bringup, running in the context of the
idle task of that CPU with interrupts and preemption disabled. Taking a
mutex in that context, even if in that case the mutex is uncontended, is a
NONO.

Thanks,

	tglx
Mark Rutland May 11, 2017, 9:37 a.m. UTC | #4
On Thu, May 11, 2017 at 10:30:39AM +0200, Thomas Gleixner wrote:
> On Wed, 10 May 2017, Thomas Gleixner wrote:

> > On Wed, 10 May 2017, Mark Rutland wrote:

> > > [    0.182133] [<ffff200008155218>] lockdep_assert_hotplug_held+0x78/0x98

> > > [    0.182161] [<ffff20000840a36c>] __static_key_slow_inc+0x174/0x2e0

> > > [    0.182188] [<ffff20000840a654>] static_key_enable_cpuslocked+0x64/0xb0

> > > [    0.182215] [<ffff2000080a1120>] update_cpu_capabilities+0x178/0x2d8

> > > [    0.182243] [<ffff20000809e72c>] update_cpu_errata_workarounds_cpuslocked+0x1c/0x28

> > > [    0.182270] [<ffff2000080a1420>] check_local_cpu_capabilities+0x1a0/0x248

> > > [    0.182295] [<ffff2000080a2d18>] secondary_start_kernel+0x1e8/0x478

> > > [    0.182317] [<000000008219a1b4>] 0x8219a1b4

> > > [    0.182337] CPU features: enabling workaround for ARM erratum 834220

> > > [    0.182362] ------------[ cut here ]------------

> > > 

> > > The problem is that the secondary CPU doesn't hold the rwsem when it

> > > calls __static_key_slow_inc() in its boot path. It cannot take the

> > > rwsem, since the primaary CPU holds this for the duration of onlining

> > > the secondary CPU.

> 

> Looking deeper into that:

> 

> secondary_start_kernel()

>   check_local_cpu_capabilities()

>     update_cpu_errata_workarounds()

>       update_cpu_capabilities()

>         static_key_enable()

> 	  __static_key_slow_inc()

> 	    jump_label_lock()

>        	      mutex_lock(&jump_label_mutex);

> 

> How is that supposed to work?

> 

> That call path is the low level CPU bringup, running in the context of the

> idle task of that CPU with interrupts and preemption disabled. Taking a

> mutex in that context, even if in that case the mutex is uncontended, is a

> NONO.


Urgh; good point. Thanks for taking a look.

I think I can solve both issues by deferring poking the keys, so I'll
give that a go.

As an aside, do we have anything that should detect the broken mutex
usage? I've been testing kernels with LOCKDEP, PROVE_LOCKING,
DEBUG_ATOMIC_SLEEP, and friends, and nothing has complained so far.

Thanks,
Mark.
Thomas Gleixner May 11, 2017, 10:01 a.m. UTC | #5
On Thu, 11 May 2017, Mark Rutland wrote:
> On Thu, May 11, 2017 at 10:30:39AM +0200, Thomas Gleixner wrote:

> > On Wed, 10 May 2017, Thomas Gleixner wrote:

> > > On Wed, 10 May 2017, Mark Rutland wrote:

> > > > [    0.182133] [<ffff200008155218>] lockdep_assert_hotplug_held+0x78/0x98

> > > > [    0.182161] [<ffff20000840a36c>] __static_key_slow_inc+0x174/0x2e0

> > > > [    0.182188] [<ffff20000840a654>] static_key_enable_cpuslocked+0x64/0xb0

> > > > [    0.182215] [<ffff2000080a1120>] update_cpu_capabilities+0x178/0x2d8

> > > > [    0.182243] [<ffff20000809e72c>] update_cpu_errata_workarounds_cpuslocked+0x1c/0x28

> > > > [    0.182270] [<ffff2000080a1420>] check_local_cpu_capabilities+0x1a0/0x248

> > > > [    0.182295] [<ffff2000080a2d18>] secondary_start_kernel+0x1e8/0x478

> > > > [    0.182317] [<000000008219a1b4>] 0x8219a1b4

> > > > [    0.182337] CPU features: enabling workaround for ARM erratum 834220

> > > > [    0.182362] ------------[ cut here ]------------

> > > > 

> > > > The problem is that the secondary CPU doesn't hold the rwsem when it

> > > > calls __static_key_slow_inc() in its boot path. It cannot take the

> > > > rwsem, since the primaary CPU holds this for the duration of onlining

> > > > the secondary CPU.

> > 

> > Looking deeper into that:

> > 

> > secondary_start_kernel()

> >   check_local_cpu_capabilities()

> >     update_cpu_errata_workarounds()

> >       update_cpu_capabilities()

> >         static_key_enable()

> > 	  __static_key_slow_inc()

> > 	    jump_label_lock()

> >        	      mutex_lock(&jump_label_mutex);

> > 

> > How is that supposed to work?

> > 

> > That call path is the low level CPU bringup, running in the context of the

> > idle task of that CPU with interrupts and preemption disabled. Taking a

> > mutex in that context, even if in that case the mutex is uncontended, is a

> > NONO.

> 

> Urgh; good point. Thanks for taking a look.

> 

> I think I can solve both issues by deferring poking the keys, so I'll

> give that a go.

> 

> As an aside, do we have anything that should detect the broken mutex

> usage? I've been testing kernels with LOCKDEP, PROVE_LOCKING,

> DEBUG_ATOMIC_SLEEP, and friends, and nothing has complained so far.


Peter and myself were wondering about that already. No idea why that
doesn't yell at you.

Thanks,

	tglx
Mark Rutland May 11, 2017, 12:54 p.m. UTC | #6
On Thu, May 11, 2017 at 12:01:21PM +0200, Thomas Gleixner wrote:
> On Thu, 11 May 2017, Mark Rutland wrote:

> > On Thu, May 11, 2017 at 10:30:39AM +0200, Thomas Gleixner wrote:

> > > On Wed, 10 May 2017, Thomas Gleixner wrote:

> > > secondary_start_kernel()

> > >   check_local_cpu_capabilities()

> > >     update_cpu_errata_workarounds()

> > >       update_cpu_capabilities()

> > >         static_key_enable()

> > > 	  __static_key_slow_inc()

> > > 	    jump_label_lock()

> > >        	      mutex_lock(&jump_label_mutex);

> > > 

> > > How is that supposed to work?

> > > 

> > > That call path is the low level CPU bringup, running in the context of the

> > > idle task of that CPU with interrupts and preemption disabled. Taking a

> > > mutex in that context, even if in that case the mutex is uncontended, is a

> > > NONO.


> > As an aside, do we have anything that should detect the broken mutex

> > usage? I've been testing kernels with LOCKDEP, PROVE_LOCKING,

> > DEBUG_ATOMIC_SLEEP, and friends, and nothing has complained so far.

> 

> Peter and myself were wondering about that already. No idea why that

> doesn't yell at you.


AFAICT, this happens early enough that system_state is SYSTEM_BOOTING.
In ___might_sleep(), we see this and bail out without a splat.

Thanks,
Mark.