Message ID | 20200904064705.239614-1-ulf.hansson@linaro.org |
---|---|
State | Accepted |
Commit | 36050d8984ab743f9990a2eb97a0062fdc3d7bbd |
Headers | show |
Series | cpuidle: psci: Fix suspicious RCU usage | expand |
On Fri, Sep 04, 2020 at 08:47:05AM +0200, Ulf Hansson wrote: > The commit eb1f00237aca ("lockdep,trace: Expose tracepoints"), started to > expose us for tracepoints. This lead to the following RCU splat on an ARM64 > Qcom board. > > [ 5.529634] WARNING: suspicious RCU usage > [ 5.537307] sdhci-pltfm: SDHCI platform and OF driver helper > [ 5.541092] 5.9.0-rc3 #86 Not tainted > [ 5.541098] ----------------------------- > [ 5.541105] ../include/trace/events/lock.h:37 suspicious rcu_dereference_check() usage! > [ 5.541110] > [ 5.541110] other info that might help us debug this: > [ 5.541110] > [ 5.541116] > [ 5.541116] rcu_scheduler_active = 2, debug_locks = 1 > [ 5.541122] RCU used illegally from extended quiescent state! > [ 5.541129] no locks held by swapper/0/0. > [ 5.541134] > [ 5.541134] stack backtrace: > [ 5.541143] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.9.0-rc3 #86 > [ 5.541149] Hardware name: Qualcomm Technologies, Inc. APQ 8016 SBC (DT) > [ 5.541157] Call trace: > [ 5.568185] sdhci_msm 7864900.sdhci: Got CD GPIO > [ 5.574186] dump_backtrace+0x0/0x1c8 > [ 5.574206] show_stack+0x14/0x20 > [ 5.574229] dump_stack+0xe8/0x154 > [ 5.574250] lockdep_rcu_suspicious+0xd4/0xf8 > [ 5.574269] lock_acquire+0x3f0/0x460 > [ 5.574292] _raw_spin_lock_irqsave+0x80/0xb0 > [ 5.574314] __pm_runtime_suspend+0x4c/0x188 > [ 5.574341] psci_enter_domain_idle_state+0x40/0xa0 > [ 5.574362] cpuidle_enter_state+0xc0/0x610 > [ 5.646487] cpuidle_enter+0x38/0x50 > [ 5.650651] call_cpuidle+0x18/0x40 > [ 5.654467] do_idle+0x228/0x278 > [ 5.657678] cpu_startup_entry+0x24/0x70 > [ 5.661153] rest_init+0x1a4/0x278 > [ 5.665061] arch_call_rest_init+0xc/0x14 > [ 5.668272] start_kernel+0x508/0x540 > > Following the path in pm_runtime_put_sync_suspend() from > psci_enter_domain_idle_state(), it seems like we end up using the RCU. > Therefore, let's simply silence the splat by informing the RCU about it > with RCU_NONIDLE. > > Note that, this is a temporary solution. Instead we should strive to avoid > using RCU_NONIDLE (and similar), but rather push rcu_idle_enter|exit() > further down, closer to the arch specific code. However, as the CPU PM > notifiers are also using the RCU, additional rework is needed. > > Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org> > Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Acked-by: Paul E. McKenney <paulmck@kernel.org> > --- > drivers/cpuidle/cpuidle-psci.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/drivers/cpuidle/cpuidle-psci.c b/drivers/cpuidle/cpuidle-psci.c > index 74463841805f..d928b37718bd 100644 > --- a/drivers/cpuidle/cpuidle-psci.c > +++ b/drivers/cpuidle/cpuidle-psci.c > @@ -66,7 +66,7 @@ static int psci_enter_domain_idle_state(struct cpuidle_device *dev, > return -1; > > /* Do runtime PM to manage a hierarchical CPU toplogy. */ > - pm_runtime_put_sync_suspend(pd_dev); > + RCU_NONIDLE(pm_runtime_put_sync_suspend(pd_dev)); > > state = psci_get_domain_state(); > if (!state) > @@ -74,7 +74,7 @@ static int psci_enter_domain_idle_state(struct cpuidle_device *dev, > > ret = psci_cpu_suspend_enter(state) ? -1 : idx; > > - pm_runtime_get_sync(pd_dev); > + RCU_NONIDLE(pm_runtime_get_sync(pd_dev)); > > cpu_pm_exit(); > > -- > 2.25.1 >
On Fri, 4 Sep 2020 at 16:13, Paul E. McKenney <paulmck@kernel.org> wrote: > > On Fri, Sep 04, 2020 at 08:47:05AM +0200, Ulf Hansson wrote: > > The commit eb1f00237aca ("lockdep,trace: Expose tracepoints"), started to > > expose us for tracepoints. This lead to the following RCU splat on an ARM64 > > Qcom board. > > > > [ 5.529634] WARNING: suspicious RCU usage > > [ 5.537307] sdhci-pltfm: SDHCI platform and OF driver helper > > [ 5.541092] 5.9.0-rc3 #86 Not tainted > > [ 5.541098] ----------------------------- > > [ 5.541105] ../include/trace/events/lock.h:37 suspicious rcu_dereference_check() usage! > > [ 5.541110] > > [ 5.541110] other info that might help us debug this: > > [ 5.541110] > > [ 5.541116] > > [ 5.541116] rcu_scheduler_active = 2, debug_locks = 1 > > [ 5.541122] RCU used illegally from extended quiescent state! > > [ 5.541129] no locks held by swapper/0/0. > > [ 5.541134] > > [ 5.541134] stack backtrace: > > [ 5.541143] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.9.0-rc3 #86 > > [ 5.541149] Hardware name: Qualcomm Technologies, Inc. APQ 8016 SBC (DT) > > [ 5.541157] Call trace: > > [ 5.568185] sdhci_msm 7864900.sdhci: Got CD GPIO > > [ 5.574186] dump_backtrace+0x0/0x1c8 > > [ 5.574206] show_stack+0x14/0x20 > > [ 5.574229] dump_stack+0xe8/0x154 > > [ 5.574250] lockdep_rcu_suspicious+0xd4/0xf8 > > [ 5.574269] lock_acquire+0x3f0/0x460 > > [ 5.574292] _raw_spin_lock_irqsave+0x80/0xb0 > > [ 5.574314] __pm_runtime_suspend+0x4c/0x188 > > [ 5.574341] psci_enter_domain_idle_state+0x40/0xa0 > > [ 5.574362] cpuidle_enter_state+0xc0/0x610 > > [ 5.646487] cpuidle_enter+0x38/0x50 > > [ 5.650651] call_cpuidle+0x18/0x40 > > [ 5.654467] do_idle+0x228/0x278 > > [ 5.657678] cpu_startup_entry+0x24/0x70 > > [ 5.661153] rest_init+0x1a4/0x278 > > [ 5.665061] arch_call_rest_init+0xc/0x14 > > [ 5.668272] start_kernel+0x508/0x540 > > > > Following the path in pm_runtime_put_sync_suspend() from > > psci_enter_domain_idle_state(), it seems like we end up using the RCU. > > Therefore, let's simply silence the splat by informing the RCU about it > > with RCU_NONIDLE. > > > > Note that, this is a temporary solution. Instead we should strive to avoid > > using RCU_NONIDLE (and similar), but rather push rcu_idle_enter|exit() > > further down, closer to the arch specific code. However, as the CPU PM > > notifiers are also using the RCU, additional rework is needed. > > > > Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org> > > Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> > > Acked-by: Paul E. McKenney <paulmck@kernel.org> Peter, Rafael, Is $subject patch okay as the short term solution? If so, I can continue to look at the CPU PM notifiers, etc. Kind regards Uffe > > > --- > > drivers/cpuidle/cpuidle-psci.c | 4 ++-- > > 1 file changed, 2 insertions(+), 2 deletions(-) > > > > diff --git a/drivers/cpuidle/cpuidle-psci.c b/drivers/cpuidle/cpuidle-psci.c > > index 74463841805f..d928b37718bd 100644 > > --- a/drivers/cpuidle/cpuidle-psci.c > > +++ b/drivers/cpuidle/cpuidle-psci.c > > @@ -66,7 +66,7 @@ static int psci_enter_domain_idle_state(struct cpuidle_device *dev, > > return -1; > > > > /* Do runtime PM to manage a hierarchical CPU toplogy. */ > > - pm_runtime_put_sync_suspend(pd_dev); > > + RCU_NONIDLE(pm_runtime_put_sync_suspend(pd_dev)); > > > > state = psci_get_domain_state(); > > if (!state) > > @@ -74,7 +74,7 @@ static int psci_enter_domain_idle_state(struct cpuidle_device *dev, > > > > ret = psci_cpu_suspend_enter(state) ? -1 : idx; > > > > - pm_runtime_get_sync(pd_dev); > > + RCU_NONIDLE(pm_runtime_get_sync(pd_dev)); > > > > cpu_pm_exit(); > > > > -- > > 2.25.1 > >
On Tue, Sep 15, 2020 at 11:17:20PM +0200, Ulf Hansson wrote: > Peter, Rafael, > > Is $subject patch okay as the short term solution? If so, I can > continue to look at the CPU PM notifiers, etc. Yes, it's a proper band-aid :-)
On Fri, Sep 4, 2020 at 8:47 AM Ulf Hansson <ulf.hansson@linaro.org> wrote: > > The commit eb1f00237aca ("lockdep,trace: Expose tracepoints"), started to > expose us for tracepoints. This lead to the following RCU splat on an ARM64 > Qcom board. > > [ 5.529634] WARNING: suspicious RCU usage > [ 5.537307] sdhci-pltfm: SDHCI platform and OF driver helper > [ 5.541092] 5.9.0-rc3 #86 Not tainted > [ 5.541098] ----------------------------- > [ 5.541105] ../include/trace/events/lock.h:37 suspicious rcu_dereference_check() usage! > [ 5.541110] > [ 5.541110] other info that might help us debug this: > [ 5.541110] > [ 5.541116] > [ 5.541116] rcu_scheduler_active = 2, debug_locks = 1 > [ 5.541122] RCU used illegally from extended quiescent state! > [ 5.541129] no locks held by swapper/0/0. > [ 5.541134] > [ 5.541134] stack backtrace: > [ 5.541143] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.9.0-rc3 #86 > [ 5.541149] Hardware name: Qualcomm Technologies, Inc. APQ 8016 SBC (DT) > [ 5.541157] Call trace: > [ 5.568185] sdhci_msm 7864900.sdhci: Got CD GPIO > [ 5.574186] dump_backtrace+0x0/0x1c8 > [ 5.574206] show_stack+0x14/0x20 > [ 5.574229] dump_stack+0xe8/0x154 > [ 5.574250] lockdep_rcu_suspicious+0xd4/0xf8 > [ 5.574269] lock_acquire+0x3f0/0x460 > [ 5.574292] _raw_spin_lock_irqsave+0x80/0xb0 > [ 5.574314] __pm_runtime_suspend+0x4c/0x188 > [ 5.574341] psci_enter_domain_idle_state+0x40/0xa0 > [ 5.574362] cpuidle_enter_state+0xc0/0x610 > [ 5.646487] cpuidle_enter+0x38/0x50 > [ 5.650651] call_cpuidle+0x18/0x40 > [ 5.654467] do_idle+0x228/0x278 > [ 5.657678] cpu_startup_entry+0x24/0x70 > [ 5.661153] rest_init+0x1a4/0x278 > [ 5.665061] arch_call_rest_init+0xc/0x14 > [ 5.668272] start_kernel+0x508/0x540 > > Following the path in pm_runtime_put_sync_suspend() from > psci_enter_domain_idle_state(), it seems like we end up using the RCU. > Therefore, let's simply silence the splat by informing the RCU about it > with RCU_NONIDLE. > > Note that, this is a temporary solution. Instead we should strive to avoid > using RCU_NONIDLE (and similar), but rather push rcu_idle_enter|exit() > further down, closer to the arch specific code. However, as the CPU PM > notifiers are also using the RCU, additional rework is needed. > > Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org> > Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> I missed this one before, sorry about that. Now applied as 5.9-rc6 material with the Paul's ACK, thanks! > --- > drivers/cpuidle/cpuidle-psci.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/drivers/cpuidle/cpuidle-psci.c b/drivers/cpuidle/cpuidle-psci.c > index 74463841805f..d928b37718bd 100644 > --- a/drivers/cpuidle/cpuidle-psci.c > +++ b/drivers/cpuidle/cpuidle-psci.c > @@ -66,7 +66,7 @@ static int psci_enter_domain_idle_state(struct cpuidle_device *dev, > return -1; > > /* Do runtime PM to manage a hierarchical CPU toplogy. */ > - pm_runtime_put_sync_suspend(pd_dev); > + RCU_NONIDLE(pm_runtime_put_sync_suspend(pd_dev)); > > state = psci_get_domain_state(); > if (!state) > @@ -74,7 +74,7 @@ static int psci_enter_domain_idle_state(struct cpuidle_device *dev, > > ret = psci_cpu_suspend_enter(state) ? -1 : idx; > > - pm_runtime_get_sync(pd_dev); > + RCU_NONIDLE(pm_runtime_get_sync(pd_dev)); > > cpu_pm_exit(); > > -- > 2.25.1 >
diff --git a/drivers/cpuidle/cpuidle-psci.c b/drivers/cpuidle/cpuidle-psci.c index 74463841805f..d928b37718bd 100644 --- a/drivers/cpuidle/cpuidle-psci.c +++ b/drivers/cpuidle/cpuidle-psci.c @@ -66,7 +66,7 @@ static int psci_enter_domain_idle_state(struct cpuidle_device *dev, return -1; /* Do runtime PM to manage a hierarchical CPU toplogy. */ - pm_runtime_put_sync_suspend(pd_dev); + RCU_NONIDLE(pm_runtime_put_sync_suspend(pd_dev)); state = psci_get_domain_state(); if (!state) @@ -74,7 +74,7 @@ static int psci_enter_domain_idle_state(struct cpuidle_device *dev, ret = psci_cpu_suspend_enter(state) ? -1 : idx; - pm_runtime_get_sync(pd_dev); + RCU_NONIDLE(pm_runtime_get_sync(pd_dev)); cpu_pm_exit();
The commit eb1f00237aca ("lockdep,trace: Expose tracepoints"), started to expose us for tracepoints. This lead to the following RCU splat on an ARM64 Qcom board. [ 5.529634] WARNING: suspicious RCU usage [ 5.537307] sdhci-pltfm: SDHCI platform and OF driver helper [ 5.541092] 5.9.0-rc3 #86 Not tainted [ 5.541098] ----------------------------- [ 5.541105] ../include/trace/events/lock.h:37 suspicious rcu_dereference_check() usage! [ 5.541110] [ 5.541110] other info that might help us debug this: [ 5.541110] [ 5.541116] [ 5.541116] rcu_scheduler_active = 2, debug_locks = 1 [ 5.541122] RCU used illegally from extended quiescent state! [ 5.541129] no locks held by swapper/0/0. [ 5.541134] [ 5.541134] stack backtrace: [ 5.541143] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.9.0-rc3 #86 [ 5.541149] Hardware name: Qualcomm Technologies, Inc. APQ 8016 SBC (DT) [ 5.541157] Call trace: [ 5.568185] sdhci_msm 7864900.sdhci: Got CD GPIO [ 5.574186] dump_backtrace+0x0/0x1c8 [ 5.574206] show_stack+0x14/0x20 [ 5.574229] dump_stack+0xe8/0x154 [ 5.574250] lockdep_rcu_suspicious+0xd4/0xf8 [ 5.574269] lock_acquire+0x3f0/0x460 [ 5.574292] _raw_spin_lock_irqsave+0x80/0xb0 [ 5.574314] __pm_runtime_suspend+0x4c/0x188 [ 5.574341] psci_enter_domain_idle_state+0x40/0xa0 [ 5.574362] cpuidle_enter_state+0xc0/0x610 [ 5.646487] cpuidle_enter+0x38/0x50 [ 5.650651] call_cpuidle+0x18/0x40 [ 5.654467] do_idle+0x228/0x278 [ 5.657678] cpu_startup_entry+0x24/0x70 [ 5.661153] rest_init+0x1a4/0x278 [ 5.665061] arch_call_rest_init+0xc/0x14 [ 5.668272] start_kernel+0x508/0x540 Following the path in pm_runtime_put_sync_suspend() from psci_enter_domain_idle_state(), it seems like we end up using the RCU. Therefore, let's simply silence the splat by informing the RCU about it with RCU_NONIDLE. Note that, this is a temporary solution. Instead we should strive to avoid using RCU_NONIDLE (and similar), but rather push rcu_idle_enter|exit() further down, closer to the arch specific code. However, as the CPU PM notifiers are also using the RCU, additional rework is needed. Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> --- drivers/cpuidle/cpuidle-psci.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) -- 2.25.1