Message ID: cover.1623313323.git.viresh.kumar@linaro.org
Series: cpufreq: cppc: Fix suspend/resume specific races with FIE code
On 6/10/2021 4:23 AM, Viresh Kumar wrote:
> Hi Qian,
>
> It would be helpful if you can test this patchset and confirm if the races you
> mentioned went away or not and that the FIE code works as we wanted it to.
>
> I don't have a real setup and so it won't be easy for me to test this out.
>
> I have already sent a temporary fix for 5.13 and this patchset is targeted for
> 5.14 and is based over that.

Unfortunately, this series looks like it needs more work.

[ 487.773586][ T0] CPU17: Booted secondary processor 0x0000000801 [0x503f0002]
[ 487.976495][ T670] list_del corruption. next->prev should be ffff009b66e9ec70, but was ffff009b66dfec70
[ 487.987037][ T670] ------------[ cut here ]------------
[ 487.992351][ T670] kernel BUG at lib/list_debug.c:54!
[ 487.997810][ T670] Internal error: Oops - BUG: 0 [#1] SMP
[ 488.003295][ T670] Modules linked in: cpufreq_userspace xfs loop cppc_cpufreq processor efivarfs ip_tables x_tables ext4 mbcache jbd2 dm_mod igb i2c_algo_bit nvme mlx5_core i2c_core nvme_core firmware_class
[ 488.021759][ T670] CPU: 1 PID: 670 Comm: cppc_fie Not tainted 5.13.0-rc5-next-20210611+ #46
[ 488.030190][ T670] Hardware name: MiTAC RAPTOR EV-883832-X3-0001/RAPTOR, BIOS 1.6 06/28/2020
[ 488.038705][ T670] pstate: 600000c5 (nZCv daIF -PAN -UAO -TCO BTYPE=--)
[ 488.045398][ T670] pc : __list_del_entry_valid+0x154/0x158
[ 488.050969][ T670] lr : __list_del_entry_valid+0x154/0x158
[ 488.056534][ T670] sp : ffff8000229afd70
[ 488.060534][ T670] x29: ffff8000229afd70 x28: ffff0008c8f4f340 x27: dfff800000000000
[ 488.068361][ T670] x26: ffff009b66e9ec70 x25: ffff800011c8b4d0 x24: ffff0008d4bfe488
[ 488.076188][ T670] x23: ffff0008c8f4f340 x22: ffff0008c8f4f340 x21: ffff009b6789ec70
[ 488.084015][ T670] x20: ffff0008d4bfe4c8 x19: ffff009b66e9ec70 x18: ffff0008c8f4fd70
[ 488.091842][ T670] x17: 20747562202c3037 x16: 6365396536366239 x15: 0000000000000028
[ 488.099669][ T670] x14: 0000000000000000 x13: 0000000000000001 x12: ffff60136cdd3447
[ 488.107495][ T670] x11: 1fffe0136cdd3446 x10: ffff60136cdd3446 x9 : ffff8000103ee444
[ 488.115322][ T670] x8 : ffff009b66e9a237 x7 : 0000000000000001 x6 : ffff009b66e9a230
[ 488.123149][ T670] x5 : 00009fec9322cbba x4 : ffff60136cdd3447 x3 : 1fffe001191e9e69
[ 488.130975][ T670] x2 : 0000000000000000 x1 : 0000000000000000 x0 : 0000000000000054
[ 488.138803][ T670] Call trace:
[ 488.141935][ T670] __list_del_entry_valid+0x154/0x158
[ 488.147153][ T670] kthread_worker_fn+0x15c/0xda0
[ 488.151939][ T670] kthread+0x3ac/0x460
[ 488.155854][ T670] ret_from_fork+0x10/0x18
[ 488.160120][ T670] Code: 911e8000 aa1303e1 910a0000 941b595b (d4210000)
[ 488.166901][ T670] ---[ end trace e637e2d38b2cc087 ]---
[ 488.172206][ T670] Kernel panic - not syncing: Oops - BUG: Fatal exception
[ 488.179182][ T670] SMP: stopping secondary CPUs
[ 489.209347][ T670] SMP: failed to stop secondary CPUs 0-1,10-11,16-17,31
[ 489.216128][ T][ T670] Memoryn ]---

>
> -------------------------8<-------------------------
>
> The CPPC driver currently stops the frequency invariance related
> kthread_work and irq_work from cppc_freq_invariance_exit(), which is only
> called during the driver's removal.
>
> This is not sufficient as the CPUs can get hot-plugged out while the
> driver is in use; the same happens during system suspend/resume.
>
> In such cases we can reach a state where the CPU is removed by the
> kernel but its kthread_work or irq_work aren't stopped.
>
> Fix this by implementing the start_cpu() and stop_cpu() callbacks in the
> cpufreq core, which will be called on each CPU's addition/removal.
>
> A similar callback (the old ->stop_cpu()) was already available in the
> cpufreq core; it isn't required anymore, so its users are migrated to the
> ->exit() callback instead.
>
> This is targeted for v5.14-rc1.
>
> --
> Viresh
>
> Viresh Kumar (5):
>   cpufreq: cppc: Migrate to ->exit() callback instead of ->stop_cpu()
>   cpufreq: intel_pstate: Migrate to ->exit() callback instead of
>     ->stop_cpu()
>   cpufreq: powernv: Migrate to ->exit() callback instead of
>     ->stop_cpu()
>   cpufreq: Add start_cpu() and stop_cpu() callbacks
>   cpufreq: cppc: Fix suspend/resume specific races with the FIE code
>
>  Documentation/cpu-freq/cpu-drivers.rst |   7 +-
>  drivers/cpufreq/Kconfig.arm            |   1 -
>  drivers/cpufreq/cppc_cpufreq.c         | 163 ++++++++++++++-----------
>  drivers/cpufreq/cpufreq.c              |  11 +-
>  drivers/cpufreq/intel_pstate.c         |   9 +-
>  drivers/cpufreq/powernv-cpufreq.c      |  23 ++--
>  include/linux/cpufreq.h                |   5 +-
>  7 files changed, 119 insertions(+), 100 deletions(-)
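As a rough illustration of the approach described in the quoted cover letter, here is a minimal sketch of how a driver-side start_cpu()/stop_cpu() pair could tear down the per-CPU FIE machinery before a CPU goes away. The callback signatures, the cppc_fie_cpu_data structure and all field names below are assumptions made for this sketch, not code taken from the series.

#include <linux/cpufreq.h>
#include <linux/irq_work.h>
#include <linux/kthread.h>
#include <linux/percpu.h>

/* Hypothetical per-CPU FIE bookkeeping; names are illustrative only. */
struct cppc_fie_cpu_data {
	struct kthread_work	work;		/* queued from the scheduler tick */
	struct irq_work		irq_work;	/* bridges hardirq context to the kthread */
	bool			fie_enabled;	/* checked (not shown) before queuing new work */
};

static DEFINE_PER_CPU(struct cppc_fie_cpu_data, fie_cpu_data);

/* Assumed callback shape: invoked by the core when a CPU is added. */
static void cppc_cpufreq_start_cpu(struct cpufreq_policy *policy,
				   unsigned int cpu)
{
	per_cpu(fie_cpu_data, cpu).fie_enabled = true;
}

/*
 * Assumed callback shape: invoked by the core before the CPU is removed.
 * Stop new queuing first, then flush whatever is already pending.
 */
static void cppc_cpufreq_stop_cpu(struct cpufreq_policy *policy,
				  unsigned int cpu)
{
	struct cppc_fie_cpu_data *data = &per_cpu(fie_cpu_data, cpu);

	data->fie_enabled = false;

	irq_work_sync(&data->irq_work);
	kthread_cancel_work_sync(&data->work);
}

The ordering in the stop path is the point of the sketch: queuing is disabled first, the pending irq_work is waited out with irq_work_sync(), and only then is the kthread_work cancelled with kthread_cancel_work_sync(), so nothing for that CPU can still be sitting on the kthread worker's list afterwards. In a real driver these two functions would be plugged into the corresponding (new) fields of its struct cpufreq_driver.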
Hi Qian,

First of all, thanks for testing this; I need more of your help to test
this out :)

FWIW, I did test this on my Hikey board today, with some hacks, and
tried multiple insmod/rmmod operations for the driver, and I wasn't
able to reproduce the issue you reported. I did enable the list-debug
config option.

On 14-06-21, 09:48, Qian Cai wrote:
> Unfortunately, this series looks like it needs more work.
>
> [ 487.773586][ T0] CPU17: Booted secondary processor 0x0000000801 [0x503f0002]
> [ 487.976495][ T670] list_del corruption. next->prev should be ffff009b66e9ec70, but was ffff009b66dfec70
> [ 487.987037][ T670] ------------[ cut here ]------------
> [ 487.992351][ T670] kernel BUG at lib/list_debug.c:54!
> [ 487.997810][ T670] Internal error: Oops - BUG: 0 [#1] SMP
> [ 488.003295][ T670] Modules linked in: cpufreq_userspace xfs loop cppc_cpufreq processor efivarfs ip_tables x_tables ext4 mbcache jbd2 dm_mod igb i2c_algo_bit nvme mlx5_core i2c_core nvme_core firmware_class
> [ 488.021759][ T670] CPU: 1 PID: 670 Comm: cppc_fie Not tainted 5.13.0-rc5-next-20210611+ #46
> [ 488.030190][ T670] Hardware name: MiTAC RAPTOR EV-883832-X3-0001/RAPTOR, BIOS 1.6 06/28/2020
> [ 488.038705][ T670] pstate: 600000c5 (nZCv daIF -PAN -UAO -TCO BTYPE=--)
> [ 488.045398][ T670] pc : __list_del_entry_valid+0x154/0x158
> [ 488.050969][ T670] lr : __list_del_entry_valid+0x154/0x158
> [ 488.056534][ T670] sp : ffff8000229afd70
> [ 488.060534][ T670] x29: ffff8000229afd70 x28: ffff0008c8f4f340 x27: dfff800000000000
> [ 488.068361][ T670] x26: ffff009b66e9ec70 x25: ffff800011c8b4d0 x24: ffff0008d4bfe488
> [ 488.076188][ T670] x23: ffff0008c8f4f340 x22: ffff0008c8f4f340 x21: ffff009b6789ec70
> [ 488.084015][ T670] x20: ffff0008d4bfe4c8 x19: ffff009b66e9ec70 x18: ffff0008c8f4fd70
> [ 488.091842][ T670] x17: 20747562202c3037 x16: 6365396536366239 x15: 0000000000000028
> [ 488.099669][ T670] x14: 0000000000000000 x13: 0000000000000001 x12: ffff60136cdd3447
> [ 488.107495][ T670] x11: 1fffe0136cdd3446 x10: ffff60136cdd3446 x9 : ffff8000103ee444
> [ 488.115322][ T670] x8 : ffff009b66e9a237 x7 : 0000000000000001 x6 : ffff009b66e9a230
> [ 488.123149][ T670] x5 : 00009fec9322cbba x4 : ffff60136cdd3447 x3 : 1fffe001191e9e69
> [ 488.130975][ T670] x2 : 0000000000000000 x1 : 0000000000000000 x0 : 0000000000000054
> [ 488.138803][ T670] Call trace:
> [ 488.141935][ T670] __list_del_entry_valid+0x154/0x158
> [ 488.147153][ T670] kthread_worker_fn+0x15c/0xda0

This is a strange place to get the issue from. And this is a new
issue.

> [ 488.151939][ T670] kthread+0x3ac/0x460
> [ 488.155854][ T670] ret_from_fork+0x10/0x18
> [ 488.160120][ T670] Code: 911e8000 aa1303e1 910a0000 941b595b (d4210000)
> [ 488.166901][ T670] ---[ end trace e637e2d38b2cc087 ]---
> [ 488.172206][ T670] Kernel panic - not syncing: Oops - BUG: Fatal exception
> [ 488.179182][ T670] SMP: stopping secondary CPUs
> [ 489.209347][ T670] SMP: failed to stop secondary CPUs 0-1,10-11,16-17,31
> [ 489.216128][ T][ T670] Memoryn ]---

Can you give details on what exactly you did to get this? Normal boot
or something more?

I have made some changes to the way the calls happen, which may get
this sorted. Can you please try this branch?

https://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm.git/log/?h=cpufreq/cppc

I can see one place where a race can happen, i.e. between
topology_clear_scale_freq_source() and topology_scale_freq_tick(). It
is possible that sfd->set_freq_scale() may get called for a previously
set handler as there is no protection there.

I will see how to fix that. But I am not sure if the issue reported
above comes from there.

Anyway, please give my branch a try, let's see.

--
viresh
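The race described above can be pictured as the tick path dereferencing the per-CPU scale-freq-source pointer while another CPU is clearing it. Below is a deliberately simplified sketch of that pattern and of one way to close the window with RCU; the structure layout, function names and bodies are assumptions for illustration only, not the actual arch_topology change.

#include <linux/percpu.h>
#include <linux/rcupdate.h>

/* Simplified stand-in for the per-CPU scale-freq-source descriptor. */
struct scale_freq_data {
	void (*set_freq_scale)(void);
};

static DEFINE_PER_CPU(struct scale_freq_data __rcu *, sft_data);

/* Tick path: only call the handler from inside an RCU read-side section. */
static void scale_freq_tick(void)
{
	struct scale_freq_data *sfd;

	rcu_read_lock();
	sfd = rcu_dereference(*this_cpu_ptr(&sft_data));
	if (sfd)
		sfd->set_freq_scale();
	rcu_read_unlock();
}

/* Clear path: publish NULL first, then wait out any in-flight tick. */
static void clear_scale_freq_source(unsigned int cpu)
{
	rcu_assign_pointer(per_cpu(sft_data, cpu), NULL);

	/*
	 * After this returns, no tick can still be running the handler
	 * that was installed before, so it is safe to tear it down.
	 */
	synchronize_rcu();
}

The key point is the ordering in the clear path: the pointer is swapped out before synchronize_rcu(), so a concurrent tick either still sees the old handler (and is guaranteed to have finished with it by the time synchronize_rcu() returns) or sees NULL and does nothing.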
On 15-06-21, 13:20, Viresh Kumar wrote:
> I can see one place where a race can happen, i.e. between
> topology_clear_scale_freq_source() and topology_scale_freq_tick(). It
> is possible that sfd->set_freq_scale() may get called for a previously
> set handler as there is no protection there.
>
> I will see how to fix that. But I am not sure if the issue reported
> above comes from there.

I have tried to fix this race and pushed the relevant patch to my
branch. Please pick the latest branch and hopefully everything will
just work.

--
viresh
On 6/15/2021 3:50 AM, Viresh Kumar wrote:
> Hi Qian,
>
> First of all, thanks for testing this; I need more of your help to test
> this out :)
>
> FWIW, I did test this on my Hikey board today, with some hacks, and
> tried multiple insmod/rmmod operations for the driver, and I wasn't
> able to reproduce the issue you reported. I did enable the list-debug
> config option.

The setup here is an arm64 server with 32 CPUs.

>
> On 14-06-21, 09:48, Qian Cai wrote:
>> Unfortunately, this series looks like it needs more work.
>>
>> [ 487.773586][ T0] CPU17: Booted secondary processor 0x0000000801 [0x503f0002]
>> [ 487.976495][ T670] list_del corruption. next->prev should be ffff009b66e9ec70, but was ffff009b66dfec70
>> [ 487.987037][ T670] ------------[ cut here ]------------
>> [ 487.992351][ T670] kernel BUG at lib/list_debug.c:54!
>> [ 487.997810][ T670] Internal error: Oops - BUG: 0 [#1] SMP
>> [ 488.003295][ T670] Modules linked in: cpufreq_userspace xfs loop cppc_cpufreq processor efivarfs ip_tables x_tables ext4 mbcache jbd2 dm_mod igb i2c_algo_bit nvme mlx5_core i2c_core nvme_core firmware_class
>> [ 488.021759][ T670] CPU: 1 PID: 670 Comm: cppc_fie Not tainted 5.13.0-rc5-next-20210611+ #46
>> [ 488.030190][ T670] Hardware name: MiTAC RAPTOR EV-883832-X3-0001/RAPTOR, BIOS 1.6 06/28/2020
>> [ 488.038705][ T670] pstate: 600000c5 (nZCv daIF -PAN -UAO -TCO BTYPE=--)
>> [ 488.045398][ T670] pc : __list_del_entry_valid+0x154/0x158
>> [ 488.050969][ T670] lr : __list_del_entry_valid+0x154/0x158
>> [ 488.056534][ T670] sp : ffff8000229afd70
>> [ 488.060534][ T670] x29: ffff8000229afd70 x28: ffff0008c8f4f340 x27: dfff800000000000
>> [ 488.068361][ T670] x26: ffff009b66e9ec70 x25: ffff800011c8b4d0 x24: ffff0008d4bfe488
>> [ 488.076188][ T670] x23: ffff0008c8f4f340 x22: ffff0008c8f4f340 x21: ffff009b6789ec70
>> [ 488.084015][ T670] x20: ffff0008d4bfe4c8 x19: ffff009b66e9ec70 x18: ffff0008c8f4fd70
>> [ 488.091842][ T670] x17: 20747562202c3037 x16: 6365396536366239 x15: 0000000000000028
>> [ 488.099669][ T670] x14: 0000000000000000 x13: 0000000000000001 x12: ffff60136cdd3447
>> [ 488.107495][ T670] x11: 1fffe0136cdd3446 x10: ffff60136cdd3446 x9 : ffff8000103ee444
>> [ 488.115322][ T670] x8 : ffff009b66e9a237 x7 : 0000000000000001 x6 : ffff009b66e9a230
>> [ 488.123149][ T670] x5 : 00009fec9322cbba x4 : ffff60136cdd3447 x3 : 1fffe001191e9e69
>> [ 488.130975][ T670] x2 : 0000000000000000 x1 : 0000000000000000 x0 : 0000000000000054
>> [ 488.138803][ T670] Call trace:
>> [ 488.141935][ T670] __list_del_entry_valid+0x154/0x158
>> [ 488.147153][ T670] kthread_worker_fn+0x15c/0xda0
>
> This is a strange place to get the issue from. And this is a new
> issue.

Well, it was still the same exercise with CPU online/offline.

>
>> [ 488.151939][ T670] kthread+0x3ac/0x460
>> [ 488.155854][ T670] ret_from_fork+0x10/0x18
>> [ 488.160120][ T670] Code: 911e8000 aa1303e1 910a0000 941b595b (d4210000)
>> [ 488.166901][ T670] ---[ end trace e637e2d38b2cc087 ]---
>> [ 488.172206][ T670] Kernel panic - not syncing: Oops - BUG: Fatal exception
>> [ 488.179182][ T670] SMP: stopping secondary CPUs
>> [ 489.209347][ T670] SMP: failed to stop secondary CPUs 0-1,10-11,16-17,31
>> [ 489.216128][ T][ T670] Memoryn ]---
>
> Can you give details on what exactly you did to get this? Normal boot
> or something more?

Basically, it has the cpufreq driver as CPPC and the governor as
schedutil. Running a few workloads to get CPU scaling up and down.
Later, try to offline all CPUs until the last one and then online
all CPUs.

>
> I have made some changes to the way the calls happen, which may get
> this sorted. Can you please try this branch?
>
> https://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm.git/log/?h=cpufreq/cppc
>
> I can see one place where a race can happen, i.e. between
> topology_clear_scale_freq_source() and topology_scale_freq_tick(). It
> is possible that sfd->set_freq_scale() may get called for a previously
> set handler as there is no protection there.
>
> I will see how to fix that. But I am not sure if the issue reported
> above comes from there.
>
> Anyway, please give my branch a try, let's see.

I am hesitant to try this at the moment because this all feels like
shooting in the dark. Ideally, you will be able to get access to one
of those arm64 servers (Huawei, Ampere, TX2, FJ, etc.) eventually and
really try the same exercises yourself with those debugging options
like list debugging and KASAN on. That way you could fix things way
more efficiently. I could share the .config with you once you are
there. Last but not least, once you have narrowed down the issues
better, I'd hope to see someone else familiar with the code there
review those patches first (feel free to Cc me once you are ready to
post) before I rerun the whole thing again. That way we don't waste
each other's time going back and forth chasing shadows.
On 15-06-21, 08:17, Qian Cai wrote:
> On 6/15/2021 3:50 AM, Viresh Kumar wrote:
> > This is a strange place to get the issue from. And this is a new
> > issue.
>
> Well, it was still the same exercise with CPU online/offline.
>
> >
> >> [ 488.151939][ T670] kthread+0x3ac/0x460
> >> [ 488.155854][ T670] ret_from_fork+0x10/0x18
> >> [ 488.160120][ T670] Code: 911e8000 aa1303e1 910a0000 941b595b (d4210000)
> >> [ 488.166901][ T670] ---[ end trace e637e2d38b2cc087 ]---
> >> [ 488.172206][ T670] Kernel panic - not syncing: Oops - BUG: Fatal exception
> >> [ 488.179182][ T670] SMP: stopping secondary CPUs
> >> [ 489.209347][ T670] SMP: failed to stop secondary CPUs 0-1,10-11,16-17,31
> >> [ 489.216128][ T][ T670] Memoryn ]---
> >
> > Can you give details on what exactly you did to get this? Normal boot
> > or something more?
>
> Basically, it has the cpufreq driver as CPPC and the governor as
> schedutil. Running a few workloads to get CPU scaling up and down.
> Later, try to offline all CPUs until the last one and then online
> all CPUs.

Hmm, okay. So I basically have a very similar setup with 8 cores (one
policy per CPU); the only difference is I don't end up reading the
performance counters, everything else remains the same. So I should
see issues now just like you, in case there are any.

Since the insmod/rmmod setup is a bit different, this is what I tried
today for around an hour with CONFIG_DEBUG_LIST and RCU debugging
options:

while true; do
        for i in `seq 1 7`; do echo 0 > /sys/devices/system/cpu/cpu$i/online; done;
        for i in `seq 1 7`; do echo 1 > /sys/devices/system/cpu/cpu$i/online; done;
done

I don't see any crashes, oops or warnings with the latest stuff.

> I am hesitant to try this at the moment because this all feels like
> shooting in the dark.

I understand your point and you aren't completely wrong here. It
wasn't completely in the dark, but since I am unable to reproduce the
issue at my end, I asked for help.

FWIW, I think one possible cause of the kthread corruption could have
been the race in the topology related code. I already fixed that in my
tree yesterday.

> Ideally, you will be able to get access to one
> of those arm64 servers (Huawei, Ampere, TX2, FJ, etc.) eventually and
> really try the same exercises yourself with those debugging options
> like list debugging and KASAN on. That way you could fix things way
> more efficiently.

Yeah, I thought this work was over, and I am not a user of it
normally. I had to enable it for ARM servers and I took the help of my
colleagues (Vincent Guittot and Ionela) for testing it. I have also
asked Vincent to give it a try again.

> I could share the .config with you once you are
> there. Last but not least, once you have narrowed down the issues
> better, I'd hope to see someone else familiar with the code there
> review those patches first (feel free to Cc me once you are ready to
> post) before I rerun the whole thing again. That way we don't waste
> each other's time going back and forth chasing shadows.

I did send the stuff up for review, and this last thing (which you
reported) was a different race altogether, so I asked for testing
without reviews.

Anyway, I am quite sure my tests have covered such issues now. I will
send out patches again soon.

Thanks Qian.

--
viresh