Message ID | 20241001-b4-slub-kunit-fix-v2-0-2d995d3ecb49@suse.cz |
---|---|
Series | slub kunit tests fixes for 6.12 |
On 10/1/24 18:20, Vlastimil Babka wrote:
> Guenter Roeck reports that the new slub kunit tests added by commit
> 4e1c44b3db79 ("kunit, slub: add test_kfree_rcu() and
> test_leak_destroy()") cause a lockup on boot on several architectures
> when the kunit tests are configured to be built-in and not modules.
>
> The test_kfree_rcu test invokes kfree_rcu() and boot sequence inspection
> showed the runner for built-in kunit tests kunit_run_all_tests() is
> called before setting system_state to SYSTEM_RUNNING and calling
> rcu_end_inkernel_boot(), so this seems like a likely cause. So while I
> was unable to reproduce the problem myself, skipping the test when the
> slub_kunit module is built-in should avoid the issue.
>
> An alternative fix that was moving the call to kunit_run_all_tests() a
> bit later in the boot was tried, but has broken tests with functions
> marked as __init due to free_initmem() already being done.
>
> Fixes: 4e1c44b3db79 ("kunit, slub: add test_kfree_rcu() and test_leak_destroy()")
> Reported-by: Guenter Roeck <linux@roeck-us.net>
> Closes: https://lore.kernel.org/all/6fcb1252-7990-4f0d-8027-5e83f0fb9409@roeck-us.net/

I hope you can confirm it helps, because the commit added two tests and I've
only skipped one of them, as it's the one using kfree_rcu(), which is
suspected. But the other is responsible for the (now suppressed)
kmem_cache_destroy() warning, and maybe I'm missing something and it was
actually that one causing the lockups.

Since you mentioned the boot lockups happened on some x86_64 too, do you
have a .config of the lockup case? I've tried tweaking some rcu options but
still nothing.

Thanks!

> Cc: "Paul E. McKenney" <paulmck@kernel.org>
> Cc: Boqun Feng <boqun.feng@gmail.com>
> Cc: Uladzislau Rezki <urezki@gmail.com>
> Cc: rcu@vger.kernel.org
> Cc: Brendan Higgins <brendanhiggins@google.com>
> Cc: David Gow <davidgow@google.com>
> Cc: Rae Moar <rmoar@google.com>
> Cc: linux-kselftest@vger.kernel.org
> Cc: kunit-dev@googlegroups.com
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> ---
>  lib/slub_kunit.c | 14 ++++++++++----
>  1 file changed, 10 insertions(+), 4 deletions(-)
>
> diff --git a/lib/slub_kunit.c b/lib/slub_kunit.c
> index 85d51ec09846d4fa219db6bda336c6f0b89e98e4..80e39f003344858722a544ad62ed84e885574054 100644
> --- a/lib/slub_kunit.c
> +++ b/lib/slub_kunit.c
> @@ -164,10 +164,16 @@ struct test_kfree_rcu_struct {
>
>  static void test_kfree_rcu(struct kunit *test)
>  {
> -	struct kmem_cache *s = test_kmem_cache_create("TestSlub_kfree_rcu",
> -				sizeof(struct test_kfree_rcu_struct),
> -				SLAB_NO_MERGE);
> -	struct test_kfree_rcu_struct *p = kmem_cache_alloc(s, GFP_KERNEL);
> +	struct kmem_cache *s;
> +	struct test_kfree_rcu_struct *p;
> +
> +	if (IS_BUILTIN(CONFIG_SLUB_KUNIT_TEST))
> +		kunit_skip(test, "can't do kfree_rcu() when test is built-in");
> +
> +	s = test_kmem_cache_create("TestSlub_kfree_rcu",
> +				   sizeof(struct test_kfree_rcu_struct),
> +				   SLAB_NO_MERGE);
> +	p = kmem_cache_alloc(s, GFP_KERNEL);
>
>  	kfree_rcu(p, rcu);
>  	kmem_cache_destroy(s);
>
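
The skip in the patch hinges on IS_BUILTIN(CONFIG_SLUB_KUNIT_TEST) being 1 for =y and 0 for =m. As a rough userspace sketch of why that works: Kconfig defines CONFIG_FOO as 1 for =y and CONFIG_FOO_MODULE as 1 for =m, and a token-pasting trick turns "defined as 1" into the constant 1 and anything else into 0. The macro chain below is modeled on include/linux/kconfig.h from memory, with the helper names changed (the kernel uses double-underscore names) and a dummy trailing argument added so it also compiles cleanly outside the kernel. should_skip_kfree_rcu_test() is a hypothetical helper, not something from the patch.

```c
/* Pretend the test was configured =y; Kconfig would emit this define.
 * For =m it would instead emit CONFIG_SLUB_KUNIT_TEST_MODULE. */
#define CONFIG_SLUB_KUNIT_TEST 1

/* Pasting "1" onto ARG_PLACEHOLDER_ yields "0," which shifts the
 * argument list by one position; any other token stays as junk. */
#define ARG_PLACEHOLDER_1 0,
#define TAKE_SECOND_ARG(ignored, val, ...) val
#define IS_DEFINED_1(x)            IS_DEFINED_2(x)
#define IS_DEFINED_2(val)          IS_DEFINED_3(ARG_PLACEHOLDER_##val)
#define IS_DEFINED_3(arg1_or_junk) TAKE_SECOND_ARG(arg1_or_junk 1, 0, 0)

/* 1 if the option is =y, 0 otherwise (including =m and unset). */
#define IS_BUILTIN(option) IS_DEFINED_1(option)
/* 1 if the option is =m, 0 otherwise. */
#define IS_MODULE(option)  IS_DEFINED_1(option##_MODULE)

/* Hypothetical helper: the same condition the patch tests before
 * calling kunit_skip(). Returns 1 when the test should be skipped. */
int should_skip_kfree_rcu_test(void)
{
	return IS_BUILTIN(CONFIG_SLUB_KUNIT_TEST);
}
```

Because the result is a compile-time constant, the compiler can drop the skipped branch entirely in the modular build, which is why the patch can use a plain if () instead of #ifdef.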
On 10/1/24 09:20, Vlastimil Babka wrote:
> Guenter Roeck reports that the new slub kunit tests added by commit
> 4e1c44b3db79 ("kunit, slub: add test_kfree_rcu() and
> test_leak_destroy()") cause a lockup on boot on several architectures
> when the kunit tests are configured to be built-in and not modules.
>
> The test_kfree_rcu test invokes kfree_rcu() and boot sequence inspection
> showed the runner for built-in kunit tests kunit_run_all_tests() is
> called before setting system_state to SYSTEM_RUNNING and calling
> rcu_end_inkernel_boot(), so this seems like a likely cause. So while I
> was unable to reproduce the problem myself, skipping the test when the
> slub_kunit module is built-in should avoid the issue.
>
> An alternative fix that was moving the call to kunit_run_all_tests() a
> bit later in the boot was tried, but has broken tests with functions
> marked as __init due to free_initmem() already being done.
>
> Fixes: 4e1c44b3db79 ("kunit, slub: add test_kfree_rcu() and test_leak_destroy()")
> Reported-by: Guenter Roeck <linux@roeck-us.net>
> Closes: https://lore.kernel.org/all/6fcb1252-7990-4f0d-8027-5e83f0fb9409@roeck-us.net/
> Cc: "Paul E. McKenney" <paulmck@kernel.org>
> Cc: Boqun Feng <boqun.feng@gmail.com>
> Cc: Uladzislau Rezki <urezki@gmail.com>
> Cc: rcu@vger.kernel.org
> Cc: Brendan Higgins <brendanhiggins@google.com>
> Cc: David Gow <davidgow@google.com>
> Cc: Rae Moar <rmoar@google.com>
> Cc: linux-kselftest@vger.kernel.org
> Cc: kunit-dev@googlegroups.com
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>

This results in:

    KTAP version 1
    # Subtest: slub_test
    # module: slub_kunit
    1..8
    # test_clobber_zone: pass:1 fail:0 skip:0 total:1
    ok 1 test_clobber_zone
    # test_next_pointer: pass:1 fail:0 skip:0 total:1
    ok 2 test_next_pointer
    # test_first_word: pass:1 fail:0 skip:0 total:1
    ok 3 test_first_word
    # test_clobber_50th_byte: pass:1 fail:0 skip:0 total:1
    ok 4 test_clobber_50th_byte
    # test_clobber_redzone_free: pass:1 fail:0 skip:0 total:1
    ok 5 test_clobber_redzone_free
    # test_kmalloc_redzone_access: pass:1 fail:0 skip:0 total:1
    ok 6 test_kmalloc_redzone_access
    # test_kfree_rcu: pass:0 fail:0 skip:1 total:1
    ok 7 test_kfree_rcu # SKIP can't do kfree_rcu() when test is built-in
    # test_leak_destroy: pass:1 fail:0 skip:0 total:1
    ok 8 test_leak_destroy
    # slub_test: pass:7 fail:0 skip:1 total:8

Tested-by: Guenter Roeck <linux@roeck-us.net>

Thanks,
Guenter
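
For anyone checking logs like the one above by script, here is a minimal sketch of how a single KTAP line could be classified. It assumes only the conventions visible in the output: result lines start with "ok " or "not ok ", a "# SKIP" directive marks a skipped test, and lines starting with "#" are diagnostics. ktap_classify() and the enum are hypothetical names, and this is not a full KTAP parser (it ignores nesting, plans and other directives).

```c
#include <string.h>

/* Outcome of one line of KTAP output. */
enum ktap_result { KTAP_NOT_RESULT, KTAP_PASS, KTAP_FAIL, KTAP_SKIP };

/*
 * Classify one KTAP line.
 * Leading indentation (used for subtests) is ignored. Diagnostic lines
 * ("# ..."), the version line and the plan ("1..8") are not results.
 */
enum ktap_result ktap_classify(const char *line)
{
	while (*line == ' ' || *line == '\t')
		line++;

	if (strncmp(line, "not ok ", 7) == 0)
		return KTAP_FAIL;
	if (strncmp(line, "ok ", 3) != 0)
		return KTAP_NOT_RESULT;

	/* A passing line carrying a SKIP directive counts as skipped. */
	return strstr(line, "# SKIP") ? KTAP_SKIP : KTAP_PASS;
}
```

Fed the eight "ok" lines from the log, this would count seven passes and one skip, matching the slub_test summary line.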
On 10/2/24 03:26, Vlastimil Babka wrote:
> On 10/1/24 18:20, Vlastimil Babka wrote:
>> Guenter Roeck reports that the new slub kunit tests added by commit
>> 4e1c44b3db79 ("kunit, slub: add test_kfree_rcu() and
>> test_leak_destroy()") cause a lockup on boot on several architectures
>> when the kunit tests are configured to be built-in and not modules.
>>
>> The test_kfree_rcu test invokes kfree_rcu() and boot sequence inspection
>> showed the runner for built-in kunit tests kunit_run_all_tests() is
>> called before setting system_state to SYSTEM_RUNNING and calling
>> rcu_end_inkernel_boot(), so this seems like a likely cause. So while I
>> was unable to reproduce the problem myself, skipping the test when the
>> slub_kunit module is built-in should avoid the issue.
>>
>> An alternative fix that was moving the call to kunit_run_all_tests() a
>> bit later in the boot was tried, but has broken tests with functions
>> marked as __init due to free_initmem() already being done.
>>
>> Fixes: 4e1c44b3db79 ("kunit, slub: add test_kfree_rcu() and test_leak_destroy()")
>> Reported-by: Guenter Roeck <linux@roeck-us.net>
>> Closes: https://lore.kernel.org/all/6fcb1252-7990-4f0d-8027-5e83f0fb9409@roeck-us.net/
>
> I hope you can confirm it helps, because the commit added two tests and I've
> only skipped one of them, as it's the one using kfree_rcu(), which is
> suspected. But the other is responsible for the (now suppressed)
> kmem_cache_destroy() warning, and maybe I'm missing something and it was
> actually that one causing the lockups.
>

Everything works with your patches applied, so we are good.

> Since you mentioned the boot lockups happened on some x86_64 too, do you
> have a .config of the lockup case? I've tried tweaking some rcu options but
> still nothing.
>

I have a bunch of debug options enabled. Configuration (generated using
"make savedefconfig") for x86_64 is attached.

Thanks,
Guenter
On 10/2/24 15:52, Guenter Roeck wrote:
> On 10/2/24 03:26, Vlastimil Babka wrote:
>> On 10/1/24 18:20, Vlastimil Babka wrote:
>>> Guenter Roeck reports that the new slub kunit tests added by commit
>>> 4e1c44b3db79 ("kunit, slub: add test_kfree_rcu() and
>>> test_leak_destroy()") cause a lockup on boot on several architectures
>>> when the kunit tests are configured to be built-in and not modules.
>>>
>>> The test_kfree_rcu test invokes kfree_rcu() and boot sequence inspection
>>> showed the runner for built-in kunit tests kunit_run_all_tests() is
>>> called before setting system_state to SYSTEM_RUNNING and calling
>>> rcu_end_inkernel_boot(), so this seems like a likely cause. So while I
>>> was unable to reproduce the problem myself, skipping the test when the
>>> slub_kunit module is built-in should avoid the issue.
>>>
>>> An alternative fix that was moving the call to kunit_run_all_tests() a
>>> bit later in the boot was tried, but has broken tests with functions
>>> marked as __init due to free_initmem() already being done.
>>>
>>> Fixes: 4e1c44b3db79 ("kunit, slub: add test_kfree_rcu() and test_leak_destroy()")
>>> Reported-by: Guenter Roeck <linux@roeck-us.net>
>>> Closes: https://lore.kernel.org/all/6fcb1252-7990-4f0d-8027-5e83f0fb9409@roeck-us.net/
>>
>> I hope you can confirm it helps, because the commit added two tests and I've
>> only skipped one of them, as it's the one using kfree_rcu(), which is
>> suspected. But the other is responsible for the (now suppressed)
>> kmem_cache_destroy() warning, and maybe I'm missing something and it was
>> actually that one causing the lockups.
>>
>
> Everything works with your patches applied, so we are good.

Thanks for testing! Queued for -next now and will send to Linus later if
all's good.

>> Since you mentioned the boot lockups happened on some x86_64 too, do you
>> have a .config of the lockup case? I've tried tweaking some rcu options but
>> still nothing.
>>
>
> I have a bunch of debug options enabled. Configuration (generated using
> "make savedefconfig") for x86_64 is attached.

Hmm, didn't see the hang with that (using virtme-ng) on v6.12-rc1. Guess
there's something more to it. Oh well.

> Thanks,
> Guenter
The SLUB changes for 6.12 included new kunit tests that resulted in noisy
warnings, which we normally suppress, and a boot lockup in some
configurations when the kunit tests are built-in.

The warnings are addressed in Patch 1.

I couldn't reproduce the lockups, but inspecting the boot initialization
order makes me suspect test_kfree_rcu() calling kfree_rcu() too early,
before RCU finishes initialization. Moving the execution later was tried
but broke tests marking their code as __init, so Patch 2 skips the test
when the slub kunit tests are built-in.

So these are now fixes for 4e1c44b3db79 ("kunit, slub: add
test_kfree_rcu() and test_leak_destroy()").

The plan is to take the fixes via the slab tree for a 6.12 rcX.

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
Changes in v2:
- patch 2 skips the test when built-in instead of moving kunit execution later
- Link to v1: https://lore.kernel.org/r/20240930-b4-slub-kunit-fix-v1-0-32ca9dbbbc11@suse.cz

---
Vlastimil Babka (2):
      mm, slab: suppress warnings in test_leak_destroy kunit test
      slub/kunit: skip test_kfree_rcu when the slub kunit test is built-in

 lib/slub_kunit.c | 18 ++++++++++++------
 mm/slab.h        |  6 ++++++
 mm/slab_common.c |  5 +++--
 mm/slub.c        |  5 +++--
 4 files changed, 24 insertions(+), 10 deletions(-)
---
base-commit: 9852d85ec9d492ebef56dc5f229416c925758edc
change-id: 20240930-b4-slub-kunit-fix-6fba4d1c1742

Best regards,
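
The trade-off between v1 and v2 can be sketched as a toy userspace model. It is only an illustration of the reasoning in this cover letter, not kernel code. The slot order (built-in kunit run, then free_initmem(), then SYSTEM_RUNNING, then rcu_end_inkernel_boot()) is an assumption pieced together from the thread, and "kfree_rcu() needs rcu_end_inkernel_boot() to have run" is the unconfirmed hypothesis, since the lockup was never reproduced here. All names below are hypothetical.

```c
#include <stdbool.h>

/* Points in boot where the built-in kunit runner could be called (model). */
enum run_slot {
	SLOT_BUILTIN,              /* where kunit_run_all_tests() runs today */
	SLOT_AFTER_FREE_INITMEM,   /* __init code and data are gone */
	SLOT_AFTER_SYSTEM_RUNNING, /* system_state == SYSTEM_RUNNING */
	SLOT_AFTER_RCU_END_BOOT,   /* rcu_end_inkernel_boot() has run */
	SLOT_COUNT
};

/* What a given test depends on. */
struct test_needs {
	bool uses_init_code;  /* calls functions marked __init */
	bool needs_rcu_ready; /* e.g. calls kfree_rcu() (hypothesis) */
};

/* True when a test with these needs can safely run in the given slot. */
bool test_can_run_at(struct test_needs t, enum run_slot slot)
{
	if (t.uses_init_code && slot >= SLOT_AFTER_FREE_INITMEM)
		return false; /* would jump into freed init memory */
	if (t.needs_rcu_ready && slot < SLOT_AFTER_RCU_END_BOOT)
		return false; /* the suspected cause of the lockup */
	return true;
}

/* True when some single slot satisfies both tests at once. */
bool common_slot_exists(struct test_needs a, struct test_needs b)
{
	for (int s = SLOT_BUILTIN; s < SLOT_COUNT; s++)
		if (test_can_run_at(a, (enum run_slot)s) &&
		    test_can_run_at(b, (enum run_slot)s))
			return true;
	return false;
}
```

Under these assumptions there is no single slot that suits both a test using __init code and a test needing RCU to be fully up, which is the motivation for skipping test_kfree_rcu() in the built-in case rather than moving the runner.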