Message ID: 20220401091006.2100058-1-qiang1.zhang@intel.com
State:      New
Series:     kasan: Fix sleeping function called from invalid context in PREEMPT_RT
On 2022-04-01 17:10:06 [+0800], Zqiang wrote:
> BUG: sleeping function called from invalid context at
> kernel/locking/spinlock_rt.c:46
> in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 1, name: swapper/0
> preempt_count: 1, expected: 0
> ...........
> CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.17.1-rt16-yocto-preempt-rt #22
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009),
> BIOS rel-1.15.0-0-g2dd4b9b3f840-prebuilt.qemu.org 04/01/2014
> Call Trace:
>  <TASK>
>  dump_stack_lvl+0x60/0x8c
>  dump_stack+0x10/0x12
>  __might_resched.cold+0x13b/0x173
>  rt_spin_lock+0x5b/0xf0
>  ___cache_free+0xa5/0x180
>  qlist_free_all+0x7a/0x160
>  per_cpu_remove_cache+0x5f/0x70
>  smp_call_function_many_cond+0x4c4/0x4f0
>  on_each_cpu_cond_mask+0x49/0xc0
>  kasan_quarantine_remove_cache+0x54/0xf0
>  kasan_cache_shrink+0x9/0x10
>  kmem_cache_shrink+0x13/0x20
>  acpi_os_purge_cache+0xe/0x20
>  acpi_purge_cached_objects+0x21/0x6d
>  acpi_initialize_objects+0x15/0x3b
>  acpi_init+0x130/0x5ba
>  do_one_initcall+0xe5/0x5b0
>  kernel_init_freeable+0x34f/0x3ad
>  kernel_init+0x1e/0x140
>  ret_from_fork+0x22/0x30
>
> When kmem_cache_shrink() is called, an IPI is triggered and
> ___cache_free() runs in IPI interrupt context, where a local lock or
> spin lock is acquired. On a PREEMPT_RT kernel these locks are replaced
> with sleepable rt spinlocks, so the above splat is triggered. Fix it by
> migrating the free operation from IPI interrupt context to task context
> on RT kernels.

>I haven't seen that while playing with kasan. Is this new?

>Could we fix in a way that we don't involve freeing memory from in-IRQ?
>This could trigger a lockdep splat if the local-lock in SLUB is acquired
>from in-IRQ context on !PREEMPT_RT.

Hi, I will move qlist_free_all() from IPI context to task context. This
operation and the subsequent release of the remaining members in the
quarantine pool work similarly.

I don't know the phenomenon you described. Can you explain it in detail?

Thanks
Zqiang

> Signed-off-by: Zqiang <qiang1.zhang@intel.com>

Sebastian
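For context on why the splat above fires only on PREEMPT_RT, here is a
minimal, self-contained sketch (not kernel code: the module, demo_lock and
ipi_free_demo() are made-up names) of the pattern involved: taking a
spinlock_t from the hard-IRQ context of an IPI handler, which is the shape
___cache_free() ends up in when reached via on_each_cpu().

/*
 * Toy module: on !PREEMPT_RT this is fine; on PREEMPT_RT spinlock_t is a
 * sleeping rtmutex-based lock, so __might_resched() fires as in the splat.
 */
#include <linux/module.h>
#include <linux/smp.h>
#include <linux/spinlock.h>

static DEFINE_SPINLOCK(demo_lock);      /* sleeping lock on PREEMPT_RT */

static void ipi_free_demo(void *unused)
{
        /* Runs on every CPU with interrupts disabled (IPI context). */
        spin_lock(&demo_lock);
        spin_unlock(&demo_lock);
}

static int __init demo_init(void)
{
        /* Same shape as kasan_quarantine_remove_cache()'s on_each_cpu() call. */
        on_each_cpu(ipi_free_demo, NULL, 1);
        return 0;
}
module_init(demo_init);

MODULE_LICENSE("GPL");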
On 2022-04-01 10:10:38 [+0000], Zhang, Qiang1 wrote:
> >Could we fix in a way that we don't involve freeing memory from in-IRQ?
> >This could trigger a lockdep splat if the local-lock in SLUB is acquired
> >from in-IRQ context on !PREEMPT_RT.
>
> Hi, I will move qlist_free_all() from IPI context to task context. This
> operation and the subsequent release of the remaining members in the
> quarantine pool work similarly.
>
> I don't know the phenomenon you described. Can you explain it in detail?

If you mean by phenomenon my second sentence, then the kernel option
CONFIG_PROVE_RAW_LOCK_NESTING will trigger on !PREEMPT_RT in a code
sequence like

        raw_spin_lock();
        spin_lock();

which is wrong on PREEMPT_RT. So we have a warning on both
configurations. The call chain in your case will probably not lead to a
warning since there is no raw_spinlock_t involved within the IPI call.

We worked on avoiding memory allocation and freeing from in-IRQ context,
therefore I would prefer to have something that works for both and not
just ifdef around the RT case.

> Thanks
> Zqiang

Sebastian
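To make the CONFIG_PROVE_RAW_LOCK_NESTING point concrete, a small sketch
with hypothetical locks (outer_raw_lock and inner_lock are not taken from
the kasan or SLUB code): acquiring a spinlock_t while holding a
raw_spinlock_t runs fine on !PREEMPT_RT, but lockdep still flags it,
because on PREEMPT_RT the inner lock can sleep while the raw lock keeps
the context atomic.

#include <linux/spinlock.h>

static DEFINE_RAW_SPINLOCK(outer_raw_lock);
static DEFINE_SPINLOCK(inner_lock);

static void bad_nesting_example(void)
{
        raw_spin_lock(&outer_raw_lock);
        spin_lock(&inner_lock);         /* lockdep: invalid wait context on RT */
        spin_unlock(&inner_lock);
        raw_spin_unlock(&outer_raw_lock);
}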
On Fri, 1 Apr 2022 at 11:09, Zqiang <qiang1.zhang@intel.com> wrote:
>
> BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:46
> in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 1, name: swapper/0
> preempt_count: 1, expected: 0
> ...........
> CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.17.1-rt16-yocto-preempt-rt #22
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009),
> BIOS rel-1.15.0-0-g2dd4b9b3f840-prebuilt.qemu.org 04/01/2014
> Call Trace:
>  <TASK>
>  dump_stack_lvl+0x60/0x8c
>  dump_stack+0x10/0x12
>  __might_resched.cold+0x13b/0x173
>  rt_spin_lock+0x5b/0xf0
>  ___cache_free+0xa5/0x180
>  qlist_free_all+0x7a/0x160
>  per_cpu_remove_cache+0x5f/0x70
>  smp_call_function_many_cond+0x4c4/0x4f0
>  on_each_cpu_cond_mask+0x49/0xc0
>  kasan_quarantine_remove_cache+0x54/0xf0
>  kasan_cache_shrink+0x9/0x10
>  kmem_cache_shrink+0x13/0x20
>  acpi_os_purge_cache+0xe/0x20
>  acpi_purge_cached_objects+0x21/0x6d
>  acpi_initialize_objects+0x15/0x3b
>  acpi_init+0x130/0x5ba
>  do_one_initcall+0xe5/0x5b0
>  kernel_init_freeable+0x34f/0x3ad
>  kernel_init+0x1e/0x140
>  ret_from_fork+0x22/0x30
>
> When kmem_cache_shrink() is called, an IPI is triggered and
> ___cache_free() runs in IPI interrupt context, where a local lock or
> spin lock is acquired. On a PREEMPT_RT kernel these locks are replaced
> with sleepable rt spinlocks, so the above splat is triggered. Fix it by
> migrating the free operation from IPI interrupt context to task context
> on RT kernels.
>
> Signed-off-by: Zqiang <qiang1.zhang@intel.com>
> ---
>  mm/kasan/quarantine.c | 15 ++++++++++++---
>  1 file changed, 12 insertions(+), 3 deletions(-)
>
> diff --git a/mm/kasan/quarantine.c b/mm/kasan/quarantine.c
> index 08291ed33e93..c26fa6473119 100644
> --- a/mm/kasan/quarantine.c
> +++ b/mm/kasan/quarantine.c
> @@ -90,6 +90,7 @@ static void qlist_move_all(struct qlist_head *from, struct qlist_head *to)
>   */
>  static DEFINE_PER_CPU(struct qlist_head, cpu_quarantine);
>
> +static DEFINE_PER_CPU(struct qlist_head, cpu_shrink_qlist);
>  /* Round-robin FIFO array of batches. */
>  static struct qlist_head global_quarantine[QUARANTINE_BATCHES];
>  static int quarantine_head;
> @@ -311,12 +312,14 @@ static void qlist_move_cache(struct qlist_head *from,
>  static void per_cpu_remove_cache(void *arg)
>  {
>          struct kmem_cache *cache = arg;
> -        struct qlist_head to_free = QLIST_INIT;
> +        struct qlist_head *to_free;
>          struct qlist_head *q;
>
> +        to_free = this_cpu_ptr(&cpu_shrink_qlist);
>          q = this_cpu_ptr(&cpu_quarantine);
> -        qlist_move_cache(q, &to_free, cache);
> -        qlist_free_all(&to_free, cache);
> +        qlist_move_cache(q, to_free, cache);
> +        if (!IS_ENABLED(CONFIG_PREEMPT_RT))
> +                qlist_free_all(to_free, cache);
>  }
>
>  /* Free all quarantined objects belonging to cache. */
> @@ -324,6 +327,7 @@ void kasan_quarantine_remove_cache(struct kmem_cache *cache)
>  {
>          unsigned long flags, i;
>          struct qlist_head to_free = QLIST_INIT;
> +        int cpu;
>
>          /*
>           * Must be careful to not miss any objects that are being moved from
> @@ -334,6 +338,11 @@ void kasan_quarantine_remove_cache(struct kmem_cache *cache)
>           */
>          on_each_cpu(per_cpu_remove_cache, cache, 1);
>
> +        if (IS_ENABLED(CONFIG_PREEMPT_RT)) {
> +                for_each_possible_cpu(cpu)
> +                        qlist_free_all(per_cpu_ptr(&cpu_shrink_qlist, cpu), cache);
> +        }

Hi Zqiang,

This code is not protected by any kind of mutex, right? If so, I think
it can lead to subtle memory corruptions, double-frees and leaks when
several tasks move to/free from the cpu_shrink_qlist list.
>          raw_spin_lock_irqsave(&quarantine_lock, flags);
>          for (i = 0; i < QUARANTINE_BATCHES; i++) {
>                  if (qlist_empty(&global_quarantine[i]))
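One way to read the concern: kasan_quarantine_remove_cache() can be called
concurrently for different caches (or by racing shrink calls), and with the
patch both callers would drain the same per-CPU cpu_shrink_qlist with no
synchronization. Below is a purely illustrative sketch, not the posted
patch and not the eventual fix, of serializing the whole "move to
cpu_shrink_qlist, then free it" sequence; shrink_qlist_lock and the _rt
helper are made-up names, and the sketch assumes it sits in
mm/kasan/quarantine.c next to per_cpu_remove_cache(), qlist_free_all() and
cpu_shrink_qlist from the patch above.

#include <linux/mutex.h>
#include <linux/percpu.h>
#include <linux/smp.h>

static DEFINE_MUTEX(shrink_qlist_lock);         /* hypothetical */

static void kasan_quarantine_remove_cache_rt(struct kmem_cache *cache)
{
        int cpu;

        mutex_lock(&shrink_qlist_lock);
        /* Move matching per-CPU quarantine entries onto cpu_shrink_qlist. */
        on_each_cpu(per_cpu_remove_cache, cache, 1);
        /* Free them from task context, where sleeping locks are legal on RT. */
        for_each_possible_cpu(cpu)
                qlist_free_all(per_cpu_ptr(&cpu_shrink_qlist, cpu), cache);
        mutex_unlock(&shrink_qlist_lock);
}

Whether a mutex is acceptable at this call site is a separate question; the
point is only that, once the free is deferred out of the IPI, the shared
per-CPU shrink lists need some form of mutual exclusion.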
On Fri, 2022-04-01 at 11:27 +0200, Sebastian Andrzej Siewior wrote:
> On 2022-04-01 17:10:06 [+0800], Zqiang wrote:
> > BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:46
> > in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 1, name: swapper/0
> > preempt_count: 1, expected: 0
> > ...........
> > CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.17.1-rt16-yocto-preempt-rt #22
> > Hardware name: QEMU Standard PC (Q35 + ICH9, 2009),
> > BIOS rel-1.15.0-0-g2dd4b9b3f840-prebuilt.qemu.org 04/01/2014
> > Call Trace:
> >  <TASK>
> >  dump_stack_lvl+0x60/0x8c
> >  dump_stack+0x10/0x12
> >  __might_resched.cold+0x13b/0x173
> >  rt_spin_lock+0x5b/0xf0
> >  ___cache_free+0xa5/0x180
> >  qlist_free_all+0x7a/0x160
> >  per_cpu_remove_cache+0x5f/0x70
> >  smp_call_function_many_cond+0x4c4/0x4f0
> >  on_each_cpu_cond_mask+0x49/0xc0
> >  kasan_quarantine_remove_cache+0x54/0xf0
> >  kasan_cache_shrink+0x9/0x10
> >  kmem_cache_shrink+0x13/0x20
> >  acpi_os_purge_cache+0xe/0x20
> >  acpi_purge_cached_objects+0x21/0x6d
> >  acpi_initialize_objects+0x15/0x3b
> >  acpi_init+0x130/0x5ba
> >  do_one_initcall+0xe5/0x5b0
> >  kernel_init_freeable+0x34f/0x3ad
> >  kernel_init+0x1e/0x140
> >  ret_from_fork+0x22/0x30
> >
> > When kmem_cache_shrink() is called, an IPI is triggered and
> > ___cache_free() runs in IPI interrupt context, where a local lock or
> > spin lock is acquired. On a PREEMPT_RT kernel these locks are replaced
> > with sleepable rt spinlocks, so the above splat is triggered. Fix it by
> > migrating the free operation from IPI interrupt context to task context
> > on RT kernels.
>
> I haven't seen that while playing with kasan. Is this new?

Don't think so, the rock below was apparently first tossed at 5.12.

---
 lib/stackdepot.c      |  3 +++
 mm/kasan/quarantine.c | 49 +++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 52 insertions(+)

--- a/lib/stackdepot.c
+++ b/lib/stackdepot.c
@@ -375,6 +375,9 @@ depot_stack_handle_t __stack_depot_save(
 	if (found)
 		goto exit;
 
+	if (IS_ENABLED(CONFIG_PREEMPT_RT) && can_alloc && !preemptible())
+		can_alloc = false;
+
 	/*
 	 * Check if the current or the next stack slab need to be initialized.
 	 * If so, allocate the memory - we won't be able to do that under the
--- a/mm/kasan/quarantine.c
+++ b/mm/kasan/quarantine.c
@@ -19,6 +19,9 @@
 #include <linux/srcu.h>
 #include <linux/string.h>
 #include <linux/types.h>
+#include <linux/cpu.h>
+#include <linux/mutex.h>
+#include <linux/workqueue.h>
 #include <linux/cpuhotplug.h>
 
 #include "../slab.h"
@@ -319,6 +322,48 @@ static void per_cpu_remove_cache(void *a
 	qlist_free_all(&to_free, cache);
 }
 
+#ifdef CONFIG_PREEMPT_RT
+struct remove_cache_work {
+	struct work_struct work;
+	struct kmem_cache *cache;
+};
+
+static DEFINE_MUTEX(remove_caches_lock);
+static DEFINE_PER_CPU(struct remove_cache_work, remove_cache_work);
+
+static void per_cpu_remove_cache_work(struct work_struct *w)
+{
+	struct remove_cache_work *rcw;
+
+	rcw = container_of(w, struct remove_cache_work, work);
+	per_cpu_remove_cache(rcw->cache);
+}
+
+static void per_cpu_remove_caches_sync(struct kmem_cache *cache)
+{
+	struct remove_cache_work *rcw;
+	unsigned int cpu;
+
+	cpus_read_lock();
+	mutex_lock(&remove_caches_lock);
+
+	for_each_online_cpu(cpu) {
+		rcw = &per_cpu(remove_cache_work, cpu);
+		INIT_WORK(&rcw->work, per_cpu_remove_cache_work);
+		rcw->cache = cache;
+		schedule_work_on(cpu, &rcw->work);
+	}
+
+	for_each_online_cpu(cpu) {
+		rcw = &per_cpu(remove_cache_work, cpu);
+		flush_work(&rcw->work);
+	}
+
+	mutex_unlock(&remove_caches_lock);
+	cpus_read_unlock();
+}
+#endif
+
 /* Free all quarantined objects belonging to cache. */
 void kasan_quarantine_remove_cache(struct kmem_cache *cache)
 {
@@ -332,7 +377,11 @@ void kasan_quarantine_remove_cache(struc
 	 * achieves the first goal, while synchronize_srcu() achieves the
 	 * second.
	 */
+#ifndef CONFIG_PREEMPT_RT
 	on_each_cpu(per_cpu_remove_cache, cache, 1);
+#else
+	per_cpu_remove_caches_sync(cache);
+#endif
 
 	raw_spin_lock_irqsave(&quarantine_lock, flags);
 	for (i = 0; i < QUARANTINE_BATCHES; i++) {
diff --git a/mm/kasan/quarantine.c b/mm/kasan/quarantine.c
index 08291ed33e93..c26fa6473119 100644
--- a/mm/kasan/quarantine.c
+++ b/mm/kasan/quarantine.c
@@ -90,6 +90,7 @@ static void qlist_move_all(struct qlist_head *from, struct qlist_head *to)
  */
 static DEFINE_PER_CPU(struct qlist_head, cpu_quarantine);
 
+static DEFINE_PER_CPU(struct qlist_head, cpu_shrink_qlist);
 /* Round-robin FIFO array of batches. */
 static struct qlist_head global_quarantine[QUARANTINE_BATCHES];
 static int quarantine_head;
@@ -311,12 +312,14 @@ static void qlist_move_cache(struct qlist_head *from,
 static void per_cpu_remove_cache(void *arg)
 {
 	struct kmem_cache *cache = arg;
-	struct qlist_head to_free = QLIST_INIT;
+	struct qlist_head *to_free;
 	struct qlist_head *q;
 
+	to_free = this_cpu_ptr(&cpu_shrink_qlist);
 	q = this_cpu_ptr(&cpu_quarantine);
-	qlist_move_cache(q, &to_free, cache);
-	qlist_free_all(&to_free, cache);
+	qlist_move_cache(q, to_free, cache);
+	if (!IS_ENABLED(CONFIG_PREEMPT_RT))
+		qlist_free_all(to_free, cache);
 }
 
 /* Free all quarantined objects belonging to cache. */
@@ -324,6 +327,7 @@ void kasan_quarantine_remove_cache(struct kmem_cache *cache)
 {
 	unsigned long flags, i;
 	struct qlist_head to_free = QLIST_INIT;
+	int cpu;
 
 	/*
 	 * Must be careful to not miss any objects that are being moved from
@@ -334,6 +338,11 @@ void kasan_quarantine_remove_cache(struct kmem_cache *cache)
 	 */
 	on_each_cpu(per_cpu_remove_cache, cache, 1);
 
+	if (IS_ENABLED(CONFIG_PREEMPT_RT)) {
+		for_each_possible_cpu(cpu)
+			qlist_free_all(per_cpu_ptr(&cpu_shrink_qlist, cpu), cache);
+	}
+
 	raw_spin_lock_irqsave(&quarantine_lock, flags);
 	for (i = 0; i < QUARANTINE_BATCHES; i++) {
 		if (qlist_empty(&global_quarantine[i]))
BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:46
in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 1, name: swapper/0
preempt_count: 1, expected: 0
...........
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.17.1-rt16-yocto-preempt-rt #22
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009),
BIOS rel-1.15.0-0-g2dd4b9b3f840-prebuilt.qemu.org 04/01/2014
Call Trace:
 <TASK>
 dump_stack_lvl+0x60/0x8c
 dump_stack+0x10/0x12
 __might_resched.cold+0x13b/0x173
 rt_spin_lock+0x5b/0xf0
 ___cache_free+0xa5/0x180
 qlist_free_all+0x7a/0x160
 per_cpu_remove_cache+0x5f/0x70
 smp_call_function_many_cond+0x4c4/0x4f0
 on_each_cpu_cond_mask+0x49/0xc0
 kasan_quarantine_remove_cache+0x54/0xf0
 kasan_cache_shrink+0x9/0x10
 kmem_cache_shrink+0x13/0x20
 acpi_os_purge_cache+0xe/0x20
 acpi_purge_cached_objects+0x21/0x6d
 acpi_initialize_objects+0x15/0x3b
 acpi_init+0x130/0x5ba
 do_one_initcall+0xe5/0x5b0
 kernel_init_freeable+0x34f/0x3ad
 kernel_init+0x1e/0x140
 ret_from_fork+0x22/0x30

When kmem_cache_shrink() is called, an IPI is triggered and
___cache_free() runs in IPI interrupt context, where a local lock or
spin lock is acquired. On a PREEMPT_RT kernel these locks are replaced
with sleepable rt spinlocks, so the above splat is triggered. Fix it by
migrating the free operation from IPI interrupt context to task context
on RT kernels.

Signed-off-by: Zqiang <qiang1.zhang@intel.com>
---
 mm/kasan/quarantine.c | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)