From patchwork Wed Nov 24 18:03:04 2021
X-Patchwork-Submitter: Steven Rostedt
X-Patchwork-Id: 517419
Message-ID: <20211124180326.265403840@goodmis.org>
Date: Wed, 24 Nov 2021 13:03:04 -0500
From: Steven Rostedt
To: linux-kernel@vger.kernel.org, linux-rt-users
Cc: Thomas Gleixner, Carsten Emde, Sebastian Andrzej Siewior, John Kacur,
    Daniel Wagner, Tom Zanussi, "Srivatsa S. Bhat", stable-rt@vger.kernel.org,
    Mike Galbraith
Subject: [PATCH RT 01/13] mm, zsmalloc: Convert zsmalloc_handle.lock to spinlock_t
References: <20211124180303.574562279@goodmis.org>

5.10.78-rt56-rc3 stable review patch.
If anyone has any objections, please let me know.

------------------

From: Mike Galbraith

local_lock_t becoming a synonym of spinlock_t had consequences for the RT
mods to zsmalloc, which were taking a mutex while holding a local_lock,
inspiring a lockdep "BUG: Invalid wait context" gripe. Converting
zsmalloc_handle.lock to a spinlock_t restored lockdep silence.
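The problematic nesting can be sketched as follows (an illustrative sketch
only; "some_local_lock" is a placeholder, not the actual zsmalloc symbol.
The point is that a local_lock_t maps to a spinlock_t on PREEMPT_RT, so a
mutex taken underneath it is an invalid wait context):

	/* before: sleeping lock nested inside a spinlock-class lock */
	local_lock(&some_local_lock);	/* spinlock_t on PREEMPT_RT */
	mutex_lock(&zh->lock);		/* mutex under a spinlock: lockdep splat */
	...
	mutex_unlock(&zh->lock);
	local_unlock(&some_local_lock);

	/* after: a spinlock_t nests validly under the local_lock */
	local_lock(&some_local_lock);
	spin_lock(&zh->lock);
	...
	spin_unlock(&zh->lock);
	local_unlock(&some_local_lock);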
Cc: stable-rt@vger.kernel.org
Signed-off-by: Mike Galbraith
Signed-off-by: Sebastian Andrzej Siewior
Signed-off-by: Steven Rostedt (VMware)
---
 mm/zsmalloc.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index 277d426c881f..3595c1644135 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -82,7 +82,7 @@
 struct zsmalloc_handle {
 	unsigned long addr;
-	struct mutex lock;
+	spinlock_t lock;
 };
 
 #define ZS_HANDLE_ALLOC_SIZE (sizeof(struct zsmalloc_handle))
@@ -370,7 +370,7 @@ static unsigned long cache_alloc_handle(struct zs_pool *pool, gfp_t gfp)
 	if (p) {
 		struct zsmalloc_handle *zh = p;
 
-		mutex_init(&zh->lock);
+		spin_lock_init(&zh->lock);
 	}
 #endif
 	return (unsigned long)p;
@@ -930,7 +930,7 @@ static inline int testpin_tag(unsigned long handle)
 #ifdef CONFIG_PREEMPT_RT
 	struct zsmalloc_handle *zh = zs_get_pure_handle(handle);
 
-	return mutex_is_locked(&zh->lock);
+	return spin_is_locked(&zh->lock);
 #else
 	return bit_spin_is_locked(HANDLE_PIN_BIT, (unsigned long *)handle);
 #endif
@@ -941,7 +941,7 @@ static inline int trypin_tag(unsigned long handle)
 #ifdef CONFIG_PREEMPT_RT
 	struct zsmalloc_handle *zh = zs_get_pure_handle(handle);
 
-	return mutex_trylock(&zh->lock);
+	return spin_trylock(&zh->lock);
 #else
 	return bit_spin_trylock(HANDLE_PIN_BIT, (unsigned long *)handle);
 #endif
@@ -952,7 +952,7 @@ static void pin_tag(unsigned long handle) __acquires(bitlock)
 #ifdef CONFIG_PREEMPT_RT
 	struct zsmalloc_handle *zh = zs_get_pure_handle(handle);
 
-	return mutex_lock(&zh->lock);
+	return spin_lock(&zh->lock);
 #else
 	bit_spin_lock(HANDLE_PIN_BIT, (unsigned long *)handle);
 #endif
@@ -963,7 +963,7 @@ static void unpin_tag(unsigned long handle) __releases(bitlock)
 #ifdef CONFIG_PREEMPT_RT
 	struct zsmalloc_handle *zh = zs_get_pure_handle(handle);
 
-	return mutex_unlock(&zh->lock);
+	return spin_unlock(&zh->lock);
 #else
 	bit_spin_unlock(HANDLE_PIN_BIT, (unsigned long *)handle);
 #endif

From patchwork Wed Nov 24 18:03:05 2021
X-Patchwork-Submitter: Steven Rostedt
X-Patchwork-Id: 520087
Message-ID: <20211124180326.453043583@goodmis.org>
Date: Wed, 24 Nov 2021 13:03:05 -0500
From: Steven Rostedt
To: linux-kernel@vger.kernel.org, linux-rt-users
Cc: Thomas Gleixner, Carsten Emde, Sebastian Andrzej Siewior, John Kacur,
    Daniel Wagner, Tom Zanussi, "Srivatsa S. Bhat", stable-rt@vger.kernel.org,
    "Peter Zijlstra (Intel)"
Subject: [PATCH RT 02/13] sched: Fix get_push_task() vs migrate_disable()
References: <20211124180303.574562279@goodmis.org>

5.10.78-rt56-rc3 stable review patch.
If anyone has any objections, please let me know.

------------------

From: Sebastian Andrzej Siewior

push_rt_task() attempts to move the currently running task away if the
next runnable task has migration disabled and therefore is pinned on the
current CPU. The current task is retrieved via get_push_task() which only
checks for nr_cpus_allowed == 1, but does not check whether the task has
migration disabled and therefore cannot be moved either. The consequence
is a pointless invocation of the migration thread which correctly observes
that the task cannot be moved.

Return NULL if the task has migration disabled and cannot be moved to
another CPU.

Cc: stable-rt@vger.kernel.org
Fixes: a7c81556ec4d3 ("sched: Fix migrate_disable() vs rt/dl balancing")
Signed-off-by: Sebastian Andrzej Siewior
Signed-off-by: Peter Zijlstra (Intel)
Link: https://lkml.kernel.org/r/20210826133738.yiotqbtdaxzjsnfj@linutronix.de
Signed-off-by: Steven Rostedt (VMware)
---
 kernel/sched/sched.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 826ea17e144d..c2c9c386456d 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1949,6 +1949,9 @@ static inline struct task_struct *get_push_task(struct rq *rq)
 	if (p->nr_cpus_allowed == 1)
 		return NULL;
 
+	if (p->migration_disabled)
+		return NULL;
+
 	rq->push_busy = true;
 	return get_task_struct(p);
 }

From patchwork Wed Nov 24 18:03:06 2021
X-Patchwork-Submitter: Steven Rostedt
X-Patchwork-Id: 517414
Message-ID: <20211124180326.627400767@goodmis.org>
Date: Wed, 24 Nov 2021 13:03:06 -0500
From: Steven Rostedt
To: linux-kernel@vger.kernel.org, linux-rt-users
Cc: Thomas Gleixner, Carsten Emde, Sebastian Andrzej Siewior, John Kacur,
    Daniel Wagner, Tom Zanussi, "Srivatsa S. Bhat", stable-rt@vger.kernel.org
Subject: [PATCH RT 03/13] sched: Switch wait_task_inactive to HRTIMER_MODE_REL_HARD
References: <20211124180303.574562279@goodmis.org>

5.10.78-rt56-rc3 stable review patch.
If anyone has any objections, please let me know.

------------------

From: Sebastian Andrzej Siewior

With PREEMPT_RT enabled all hrtimer callbacks will be invoked in softirq
mode unless they are explicitly marked as HRTIMER_MODE_HARD. During boot
kthread_bind() is used for the creation of per-CPU threads and then hangs
in wait_task_inactive() if ksoftirqd is not yet up and running. The hang
disappeared since commit 26c7295be0c5e ("kthread: Do not preempt current
task if it is going to call schedule()"), but enabling function tracing on
boot reliably leads to the freeze-on-boot behaviour again.

The timer in wait_task_inactive() cannot be used directly from a user
interface to abuse it and create a mass wake-up of several tasks at the
same time, which would lead to long sections with disabled interrupts.
Therefore it is safe to make the timer HRTIMER_MODE_REL_HARD.

Switch the timer to HRTIMER_MODE_REL_HARD.

Cc: stable-rt@vger.kernel.org
Link: https://lkml.kernel.org/r/20210826170408.vm7rlj7odslshwch@linutronix.de
Signed-off-by: Sebastian Andrzej Siewior
Signed-off-by: Steven Rostedt (VMware)
---
 kernel/sched/core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index f638d9420553..54fa3bb1b7c4 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2734,7 +2734,7 @@ unsigned long wait_task_inactive(struct task_struct *p, long match_state)
 			ktime_t to = NSEC_PER_SEC / HZ;
 
 			set_current_state(TASK_UNINTERRUPTIBLE);
-			schedule_hrtimeout(&to, HRTIMER_MODE_REL);
+			schedule_hrtimeout(&to, HRTIMER_MODE_REL_HARD);
 			continue;
 		}

From patchwork Wed Nov 24 18:03:07 2021
X-Patchwork-Submitter: Steven Rostedt
X-Patchwork-Id: 520088
Message-ID: <20211124180326.814388531@goodmis.org>
Date: Wed, 24 Nov 2021 13:03:07 -0500
From: Steven Rostedt
To: linux-kernel@vger.kernel.org, linux-rt-users
Cc: Thomas Gleixner, Carsten Emde, Sebastian Andrzej Siewior, John Kacur,
    Daniel Wagner, Tom Zanussi, "Srivatsa S. Bhat", stable-rt@vger.kernel.org
Subject: [PATCH RT 04/13] preempt: Move preempt_enable_no_resched() to the RT block
References: <20211124180303.574562279@goodmis.org>

5.10.78-rt56-rc3 stable review patch.
If anyone has any objections, please let me know.
------------------

From: Sebastian Andrzej Siewior

preempt_enable_no_resched() should point to preempt_enable() on
PREEMPT_RT so nobody plays any preempt tricks and enables preemption
without checking for the need-resched flag.

This was misplaced in v3.14.0-rt1 and remained unnoticed until now.

Point preempt_enable_no_resched() to preempt_enable() on RT.

Cc: stable-rt@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior
Signed-off-by: Steven Rostedt (VMware)
---
 include/linux/preempt.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/include/linux/preempt.h b/include/linux/preempt.h
index af39859f02ee..7b5b2ed55531 100644
--- a/include/linux/preempt.h
+++ b/include/linux/preempt.h
@@ -208,12 +208,12 @@ do { \
 	preempt_count_dec(); \
 } while (0)
 
-#ifdef CONFIG_PREEMPT_RT
+#ifndef CONFIG_PREEMPT_RT
 # define preempt_enable_no_resched() sched_preempt_enable_no_resched()
-# define preempt_check_resched_rt() preempt_check_resched()
+# define preempt_check_resched_rt() barrier();
 #else
 # define preempt_enable_no_resched() preempt_enable()
-# define preempt_check_resched_rt() barrier();
+# define preempt_check_resched_rt() preempt_check_resched()
 #endif
 
 #define preemptible() (preempt_count() == 0 && !irqs_disabled())

From patchwork Wed Nov 24 18:03:08 2021
X-Patchwork-Submitter: Steven Rostedt
X-Patchwork-Id: 520083
Message-ID: <20211124180326.998091329@goodmis.org>
Date: Wed, 24 Nov 2021 13:03:08 -0500
From: Steven Rostedt
To: linux-kernel@vger.kernel.org, linux-rt-users
Cc: Thomas Gleixner, Carsten Emde, Sebastian Andrzej Siewior, John Kacur,
    Daniel Wagner, Tom Zanussi, "Srivatsa S. Bhat", stable-rt@vger.kernel.org,
    Mel Gorman
Subject: [PATCH RT 05/13] mm: Disable NUMA_BALANCING_DEFAULT_ENABLED and TRANSPARENT_HUGEPAGE on PREEMPT_RT
References: <20211124180303.574562279@goodmis.org>

5.10.78-rt56-rc3 stable review patch.
If anyone has any objections, please let me know.

------------------

From: Sebastian Andrzej Siewior

TRANSPARENT_HUGEPAGE: There are potential non-deterministic delays to an
RT thread if a critical memory region is not THP-aligned and a non-RT
buffer is located in the same hugepage-aligned region.
It's also possible for an unrelated thread to migrate pages belonging to
an RT task, incurring unexpected page faults due to memory
defragmentation even if khugepaged is disabled. Regular HUGEPAGEs are not
affected by this and can be used.

NUMA_BALANCING: There is a non-deterministic delay to mark PTEs PROT_NONE
to gather NUMA fault samples, increased page faults of regions even if
mlocked and non-deterministic delays when migrating pages.

[Mel Gorman worded 99% of the commit description].

Link: https://lore.kernel.org/all/20200304091159.GN3818@techsingularity.net/
Link: https://lore.kernel.org/all/20211026165100.ahz5bkx44lrrw5pt@linutronix.de/
Cc: stable-rt@vger.kernel.org
Cc: Mel Gorman
Signed-off-by: Sebastian Andrzej Siewior
Acked-by: Mel Gorman
Link: https://lore.kernel.org/r/20211028143327.hfbxjze7palrpfgp@linutronix.de
Signed-off-by: Steven Rostedt (VMware)
---
 init/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/init/Kconfig b/init/Kconfig
index 7ba2b602b707..9bfc60e7eead 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -861,7 +861,7 @@ config NUMA_BALANCING
 	bool "Memory placement aware NUMA scheduler"
 	depends on ARCH_SUPPORTS_NUMA_BALANCING
 	depends on !ARCH_WANT_NUMA_VARIABLE_LOCALITY
-	depends on SMP && NUMA && MIGRATION
+	depends on SMP && NUMA && MIGRATION && !PREEMPT_RT
 	help
 	  This option adds support for automatic NUMA aware memory/task placement.
 	  The mechanism is quite primitive and is based on migrating memory when

From patchwork Wed Nov 24 18:03:09 2021
X-Patchwork-Submitter: Steven Rostedt
X-Patchwork-Id: 520082
Message-ID: <20211124180327.180387388@goodmis.org>
Date: Wed, 24 Nov 2021 13:03:09 -0500
From: Steven Rostedt
To: linux-kernel@vger.kernel.org, linux-rt-users
Cc: Thomas Gleixner, Carsten Emde, Sebastian Andrzej Siewior, John Kacur,
    Daniel Wagner, Tom Zanussi, "Srivatsa S. Bhat", Gregor Beck,
    stable-rt@vger.kernel.org
Subject: [PATCH RT 06/13] fscache: Use only one fscache_object_cong_wait.
References: <20211124180303.574562279@goodmis.org>

5.10.78-rt56-rc3 stable review patch.
If anyone has any objections, please let me know.

------------------

From: Sebastian Andrzej Siewior

In the commit mentioned below, fscache was converted from slow-work to
workqueue.
slow_work_enqueue() and slow_work_sleep_till_thread_needed() did not use
a per-CPU workqueue. They chose between two global waitqueues depending
on the SLOW_WORK_VERY_SLOW bit, which was not set, so it was always the
same waitqueue. I can't find out how it is ensured that a waiter on a
certain CPU is woken up by the other side. My guess is that the timeout
in schedule_timeout() ensures that it does not wait forever (or relies on
a random wake-up).

fscache_object_sleep_till_congested() must be invoked from preemptible
context in order for schedule() to work. In this case this_cpu_ptr()
should complain with CONFIG_DEBUG_PREEMPT enabled, unless the thread is
bound to one CPU.

wake_up() wakes only one waiter and I'm not sure if it is guaranteed that
only one waiter exists.

Replace the per-CPU waitqueue with one global waitqueue.

Fixes: 8b8edefa2fffb ("fscache: convert object to use workqueue instead of slow-work")
Reported-by: Gregor Beck
Cc: stable-rt@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior
Signed-off-by: Steven Rostedt (VMware)
---
 fs/fscache/internal.h | 1 -
 fs/fscache/main.c | 6 ------
 fs/fscache/object.c | 11 +++++------
 3 files changed, 5 insertions(+), 13 deletions(-)

diff --git a/fs/fscache/internal.h b/fs/fscache/internal.h
index 64aa552b296d..7dae569dafb9 100644
--- a/fs/fscache/internal.h
+++ b/fs/fscache/internal.h
@@ -95,7 +95,6 @@ extern unsigned fscache_debug;
 extern struct kobject *fscache_root;
 extern struct workqueue_struct *fscache_object_wq;
 extern struct workqueue_struct *fscache_op_wq;
-DECLARE_PER_CPU(wait_queue_head_t, fscache_object_cong_wait);
 
 extern unsigned int fscache_hash(unsigned int salt, unsigned int *data, unsigned int n);

diff --git a/fs/fscache/main.c b/fs/fscache/main.c
index 4207f98e405f..85f8cf3a323d 100644
--- a/fs/fscache/main.c
+++ b/fs/fscache/main.c
@@ -41,8 +41,6 @@ struct kobject *fscache_root;
 struct workqueue_struct *fscache_object_wq;
 struct workqueue_struct *fscache_op_wq;
 
-DEFINE_PER_CPU(wait_queue_head_t, fscache_object_cong_wait);
-
 /* these values serve as lower bounds, will be adjusted in fscache_init() */
 static unsigned fscache_object_max_active = 4;
 static unsigned fscache_op_max_active = 2;
@@ -138,7 +136,6 @@ unsigned int fscache_hash(unsigned int salt, unsigned int *data, unsigned int n)
 static int __init fscache_init(void)
 {
 	unsigned int nr_cpus = num_possible_cpus();
-	unsigned int cpu;
 	int ret;
 
 	fscache_object_max_active =
@@ -161,9 +158,6 @@ static int __init fscache_init(void)
 	if (!fscache_op_wq)
 		goto error_op_wq;
 
-	for_each_possible_cpu(cpu)
-		init_waitqueue_head(&per_cpu(fscache_object_cong_wait, cpu));
-
 	ret = fscache_proc_init();
 	if (ret < 0)
 		goto error_proc;

diff --git a/fs/fscache/object.c b/fs/fscache/object.c
index cb2146e02cd5..55158f30d093 100644
--- a/fs/fscache/object.c
+++ b/fs/fscache/object.c
@@ -807,6 +807,8 @@ void fscache_object_destroy(struct fscache_object *object)
 }
 EXPORT_SYMBOL(fscache_object_destroy);
 
+static DECLARE_WAIT_QUEUE_HEAD(fscache_object_cong_wait);
+
 /*
  * enqueue an object for metadata-type processing
  */
@@ -815,12 +817,10 @@ void fscache_enqueue_object(struct fscache_object *object)
 	_enter("{OBJ%x}", object->debug_id);
 
 	if (fscache_get_object(object, fscache_obj_get_queue) >= 0) {
-		wait_queue_head_t *cong_wq =
-			&get_cpu_var(fscache_object_cong_wait);
 
 		if (queue_work(fscache_object_wq, &object->work)) {
 			if (fscache_object_congested())
-				wake_up(cong_wq);
+				wake_up(&fscache_object_cong_wait);
 		} else
 			fscache_put_object(object, fscache_obj_put_queue);
 
@@ -842,16 +842,15 @@ void fscache_enqueue_object(struct fscache_object *object)
  */
 bool fscache_object_sleep_till_congested(signed long *timeoutp)
 {
-	wait_queue_head_t *cong_wq = this_cpu_ptr(&fscache_object_cong_wait);
 	DEFINE_WAIT(wait);
 
 	if (fscache_object_congested())
 		return true;
 
-	add_wait_queue_exclusive(cong_wq, &wait);
+	add_wait_queue_exclusive(&fscache_object_cong_wait, &wait);
 	if (!fscache_object_congested())
 		*timeoutp = schedule_timeout(*timeoutp);
-	finish_wait(cong_wq, &wait);
+	finish_wait(&fscache_object_cong_wait, &wait);
 
 	return fscache_object_congested();
 }

From patchwork Wed Nov 24 18:03:10 2021
X-Patchwork-Submitter: Steven Rostedt
X-Patchwork-Id: 520086
Message-ID: <20211124180327.365930589@goodmis.org>
Date: Wed, 24 Nov 2021 13:03:10 -0500
From: Steven Rostedt
To: linux-kernel@vger.kernel.org, linux-rt-users
Cc: Thomas Gleixner, Carsten Emde, Sebastian Andrzej Siewior, John Kacur,
    Daniel Wagner, Tom Zanussi, "Srivatsa S. Bhat"
Subject: [PATCH RT 07/13] fscache: Use only one fscache_object_cong_wait (2)
References: <20211124180303.574562279@goodmis.org>

5.10.78-rt56-rc3 stable review patch.
If anyone has any objections, please let me know.

------------------

From: Sebastian Andrzej Siewior

This is an update of the original patch, removing put_cpu_var() which was
overlooked in the initial patch.
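For context, get_cpu_var() and put_cpu_var() must stay strictly paired,
because get_cpu_var() disables preemption and put_cpu_var() enables it
again (a generic per-CPU access pattern; "some_pcpu_wait" is a placeholder
name, not the fscache symbol):

	/* balanced per-CPU access */
	wait_queue_head_t *wq = &get_cpu_var(some_pcpu_wait);	/* preempt_disable() */
	wake_up(wq);
	put_cpu_var(some_pcpu_wait);				/* preempt_enable() */

With the get_cpu_var() call gone after the previous patch, the leftover
put_cpu_var() would have enabled preemption once too often.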
Signed-off-by: Sebastian Andrzej Siewior
Signed-off-by: Steven Rostedt (VMware)
---
 fs/fscache/object.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/fs/fscache/object.c b/fs/fscache/object.c
index 55158f30d093..fb9794dce721 100644
--- a/fs/fscache/object.c
+++ b/fs/fscache/object.c
@@ -823,8 +823,6 @@ void fscache_enqueue_object(struct fscache_object *object)
 				wake_up(&fscache_object_cong_wait);
 		} else
 			fscache_put_object(object, fscache_obj_put_queue);
-
-		put_cpu_var(fscache_object_cong_wait);
 	}
 }

From patchwork Wed Nov 24 18:03:11 2021
X-Patchwork-Submitter: Steven Rostedt
X-Patchwork-Id: 520084
Message-ID: <20211124180327.553861475@goodmis.org>
Date: Wed, 24 Nov 2021 13:03:11 -0500
From: Steven Rostedt
To: linux-kernel@vger.kernel.org, linux-rt-users
Cc: Thomas Gleixner, Carsten Emde, Sebastian Andrzej Siewior, John Kacur,
    Daniel Wagner, Tom Zanussi, "Srivatsa S. Bhat"
Subject: [PATCH RT 08/13] locking: Drop might_resched() from might_sleep_no_state_check()
References: <20211124180303.574562279@goodmis.org>

5.10.78-rt56-rc3 stable review patch.
If anyone has any objections, please let me know.

------------------

From: Sebastian Andrzej Siewior

might_sleep_no_state_check() serves the same purpose as might_sleep()
except it is used before sleeping locks are acquired and therefore does
not check task_struct::state, because the state is preserved.

That state is preserved in the locking slow path, so we must not schedule
at the beginning of the locking function, because the state will be lost
and not preserved at that time.

Remove might_resched() from might_sleep_no_state_check() to avoid losing
the state before it is preserved.
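The ordering constraint can be sketched like this (a simplified,
illustrative flow of an RT sleeping-lock slowpath; the variable names are
placeholders, not the exact rtmutex code):

	might_sleep_no_state_check();	/* must not schedule here ...        */
	saved_state = current->state;	/* ... because the caller's state    */
					/* is saved here ...                 */
	/* block on the lock, scheduling with a lock-internal state */
	current->state = saved_state;	/* ... and only restored afterwards  */

A might_resched() before the state is saved could schedule and thereby
consume task_struct::state before the slow path has preserved it.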
Signed-off-by: Sebastian Andrzej Siewior
Signed-off-by: Steven Rostedt (VMware)
---
 include/linux/kernel.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index 2cff7554395d..6eb0ab994f4c 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -222,7 +222,7 @@ extern void __cant_migrate(const char *file, int line);
 	do { __might_sleep(__FILE__, __LINE__, 0); might_resched(); } while (0)
 
 # define might_sleep_no_state_check() \
-	do { ___might_sleep(__FILE__, __LINE__, 0); might_resched(); } while (0)
+	do { ___might_sleep(__FILE__, __LINE__, 0); } while (0)
 
 /**
  * cant_sleep - annotation for functions that cannot sleep

From patchwork Wed Nov 24 18:03:12 2021
X-Patchwork-Submitter: Steven Rostedt
X-Patchwork-Id: 517417
Message-ID: <20211124180327.744571634@goodmis.org>
Date: Wed, 24 Nov 2021 13:03:12 -0500
From: Steven Rostedt
To: linux-kernel@vger.kernel.org, linux-rt-users
Cc: Thomas Gleixner, Carsten Emde, Sebastian Andrzej Siewior, John Kacur,
    Daniel Wagner, Tom Zanussi, "Srivatsa S. Bhat", Clark Williams,
    Maarten Lankhorst
Subject: [PATCH RT 09/13] drm/i915/gt: Queue and wait for the irq_work item.
References: <20211124180303.574562279@goodmis.org>

5.10.78-rt56-rc3 stable review patch.
If anyone has any objections, please let me know.

------------------

From: Sebastian Andrzej Siewior

Disabling interrupts and invoking the irq_work function directly breaks
on PREEMPT_RT.

PREEMPT_RT does not invoke all irq_work from hardirq context because some
of the users have spinlock_t locking in the callback function. These
locks are then turned into sleeping locks which cannot be acquired with
disabled interrupts.

Using irq_work_queue() has the benefit that the irq_work will be invoked
in the regular context. In general there is "no" delay between enqueueing
the callback and its invocation, because the interrupt is raised right
away on architectures which support it (which includes x86).

Use irq_work_queue() + irq_work_sync() instead of invoking the callback
directly.
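The failure mode boils down to the following pattern (illustrative only;
"some_lock" stands in for whatever spinlock_t the signal_irq_work()
callback takes internally):

	local_irq_disable();
	signal_irq_work(&b->irq_work);	/* callback ends up in code like    */
					/*   spin_lock(&some_lock);         */
					/* a spinlock_t sleeps on RT and    */
					/* must not be taken with IRQs off  */
	local_irq_enable();

irq_work_queue() followed by irq_work_sync() lets the irq_work core run
the same callback in a context where its locks may be acquired.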
Reported-by: Clark Williams
Signed-off-by: Sebastian Andrzej Siewior
Reviewed-by: Maarten Lankhorst
Signed-off-by: Steven Rostedt (VMware)
---
 drivers/gpu/drm/i915/gt/intel_breadcrumbs.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
index 0040b4765a54..3f4f854786f2 100644
--- a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
+++ b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
@@ -342,10 +342,9 @@ void intel_breadcrumbs_park(struct intel_breadcrumbs *b)
 	/* Kick the work once more to drain the signalers */
 	irq_work_sync(&b->irq_work);
 	while (unlikely(READ_ONCE(b->irq_armed))) {
-		local_irq_disable();
-		signal_irq_work(&b->irq_work);
-		local_irq_enable();
+		irq_work_queue(&b->irq_work);
 		cond_resched();
+		irq_work_sync(&b->irq_work);
 	}
 	GEM_BUG_ON(!list_empty(&b->signalers));
 }

From patchwork Wed Nov 24 18:03:13 2021
X-Patchwork-Submitter: Steven Rostedt
X-Patchwork-Id: 517413
Message-ID: <20211124180327.927893621@goodmis.org>
Date: Wed, 24 Nov 2021 13:03:13 -0500
From: Steven Rostedt
To: linux-kernel@vger.kernel.org, linux-rt-users
Cc: Thomas Gleixner, Carsten Emde, Sebastian Andrzej Siewior, John Kacur,
    Daniel Wagner, Tom Zanussi, "Srivatsa S. Bhat", "Peter Zijlstra (Intel)"
Subject: [PATCH RT 10/13] irq_work: Allow irq_work_sync() to sleep if irq_work() no IRQ support.
References: <20211124180303.574562279@goodmis.org>

5.10.78-rt56-rc3 stable review patch.
If anyone has any objections, please let me know.

------------------

From: Sebastian Andrzej Siewior

An irq_work triggers an interrupt instantly if supported by the
architecture. Otherwise the work will be processed on the next timer
tick. In the worst case irq_work_sync() could spin for up to a jiffy.

irq_work_sync() is usually used in tear-down context which is fully
preemptible. Based on review, irq_work_sync() is invoked from preemptible
context and there is one waiter at a time. This qualifies it to use
rcuwait for synchronisation.

Let irq_work_sync() synchronize with rcuwait if the architecture
processes irq_work via the timer tick.
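The resulting synchronisation pairs up as follows (taken directly from
the diff below; both sides only apply when arch_irq_work_has_interrupt()
is false):

	/* waiter side, in irq_work_sync(): sleep instead of spinning */
	rcuwait_wait_event(&work->irqwait, !irq_work_is_busy(work),
			   TASK_UNINTERRUPTIBLE);

	/* completion side, at the end of irq_work_single(): wake the waiter */
	rcuwait_wake_up(&work->irqwait);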
Signed-off-by: Sebastian Andrzej Siewior
Signed-off-by: Peter Zijlstra (Intel)
Link: https://lkml.kernel.org/r/20211006111852.1514359-3-bigeasy@linutronix.de
Signed-off-by: Steven Rostedt (VMware)
---
 include/linux/irq_work.h | 10 +++++++++-
 kernel/irq_work.c | 10 ++++++++++
 2 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/include/linux/irq_work.h b/include/linux/irq_work.h
index f941f2d7d71c..3c6d3a96bca0 100644
--- a/include/linux/irq_work.h
+++ b/include/linux/irq_work.h
@@ -3,6 +3,7 @@
 #define _LINUX_IRQ_WORK_H
 
 #include <linux/smp_types.h>
+#include <linux/rcuwait.h>
 
 /*
  * An entry can be in one of four states:
@@ -22,6 +23,7 @@ struct irq_work {
 	};
 	};
 	void (*func)(struct irq_work *);
+	struct rcuwait irqwait;
 };
 
 static inline
@@ -29,13 +31,19 @@ void init_irq_work(struct irq_work *work, void (*func)(struct irq_work *))
 {
 	atomic_set(&work->flags, 0);
 	work->func = func;
+	rcuwait_init(&work->irqwait);
 }
 
 #define DEFINE_IRQ_WORK(name, _f) struct irq_work name = { \
 	.flags = ATOMIC_INIT(0), \
-	.func = (_f) \
+	.func = (_f), \
+	.irqwait = __RCUWAIT_INITIALIZER(irqwait), \
 }
 
+static inline bool irq_work_is_busy(struct irq_work *work)
+{
+	return atomic_read(&work->flags) & IRQ_WORK_BUSY;
+}
+
 bool irq_work_queue(struct irq_work *work);
 bool irq_work_queue_on(struct irq_work *work, int cpu);

diff --git a/kernel/irq_work.c b/kernel/irq_work.c
index 8183d30e1bb1..8969aff790e2 100644
--- a/kernel/irq_work.c
+++ b/kernel/irq_work.c
@@ -165,6 +165,9 @@ void irq_work_single(void *arg)
 	 */
 	flags &= ~IRQ_WORK_PENDING;
 	(void)atomic_cmpxchg(&work->flags, flags, flags & ~IRQ_WORK_BUSY);
+
+	if (!arch_irq_work_has_interrupt())
+		rcuwait_wake_up(&work->irqwait);
 }
 
 static void irq_work_run_list(struct llist_head *list)
@@ -231,6 +234,13 @@ void irq_work_tick_soft(void)
 void irq_work_sync(struct irq_work *work)
 {
 	lockdep_assert_irqs_enabled();
+	might_sleep();
+
+	if (!arch_irq_work_has_interrupt()) {
+		rcuwait_wait_event(&work->irqwait, !irq_work_is_busy(work),
+				   TASK_UNINTERRUPTIBLE);
+		return;
+	}
 
 	while (atomic_read(&work->flags) & IRQ_WORK_BUSY)
 		cpu_relax();

From patchwork Wed Nov 24 18:03:14 2021
X-Patchwork-Submitter: Steven Rostedt
X-Patchwork-Id: 517415
Message-ID: <20211124180328.118994745@goodmis.org>
Date: Wed, 24 Nov 2021 13:03:14 -0500
From: Steven Rostedt
To: linux-kernel@vger.kernel.org, linux-rt-users
Cc: Thomas Gleixner, Carsten Emde, Sebastian Andrzej Siewior, John Kacur,
    Daniel Wagner, Tom Zanussi, "Srivatsa S. Bhat", "Peter Zijlstra (Intel)"
Subject: [PATCH RT 11/13] irq_work: Handle some irq_work in a per-CPU thread on PREEMPT_RT
References: <20211124180303.574562279@goodmis.org>

5.10.78-rt56-rc3 stable review patch.
If anyone has any objections, please let me know.

------------------

From: Sebastian Andrzej Siewior

The irq_work callback is invoked in hard IRQ context. By default all
callbacks are scheduled for invocation right away (if supported by the
architecture) except for the ones marked IRQ_WORK_LAZY, which are delayed
until the next timer tick.

Some of the callbacks may acquire locks (spinlock_t, rwlock_t) which are
transformed into sleeping locks on PREEMPT_RT and must not be acquired in
hard IRQ context. Changing those locks into locks which could be acquired
in this context would lead to other problems, such as increased latencies
if everything in the chain has IRQ-off locks. Nor would it solve all the
issues: one callback has been noticed which invokes kref_put(), whose
release callback invokes kfree(), and this cannot be invoked in hardirq
context.

Some callbacks are required to be invoked in hardirq context even on
PREEMPT_RT to work properly. This includes for instance the NO_HZ
callback, which needs to be able to observe the idle context. The
callbacks which are required to run in hardirq context have already been
marked. Use this information to split the callbacks onto the two lists on
PREEMPT_RT:

- lazy_list
  Work items which are not marked with IRQ_WORK_HARD_IRQ will be added to
  this list. Callbacks on this list will be invoked from a per-CPU
  thread. The handler here may acquire sleeping locks such as spinlock_t
  and invoke kfree().

- raised_list
  Work items which are marked with IRQ_WORK_HARD_IRQ will be added to
  this list. They will be invoked in hardirq context and must not acquire
  any sleeping locks.

The wake-up of the per-CPU thread occurs from the irq_work handler in
hardirq context. The thread runs with the lowest RT priority to ensure it
runs before any SCHED_OTHER tasks do.

[bigeasy: melt tglx's irq_work_tick_soft() which splits irq_work_tick()
into a hard and soft variant. Collected fixes over time from Steven
Rostedt and Mike Galbraith. Move to per-CPU threads instead of softirq as
suggested by PeterZ.]
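The queueing decision on PREEMPT_RT reduces to the following (condensed
from the __irq_work_queue_local() hunk in the diff below):

	work_flags = atomic_read(&work->flags);
	if (work_flags & IRQ_WORK_LAZY)
		lazy_work = true;		/* next tick, as before */
	else if (IS_ENABLED(CONFIG_PREEMPT_RT) &&
		 !(work_flags & IRQ_WORK_HARD_IRQ))
		rt_lazy_work = true;		/* run in the irq_work/%u thread */

	if (lazy_work || rt_lazy_work)
		list = this_cpu_ptr(&lazy_list);
	else
		list = this_cpu_ptr(&raised_list);	/* hardirq, as before */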
Signed-off-by: Sebastian Andrzej Siewior
Signed-off-by: Peter Zijlstra (Intel)
Link: https://lkml.kernel.org/r/20211007092646.uhshe3ut2wkrcfzv@linutronix.de
Signed-off-by: Steven Rostedt (VMware)
---
 include/linux/irq_work.h | 16 +++--
 kernel/irq_work.c | 131 ++++++++++++++++++++++++++++-----------
 kernel/time/timer.c | 2 -
 3 files changed, 106 insertions(+), 43 deletions(-)

diff --git a/include/linux/irq_work.h b/include/linux/irq_work.h
index 3c6d3a96bca0..f551ba9c99d4 100644
--- a/include/linux/irq_work.h
+++ b/include/linux/irq_work.h
@@ -40,6 +40,16 @@ void init_irq_work(struct irq_work *work, void (*func)(struct irq_work *))
 	.irqwait = __RCUWAIT_INITIALIZER(irqwait), \
 }
 
+#define __IRQ_WORK_INIT(_func, _flags) (struct irq_work){ \
+	.flags = ATOMIC_INIT(_flags), \
+	.func = (_func), \
+	.irqwait = __RCUWAIT_INITIALIZER(irqwait), \
+}
+
+#define IRQ_WORK_INIT(_func) __IRQ_WORK_INIT(_func, 0)
+#define IRQ_WORK_INIT_LAZY(_func) __IRQ_WORK_INIT(_func, IRQ_WORK_LAZY)
+#define IRQ_WORK_INIT_HARD(_func) __IRQ_WORK_INIT(_func, IRQ_WORK_HARD_IRQ)
+
 static inline bool irq_work_is_busy(struct irq_work *work)
 {
 	return atomic_read(&work->flags) & IRQ_WORK_BUSY;
@@ -63,10 +73,4 @@ static inline void irq_work_run(void) { }
 static inline void irq_work_single(void *arg) { }
 #endif
 
-#if defined(CONFIG_IRQ_WORK) && defined(CONFIG_PREEMPT_RT)
-void irq_work_tick_soft(void);
-#else
-static inline void irq_work_tick_soft(void) { }
-#endif
-
 #endif /* _LINUX_IRQ_WORK_H */

diff --git a/kernel/irq_work.c b/kernel/irq_work.c
index 8969aff790e2..03d09d779ee1 100644
--- a/kernel/irq_work.c
+++ b/kernel/irq_work.c
@@ -18,12 +18,37 @@
 #include <linux/cpu.h>
 #include <linux/notifier.h>
 #include <linux/smp.h>
+#include <linux/smpboot.h>
 #include <linux/interrupt.h>
 #include <asm/processor.h>
 
 static DEFINE_PER_CPU(struct llist_head, raised_list);
 static DEFINE_PER_CPU(struct llist_head, lazy_list);
+static DEFINE_PER_CPU(struct task_struct *, irq_workd);
+
+static void wake_irq_workd(void)
+{
+	struct task_struct *tsk = __this_cpu_read(irq_workd);
+
+	if (!llist_empty(this_cpu_ptr(&lazy_list)) && tsk)
+		wake_up_process(tsk);
+}
+
+#ifdef CONFIG_SMP
+static void irq_work_wake(struct irq_work *entry)
+{
+	wake_irq_workd();
+}
+
+static DEFINE_PER_CPU(struct irq_work, irq_work_wakeup) =
+	IRQ_WORK_INIT_HARD(irq_work_wake);
+#endif
+
+static int irq_workd_should_run(unsigned int cpu)
+{
+	return !llist_empty(this_cpu_ptr(&lazy_list));
+}
 
 /*
  * Claim the entry so that no one else will poke at it.
@@ -54,20 +79,28 @@ void __weak arch_irq_work_raise(void)
 static void __irq_work_queue_local(struct irq_work *work)
 {
 	struct llist_head *list;
-	bool lazy_work, realtime = IS_ENABLED(CONFIG_PREEMPT_RT);
-
-	lazy_work = atomic_read(&work->flags) & IRQ_WORK_LAZY;
-
-	/* If the work is "lazy", handle it from next tick if any */
-	if (lazy_work || (realtime && !(atomic_read(&work->flags) & IRQ_WORK_HARD_IRQ)))
+	bool rt_lazy_work = false;
+	bool lazy_work = false;
+	int work_flags;
+
+	work_flags = atomic_read(&work->flags);
+	if (work_flags & IRQ_WORK_LAZY)
+		lazy_work = true;
+	else if (IS_ENABLED(CONFIG_PREEMPT_RT) &&
+		 !(work_flags & IRQ_WORK_HARD_IRQ))
+		rt_lazy_work = true;
+
+	if (lazy_work || rt_lazy_work)
 		list = this_cpu_ptr(&lazy_list);
 	else
 		list = this_cpu_ptr(&raised_list);
 
-	if (llist_add(&work->llnode, list)) {
-		if (!lazy_work || tick_nohz_tick_stopped())
-			arch_irq_work_raise();
-	}
+	if (!llist_add(&work->llnode, list))
+		return;
+
+	/* If the work is "lazy", handle it from next tick if any */
+	if (!lazy_work || tick_nohz_tick_stopped())
+		arch_irq_work_raise();
 }
 
 /* Enqueue the irq work @work on the current CPU */
@@ -110,15 +143,27 @@ bool irq_work_queue_on(struct irq_work *work, int cpu)
 	/* Arch remote IPI send/receive backend aren't NMI safe */
 	WARN_ON_ONCE(in_nmi());
 
-	if (IS_ENABLED(CONFIG_PREEMPT_RT) && !(atomic_read(&work->flags) & IRQ_WORK_HARD_IRQ)) {
-		if (llist_add(&work->llnode, &per_cpu(lazy_list, cpu)))
-			arch_send_call_function_single_ipi(cpu);
-	} else {
-		__smp_call_single_queue(cpu, &work->llnode);
+	/*
+	 * On PREEMPT_RT the items which are not marked as
+	 * IRQ_WORK_HARD_IRQ are added to the lazy list and a HARD work
+	 * item is used on the remote CPU to wake the thread.
+	 */
+	if (IS_ENABLED(CONFIG_PREEMPT_RT) &&
+	    !(atomic_read(&work->flags) & IRQ_WORK_HARD_IRQ)) {
+
+		if (!llist_add(&work->llnode, &per_cpu(lazy_list, cpu)))
+			goto out;
+
+		work = &per_cpu(irq_work_wakeup, cpu);
+		if (!irq_work_claim(work))
+			goto out;
 	}
+
+	__smp_call_single_queue(cpu, &work->llnode);
 	} else {
 		__irq_work_queue_local(work);
 	}
+out:
 	preempt_enable();
 
 	return true;
@@ -175,12 +220,13 @@ static void irq_work_run_list(struct llist_head *list)
 	struct irq_work *work, *tmp;
 	struct llist_node *llnode;
 
-#ifndef CONFIG_PREEMPT_RT
 	/*
-	 * nort: On RT IRQ-work may run in SOFTIRQ context.
+	 * On PREEMPT_RT IRQ-work which is not marked as HARD will be processed
+	 * in a per-CPU thread in preemptible context. Only the items which are
+	 * marked as IRQ_WORK_HARD_IRQ will be processed in hardirq context.
 	 */
-	BUG_ON(!irqs_disabled());
-#endif
+	BUG_ON(!irqs_disabled() && !IS_ENABLED(CONFIG_PREEMPT_RT));
+
 	if (llist_empty(list))
 		return;
 
@@ -196,16 +242,10 @@ static void irq_work_run_list(struct llist_head *list)
 void irq_work_run(void)
 {
 	irq_work_run_list(this_cpu_ptr(&raised_list));
-	if (IS_ENABLED(CONFIG_PREEMPT_RT)) {
-		/*
-		 * NOTE: we raise softirq via IPI for safety,
-		 * and execute in irq_work_tick() to move the
-		 * overhead from hard to soft irq context.
-		 */
-		if (!llist_empty(this_cpu_ptr(&lazy_list)))
-			raise_softirq(TIMER_SOFTIRQ);
-	} else
+	if (!IS_ENABLED(CONFIG_PREEMPT_RT))
 		irq_work_run_list(this_cpu_ptr(&lazy_list));
+	else
+		wake_irq_workd();
 }
 EXPORT_SYMBOL_GPL(irq_work_run);
 
@@ -218,15 +258,10 @@ void irq_work_tick(void)
 
 	if (!IS_ENABLED(CONFIG_PREEMPT_RT))
 		irq_work_run_list(this_cpu_ptr(&lazy_list));
+	else
+		wake_irq_workd();
 }
 
-#if defined(CONFIG_IRQ_WORK) && defined(CONFIG_PREEMPT_RT)
-void irq_work_tick_soft(void)
-{
-	irq_work_run_list(this_cpu_ptr(&lazy_list));
-}
-#endif
-
 /*
  * Synchronize against the irq_work @entry, ensures the entry is not
  * currently in use.
@@ -246,3 +281,29 @@ void irq_work_sync(struct irq_work *work)
 		cpu_relax();
 }
 EXPORT_SYMBOL_GPL(irq_work_sync);
+
+static void run_irq_workd(unsigned int cpu)
+{
+	irq_work_run_list(this_cpu_ptr(&lazy_list));
+}
+
+static void irq_workd_setup(unsigned int cpu)
+{
+	sched_set_fifo_low(current);
+}
+
+static struct smp_hotplug_thread irqwork_threads = {
+	.store = &irq_workd,
+	.setup = irq_workd_setup,
+	.thread_should_run = irq_workd_should_run,
+	.thread_fn = run_irq_workd,
+	.thread_comm = "irq_work/%u",
+};
+
+static __init int irq_work_init_threads(void)
+{
+	if (IS_ENABLED(CONFIG_PREEMPT_RT))
+		BUG_ON(smpboot_register_percpu_thread(&irqwork_threads));
+	return 0;
+}
+early_initcall(irq_work_init_threads);

diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index af3daf03c917..cd67ee6d2634 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -1767,8 +1767,6 @@ static __latent_entropy void run_timer_softirq(struct softirq_action *h)
 {
 	struct timer_base *base = this_cpu_ptr(&timer_bases[BASE_STD]);
 
-	irq_work_tick_soft();
-
 	__run_timers(base);
 	if (IS_ENABLED(CONFIG_NO_HZ_COMMON))
 		__run_timers(this_cpu_ptr(&timer_bases[BASE_DEF]));

From patchwork Wed Nov 24 18:03:15 2021
X-Patchwork-Submitter: Steven Rostedt
X-Patchwork-Id: 517416
Message-ID: <20211124180328.307061241@goodmis.org>
Date: Wed, 24 Nov 2021 13:03:15 -0500
From: Steven Rostedt
To: linux-kernel@vger.kernel.org, linux-rt-users
Cc: Thomas Gleixner, Carsten Emde, Sebastian Andrzej Siewior, John Kacur,
    Daniel Wagner, Tom Zanussi, "Srivatsa S. Bhat", "Peter Zijlstra (Intel)"
Bhat" , "Peter Zijlstra (Intel)" Subject: [PATCH RT 12/13] irq_work: Also rcuwait for !IRQ_WORK_HARD_IRQ on PREEMPT_RT References: <20211124180303.574562279@goodmis.org> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-rt-users@vger.kernel.org 5.10.78-rt56-rc3 stable review patch. If anyone has any objections, please let me know. ------------------ From: Sebastian Andrzej Siewior On PREEMPT_RT most items are processed as LAZY via softirq context. Avoid to spin-wait for them because irq_work_sync() could have higher priority and not allow the irq-work to be completed. Wait additionally for !IRQ_WORK_HARD_IRQ irq_work items on PREEMPT_RT. Signed-off-by: Sebastian Andrzej Siewior Signed-off-by: Peter Zijlstra (Intel) Link: https://lkml.kernel.org/r/20211006111852.1514359-5-bigeasy@linutronix.de Signed-off-by: Steven Rostedt (VMware) --- include/linux/irq_work.h | 5 +++++ kernel/irq_work.c | 6 ++++-- 2 files changed, 9 insertions(+), 2 deletions(-) diff --git a/include/linux/irq_work.h b/include/linux/irq_work.h index f551ba9c99d4..2c0059340871 100644 --- a/include/linux/irq_work.h +++ b/include/linux/irq_work.h @@ -55,6 +55,11 @@ static inline bool irq_work_is_busy(struct irq_work *work) return atomic_read(&work->flags) & IRQ_WORK_BUSY; } +static inline bool irq_work_is_hard(struct irq_work *work) +{ + return atomic_read(&work->flags) & IRQ_WORK_HARD_IRQ; +} + bool irq_work_queue(struct irq_work *work); bool irq_work_queue_on(struct irq_work *work, int cpu); diff --git a/kernel/irq_work.c b/kernel/irq_work.c index 03d09d779ee1..cbec10c32ead 100644 --- a/kernel/irq_work.c +++ b/kernel/irq_work.c @@ -211,7 +211,8 @@ void irq_work_single(void *arg) flags &= ~IRQ_WORK_PENDING; (void)atomic_cmpxchg(&work->flags, flags, flags & ~IRQ_WORK_BUSY); - if (!arch_irq_work_has_interrupt()) + if ((IS_ENABLED(CONFIG_PREEMPT_RT) && !irq_work_is_hard(work)) || + !arch_irq_work_has_interrupt()) rcuwait_wake_up(&work->irqwait); } @@ -271,7 +272,8 @@ void irq_work_sync(struct irq_work *work) lockdep_assert_irqs_enabled(); might_sleep(); - if (!arch_irq_work_has_interrupt()) { + if ((IS_ENABLED(CONFIG_PREEMPT_RT) && !irq_work_is_hard(work)) || + !arch_irq_work_has_interrupt()) { rcuwait_wait_event(&work->irqwait, !irq_work_is_busy(work), TASK_UNINTERRUPTIBLE); return; From patchwork Wed Nov 24 18:03:16 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Steven Rostedt X-Patchwork-Id: 520085 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 7A99AC4332F for ; Wed, 24 Nov 2021 18:03:41 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1350349AbhKXSGt (ORCPT ); Wed, 24 Nov 2021 13:06:49 -0500 Received: from mail.kernel.org ([198.145.29.99]:56446 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1350273AbhKXSGj (ORCPT ); Wed, 24 Nov 2021 13:06:39 -0500 Received: from gandalf.local.home (cpe-66-24-58-225.stny.res.rr.com [66.24.58.225]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPSA id 88DC461078; Wed, 24 Nov 2021 18:03:29 +0000 (UTC) Received: from rostedt by gandalf.local.home with local (Exim 4.95) (envelope-from ) id 1mpwcC-0027xj-LG; Wed, 24 Nov 2021 13:03:28 -0500 Message-ID: 
<20211124180328.497597709@goodmis.org> User-Agent: quilt/0.66 Date: Wed, 24 Nov 2021 13:03:16 -0500 From: Steven Rostedt To: linux-kernel@vger.kernel.org, linux-rt-users Cc: Thomas Gleixner , Carsten Emde , Sebastian Andrzej Siewior , John Kacur , Daniel Wagner , Tom Zanussi , "Srivatsa S. Bhat" Subject: [PATCH RT 13/13] Linux 5.10.78-rt56-rc3 References: <20211124180303.574562279@goodmis.org> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-rt-users@vger.kernel.org 5.10.78-rt56-rc3 stable review patch. If anyone has any objections, please let me know. ------------------ From: "Steven Rostedt (VMware)" --- localversion-rt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/localversion-rt b/localversion-rt index 51b05e9abe6f..dfd6a5d63829 100644 --- a/localversion-rt +++ b/localversion-rt @@ -1 +1 @@ --rt55 +-rt56-rc3