From patchwork Thu Dec 15 18:42:59 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Srinivas Pandruvada X-Patchwork-Id: 634740 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id AEBF6C25B04 for ; Thu, 15 Dec 2022 18:43:32 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229846AbiLOSnb (ORCPT ); Thu, 15 Dec 2022 13:43:31 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:39548 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229883AbiLOSna (ORCPT ); Thu, 15 Dec 2022 13:43:30 -0500 Received: from mga17.intel.com (mga17.intel.com [192.55.52.151]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 6BFA620343; Thu, 15 Dec 2022 10:43:29 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1671129809; x=1702665809; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=ELcbuLbz+D9wrsIJ2hLNbj73bW27WcmKW6Jeb1abta4=; b=WtD5eTpnCIHaoBsh0H/xfoHHNhS5qC8i1Vxxp1KziCpbe5938V+EIMNB qfeNhb1UL+0lqOv0T5cuYAN71LwVqqZIVeLn+RlpyqZ5t6MYncpqO041o YEvqzsI9jS06BXYlIOfhmkpSbVJYajSK+TjWpIYr25o+6dKQYHh/gLRLX jJGL7RZXQOzDdDyZTfW30t88pLsaVBN331hsITqECGkKDh4l1/fVoIyT8 SEOZTBhnWmQSSGVjv3XTWFNVgRsbO3S5SVxh8LDf8oWCNVGpTmlWAmpsO N3dKxwcFTRoSa1lJPOR0+PckSL4rB+M5p46v6iRQeTgOZoBZ1tJTsqsdp g==; X-IronPort-AV: E=McAfee;i="6500,9779,10562"; a="299112641" X-IronPort-AV: E=Sophos;i="5.96,248,1665471600"; d="scan'208";a="299112641" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by fmsmga107.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 15 Dec 2022 10:43:28 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6500,9779,10562"; a="756442192" X-IronPort-AV: E=Sophos;i="5.96,248,1665471600"; d="scan'208";a="756442192" Received: from spandruv-desk.jf.intel.com ([10.54.75.8]) by fmsmga002.fm.intel.com with ESMTP; 15 Dec 2022 10:43:27 -0800 From: Srinivas Pandruvada To: rafael@kernel.org, peterz@infradead.org, frederic@kernel.org, daniel.lezcano@linaro.org Cc: linux-pm@vger.kernel.org, linux-kernel@vger.kernel.org, len.brown@intel.com, Srinivas Pandruvada Subject: [RFC/RFT PATCH 1/2] sched/core: Check and schedule ksoftirq Date: Thu, 15 Dec 2022 10:42:59 -0800 Message-Id: <20221215184300.1592872-2-srinivas.pandruvada@linux.intel.com> X-Mailer: git-send-email 2.31.1 In-Reply-To: <20221215184300.1592872-1-srinivas.pandruvada@linux.intel.com> References: <20221215184300.1592872-1-srinivas.pandruvada@linux.intel.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-pm@vger.kernel.org From: Frederic Weisbecker It is possible that ksoftirqd is woken up and racing with forced idle injection via play_idle_precise() called from a FIFO thread. Here it is possible that need_resched() is false but ksoftirqd is pending. This will result in printing of warning: [147777.095484] NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #08!!! Also the processing can be delayed upto the duration of forced idle. For example in the following traces (with added traces for debug): -0 [004] 207.087742: sched_wakeup: ksoftirqd/4 -0 [004] 207.087743: sched_switch: swapper/4:0 [120] R ==> idle_inject/4:37 [49] idle_inject/4-37 [004] 207.087744: play_idle_enter idle_inject/4-37 [004] 207.087746: bputs: can_stop_idle_tick.isra.16: soft irq pending (Printed warning at this point NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #08!!! ... ... idle_inject/4-37 [004] 207.112127: play_idle_exit: state=24000000 cpu_id=4 idle_inject/4-37 [004] 207.112134: sched_switch: idle_inject/4:37 [49] S ==> ksoftirqd/4:39 [120] ksoftirqd/4-39 [004] 207.112142: softirq_entry: vec=3 [action=NET_RX] ksoftirqd/4-39 [004] 207.112213: irq_handler_entry: irq=142 name=enp0s Delayed by : 207.112142 - 207.087742 = 0.024400 (250HZ configuration) and the idle duration for play_idle_precise is 24ms. So, the softirq entered after the expiry of forced idle time. The solution here is to give chance to run softirq. Currently do_idle() checks for need_resched() and break. But here in addition to need_resched() also added a check for task_is_running(__this_cpu_read(ksoftirqd)). So if the ksoftirq is pending break the do_idle() loop and give 1 jiffie to process via schedule_timeout(1). If the required idle time is not expired, start hrtimer again and call do_idle() again. Signed-off-by: Frederic Weisbecker Signed-off-by: Srinivas Pandruvada --- kernel/sched/idle.c | 55 ++++++++++++++++++++++++++++++--------------- 1 file changed, 37 insertions(+), 18 deletions(-) diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c index f26ab2675f7d..77d6168288cf 100644 --- a/kernel/sched/idle.c +++ b/kernel/sched/idle.c @@ -49,6 +49,11 @@ static int __init cpu_idle_nopoll_setup(char *__unused) __setup("hlt", cpu_idle_nopoll_setup); #endif +static bool __cpuidle idle_loop_should_break(void) +{ + return need_resched() || task_is_running(__this_cpu_read(ksoftirqd)); +} + static noinline int __cpuidle cpu_idle_poll(void) { trace_cpu_idle(0, smp_processor_id()); @@ -56,7 +61,7 @@ static noinline int __cpuidle cpu_idle_poll(void) ct_idle_enter(); local_irq_enable(); - while (!tif_need_resched() && + while (!idle_loop_should_break() && (cpu_idle_force_poll || tick_check_broadcast_expired())) cpu_relax(); @@ -174,7 +179,7 @@ static void cpuidle_idle_call(void) * Check if the idle task must be rescheduled. If it is the * case, exit the function after re-enabling the local irq. */ - if (need_resched()) { + if (idle_loop_should_break()) { local_irq_enable(); return; } @@ -276,7 +281,7 @@ static void do_idle(void) __current_set_polling(); tick_nohz_idle_enter(); - while (!need_resched()) { + while (!idle_loop_should_break()) { rmb(); local_irq_disable(); @@ -358,6 +363,7 @@ static enum hrtimer_restart idle_inject_timer_fn(struct hrtimer *timer) void play_idle_precise(u64 duration_ns, u64 latency_ns) { struct idle_timer it; + ktime_t remaining; /* * Only FIFO tasks can disable the tick since they don't need the forced @@ -370,25 +376,38 @@ void play_idle_precise(u64 duration_ns, u64 latency_ns) WARN_ON_ONCE(!duration_ns); WARN_ON_ONCE(current->mm); - rcu_sleep_check(); - preempt_disable(); - current->flags |= PF_IDLE; - cpuidle_use_deepest_state(latency_ns); + remaining = ns_to_ktime(duration_ns); - it.done = 0; - hrtimer_init_on_stack(&it.timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL_HARD); - it.timer.function = idle_inject_timer_fn; - hrtimer_start(&it.timer, ns_to_ktime(duration_ns), - HRTIMER_MODE_REL_PINNED_HARD); + do { + rcu_sleep_check(); + preempt_disable(); + current->flags |= PF_IDLE; + cpuidle_use_deepest_state(latency_ns); - while (!READ_ONCE(it.done)) - do_idle(); + it.done = 0; + hrtimer_init_on_stack(&it.timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL_HARD); + it.timer.function = idle_inject_timer_fn; + hrtimer_start(&it.timer, remaining, HRTIMER_MODE_REL_PINNED_HARD); + + while (!READ_ONCE(it.done) && !task_is_running(__this_cpu_read(ksoftirqd))) + do_idle(); - cpuidle_use_deepest_state(0); - current->flags &= ~PF_IDLE; + remaining = hrtimer_get_remaining(&it.timer); - preempt_fold_need_resched(); - preempt_enable(); + hrtimer_cancel(&it.timer); + + cpuidle_use_deepest_state(0); + current->flags &= ~PF_IDLE; + + preempt_fold_need_resched(); + preempt_enable(); + + /* Give ksoftirqd 1 jiffy to get a chance to start its job */ + if (!READ_ONCE(it.done) && task_is_running(__this_cpu_read(ksoftirqd))) { + __set_current_state(TASK_UNINTERRUPTIBLE); + schedule_timeout(1); + } + } while (!READ_ONCE(it.done)); } EXPORT_SYMBOL_GPL(play_idle_precise); From patchwork Thu Dec 15 18:43:00 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Srinivas Pandruvada X-Patchwork-Id: 634348 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 51068C46467 for ; Thu, 15 Dec 2022 18:43:33 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229850AbiLOSnc (ORCPT ); Thu, 15 Dec 2022 13:43:32 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:39550 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229975AbiLOSna (ORCPT ); Thu, 15 Dec 2022 13:43:30 -0500 Received: from mga17.intel.com (mga17.intel.com [192.55.52.151]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 7C55C20BEC; Thu, 15 Dec 2022 10:43:29 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1671129809; x=1702665809; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=alyrCRpg7VFefOZMHRABhidPtpezk2Hp/kqcUajm0NM=; b=AONdNd28/CS/Exz0LcHeyhZMxNxS1k6F018I0F7wdvj9cAUECAi73cwg WK1nuN+306ej0rzqhjrfZ6WnnTfRQfTFW5tkTyNfGZoyjEUKeSgsFg+1N b1A8mvDgrD938q6gUfTGnhAZHnZ3vzlLqeoSm6DqbASEcgClP8ucL81Qa q/XdWDqqeKYtySe7IH5jaQ0dlNqZZuVo6EqidV6W6DoEHPgJK5m4B2vYm stDxiNR4zGUSp52LwyEr+CW9Wb7fuMCTbQer2roa19yrxhlMUZbwVWSbf Xz6NyjanVaLDC+4wq76Xn2e/cKk+TQufMbXvFE7+c/MFfFkHMmstewKLF g==; X-IronPort-AV: E=McAfee;i="6500,9779,10562"; a="299112646" X-IronPort-AV: E=Sophos;i="5.96,248,1665471600"; d="scan'208";a="299112646" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by fmsmga107.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 15 Dec 2022 10:43:28 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6500,9779,10562"; a="756442196" X-IronPort-AV: E=Sophos;i="5.96,248,1665471600"; d="scan'208";a="756442196" Received: from spandruv-desk.jf.intel.com ([10.54.75.8]) by fmsmga002.fm.intel.com with ESMTP; 15 Dec 2022 10:43:28 -0800 From: Srinivas Pandruvada To: rafael@kernel.org, peterz@infradead.org, frederic@kernel.org, daniel.lezcano@linaro.org Cc: linux-pm@vger.kernel.org, linux-kernel@vger.kernel.org, len.brown@intel.com, Srinivas Pandruvada Subject: [RFC/RFT PATCH 2/2] sched/core: Add max duration to play_precise_idle() Date: Thu, 15 Dec 2022 10:43:00 -0800 Message-Id: <20221215184300.1592872-3-srinivas.pandruvada@linux.intel.com> X-Mailer: git-send-email 2.31.1 In-Reply-To: <20221215184300.1592872-1-srinivas.pandruvada@linux.intel.com> References: <20221215184300.1592872-1-srinivas.pandruvada@linux.intel.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-pm@vger.kernel.org When there is a storm of soft irqs, it is possible that caller spends lot more time in play_idle_precise() than the actual idle time requested. Add a pramater to play_precise_idle() to specify a maximum time to exit the loop in play_precise_idle(), even if the the total forced idle time is less than the desired. If exit before the required forced idle duration return error to the caller, otherwise return 0. For powercap/idle_inject, the maximum wait will be capped to two times of idle duration. Signed-off-by: Srinivas Pandruvada --- drivers/powercap/idle_inject.c | 4 +++- include/linux/cpu.h | 4 ++-- kernel/sched/idle.c | 13 +++++++++++-- 3 files changed, 16 insertions(+), 5 deletions(-) diff --git a/drivers/powercap/idle_inject.c b/drivers/powercap/idle_inject.c index fe86a09e3b67..2764155d184e 100644 --- a/drivers/powercap/idle_inject.c +++ b/drivers/powercap/idle_inject.c @@ -132,6 +132,7 @@ static void idle_inject_fn(unsigned int cpu) { struct idle_inject_device *ii_dev; struct idle_inject_thread *iit; + u64 idle_duration_ns; ii_dev = per_cpu(idle_inject_device, cpu); iit = per_cpu_ptr(&idle_inject_thread, cpu); @@ -140,8 +141,9 @@ static void idle_inject_fn(unsigned int cpu) * Let the smpboot main loop know that the task should not run again. */ iit->should_run = 0; + idle_duration_ns = READ_ONCE(ii_dev->idle_duration_us) * NSEC_PER_USEC; - play_idle_precise(READ_ONCE(ii_dev->idle_duration_us) * NSEC_PER_USEC, + play_idle_precise(idle_duration_ns, idle_duration_ns << 1, READ_ONCE(ii_dev->latency_us) * NSEC_PER_USEC); } diff --git a/include/linux/cpu.h b/include/linux/cpu.h index 314802f98b9d..9969940553e5 100644 --- a/include/linux/cpu.h +++ b/include/linux/cpu.h @@ -190,11 +190,11 @@ void arch_cpu_idle_dead(void); int cpu_report_state(int cpu); int cpu_check_up_prepare(int cpu); void cpu_set_state_online(int cpu); -void play_idle_precise(u64 duration_ns, u64 latency_ns); +int play_idle_precise(u64 duration_ns, u64 max_duration_ns, u64 latency_ns); static inline void play_idle(unsigned long duration_us) { - play_idle_precise(duration_us * NSEC_PER_USEC, U64_MAX); + play_idle_precise(duration_us * NSEC_PER_USEC, 0, U64_MAX); } #ifdef CONFIG_HOTPLUG_CPU diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c index 77d6168288cf..c98b77ff84f5 100644 --- a/kernel/sched/idle.c +++ b/kernel/sched/idle.c @@ -360,10 +360,10 @@ static enum hrtimer_restart idle_inject_timer_fn(struct hrtimer *timer) return HRTIMER_NORESTART; } -void play_idle_precise(u64 duration_ns, u64 latency_ns) +int play_idle_precise(u64 duration_ns, u64 duration_ns_max, u64 latency_ns) { struct idle_timer it; - ktime_t remaining; + ktime_t remaining, end_time; /* * Only FIFO tasks can disable the tick since they don't need the forced @@ -378,6 +378,8 @@ void play_idle_precise(u64 duration_ns, u64 latency_ns) remaining = ns_to_ktime(duration_ns); + end_time = ktime_add_ns(ktime_get(), duration_ns_max); + do { rcu_sleep_check(); preempt_disable(); @@ -402,12 +404,19 @@ void play_idle_precise(u64 duration_ns, u64 latency_ns) preempt_fold_need_resched(); preempt_enable(); + if (!READ_ONCE(it.done) && duration_ns_max) { + if (ktime_after(ktime_get(), end_time)) + return -EAGAIN; + } + /* Give ksoftirqd 1 jiffy to get a chance to start its job */ if (!READ_ONCE(it.done) && task_is_running(__this_cpu_read(ksoftirqd))) { __set_current_state(TASK_UNINTERRUPTIBLE); schedule_timeout(1); } } while (!READ_ONCE(it.done)); + + return 0; } EXPORT_SYMBOL_GPL(play_idle_precise);