Message ID: 20221215184300.1592872-1-srinivas.pandruvada@linux.intel.com
State: New

On Thu, Dec 15, 2022 at 10:42:59AM -0800, Srinivas Pandruvada wrote:

> +	/* Give ksoftirqd 1 jiffy to get a chance to start its job */
> +	if (!READ_ONCE(it.done) && task_is_running(__this_cpu_read(ksoftirqd))) {
> +		__set_current_state(TASK_UNINTERRUPTIBLE);
> +		schedule_timeout(1);
> +	}

That's absolutely disgusting :-/

On Fri, 2022-12-16 at 12:19 +0100, Peter Zijlstra wrote:
> On Thu, Dec 15, 2022 at 10:42:59AM -0800, Srinivas Pandruvada wrote:
>
> > +	/* Give ksoftirqd 1 jiffy to get a chance to start its job */
> > +	if (!READ_ONCE(it.done) && task_is_running(__this_cpu_read(ksoftirqd))) {
> > +		__set_current_state(TASK_UNINTERRUPTIBLE);
> > +		schedule_timeout(1);
> > +	}
>
> That's absolutely disgusting :-/

What is the alternative? Process the softirq in this task's context just
once?

Thanks,
Srinivas

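[Note: the in-context alternative floated above would amount to something
like this sketch. It is hypothetical, not code from the thread, and it
ignores the case, raised later, where ksoftirqd was already preempted
mid-run:]

	/*
	 * Hypothetical: flush pending softirqs in the injecting task's
	 * own context instead of waiting a jiffy for ksoftirqd.
	 */
	if (local_softirq_pending())
		do_softirq();	/* runs any pending softirqs right here */
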
On Fri, Dec 16, 2022 at 12:19:34PM +0100, Peter Zijlstra wrote:
> On Thu, Dec 15, 2022 at 10:42:59AM -0800, Srinivas Pandruvada wrote:
>
> > +	/* Give ksoftirqd 1 jiffy to get a chance to start its job */
> > +	if (!READ_ONCE(it.done) && task_is_running(__this_cpu_read(ksoftirqd))) {
> > +		__set_current_state(TASK_UNINTERRUPTIBLE);
> > +		schedule_timeout(1);
> > +	}
>
> That's absolutely disgusting :-/

I know, and I hate checking task_is_running(__this_cpu_read(ksoftirqd))
everywhere in the idle path. And in fact it doesn't work, because some
cpuidle drivers also do their own need_resched() checks.

I guess that either we assume the idle injection is more important than
serving softirqs and shut down the warnings accordingly, or we arrange
for idle injection to have a lower priority than ksoftirqd.

Thoughts?

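[Note: the warnings discussed here come from the NOHZ tick-stop check in
can_stop_idle_tick() (kernel/time/tick-sched.c). Quoted approximately,
from kernels of this era; details may differ:]

	if (unlikely(local_softirq_pending())) {
		static int ratelimit;

		if (ratelimit < 10 && !local_bh_blocked() &&
		    (local_softirq_pending() & SOFTIRQ_STOP_IDLE_MASK)) {
			pr_warn("NOHZ tick-stop error: local softirq work is pending, handler #%02x!!!\n",
				(unsigned int)local_softirq_pending());
			ratelimit++;
		}
		return false;
	}
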
On 2022-12-19 12:33:22 [+0100], Peter Zijlstra wrote:
> ksoftirqd is typically a CFS task while idle injection is required to be
> a FIFO (typically FIFO-1) task -- so that would require lifting
> ksoftirqd to FIFO, and that has other problems.
>
> I'm guessing the problem case is where idle injection comes in while
> ksoftirqd is running (and preempted), because in that case you cannot run
> the softirq stuff in-place.
>
> The 'right' thing to do would be to PI-boost ksoftirqd in that case, I
> suppose. Perhaps something like so; it would boost ksoftirqd when it's
> running, and otherwise run the ksoftirqd thing in-situ.
>
> I've not really thought hard about this though, so perhaps I completely
> wrecked things.

I don't know why you intend to run ksoftirqd, but in general it breaks RT
left and right, and we attempt to avoid running ksoftirqd as much as
possible. If it runs and you go and boost it, then it probably gets even
worse from an RT point of view.

ksoftirqd runs the softirqs raised from hardirq context. Everything else
is handled within a local_bh_disable()+enable() loop. We already have the
boost-ksoftirqd hammer, which is the per-CPU BKL called
local_bh_disable(). In general everything should be moved out of it.
For timers we have the ktimerd thread, which needs cleanup.

Sebastian

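[Note: on PREEMPT_RT, local_bh_disable() takes a per-CPU sleeping lock
whose owner is PI-boosted by waiters, which is why it works as the
"boost-ksoftirqd hammer" described above. Sketched from kernel/softirq.c,
simplified; details may differ:]

#ifdef CONFIG_PREEMPT_RT
/* Per-CPU BH serialization: waiters PI-boost whoever runs softirqs */
struct softirq_ctrl {
	local_lock_t	lock;
	int		cnt;	/* bh-disable nesting depth */
};

static DEFINE_PER_CPU(struct softirq_ctrl, softirq_ctrl) = {
	.lock	= INIT_LOCAL_LOCK(softirq_ctrl.lock),
};
#endif
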
On Mon, Dec 19, 2022 at 12:46:54PM +0100, Sebastian Andrzej Siewior wrote:
> On 2022-12-19 12:33:22 [+0100], Peter Zijlstra wrote:
> > ksoftirqd is typically a CFS task while idle injection is required to be
> > a FIFO (typically FIFO-1) task -- so that would require lifting
> > ksoftirqd to FIFO, and that has other problems.
> >
> > I'm guessing the problem case is where idle injection comes in while
> > ksoftirqd is running (and preempted), because in that case you cannot run
> > the softirq stuff in-place.
> >
> > The 'right' thing to do would be to PI-boost ksoftirqd in that case, I
> > suppose. Perhaps something like so; it would boost ksoftirqd when it's
> > running, and otherwise run the ksoftirqd thing in-situ.
> >
> > I've not really thought hard about this though, so perhaps I completely
> > wrecked things.
>
> I don't know why you intend to run ksoftirqd, but in general it breaks RT

So the upstream problem is where softirq work is punted to ksoftirqd
(obviously from the hardirq tail) and idle injection comes in and either:

 - runs before ksoftirqd had a chance to run, or
 - preempts ksoftirqd.

In both cases we'll appear to go idle with softirqs pending -- which
triggers a pesky warning, because that is obviously undesirable.

In the first case we can run 'ksoftirqd' in-context, but then we need to
serialize against the actual ksoftirqd. In the second case we need to
serialize against ksoftirqd and ensure it is complete before proceeding
with going 'idle'.

Since idle injection runs FIFO and ksoftirqd typically does not, some
form of PI is required.

> left and right, and we attempt to avoid running ksoftirqd as much as
> possible. If it runs and you go and boost it, then it probably gets even
> worse from an RT point of view.

So if ksoftirqd never runs, the above problem doesn't happen. If
ksoftirqd *can* run, you still need to deal with it somehow.

Boosting ksoftirqd to whatever priority the idle-injection thread has
should not be worse than anything the idle injection already does, no?
Specifically, without idle injection involved, the patch as proposed
(+- the fixes required to make it compile and work) should be a no-op.

> ksoftirqd runs the softirqs raised from hardirq context. Everything else
> is handled within a local_bh_disable()+enable() loop. We already have the
> boost-ksoftirqd hammer, which is the per-CPU BKL called
> local_bh_disable(). In general everything should be moved out of it.
> For timers we have the ktimerd thread, which needs cleanup.

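[Note: the softirq.c half of the proposal is not quoted in this thread.
A minimal sketch of what it might look like, with a hypothetical per-CPU
rt_mutex named ksoftirq_lock that ksoftirqd holds while it runs; init
code omitted, names invented for illustration:]

/* Hypothetical: pairs with the idle.c hunk in the next message */
static DEFINE_PER_CPU(struct rt_mutex, ksoftirq_lock);

static void run_ksoftirqd(unsigned int cpu)
{
	/*
	 * Holding the rt_mutex while softirqs run means a FIFO
	 * idle-injection thread blocking on it PI-boosts ksoftirqd
	 * until the pending work is flushed.
	 */
	rt_mutex_lock(this_cpu_ptr(&ksoftirq_lock));
	__run_ksoftirqd(cpu);
	rt_mutex_unlock(this_cpu_ptr(&ksoftirq_lock));
}
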
On Mon, Dec 19, 2022 at 02:54:55PM -0800, srinivas pandruvada wrote:

> But after ksoftirqd_run_end(), which will re-enable local irqs, this may
> further cause more softirqs to be pending. Here the RCU softirq entry continues

Right you are.. what about if we spell the idle.c thing like so instead?

Then we repeat the softirq thing every time we drop out of the idle loop
for a reschedule.

diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
index f26ab2675f7d..6dce49813bcc 100644
--- a/kernel/sched/idle.c
+++ b/kernel/sched/idle.c
@@ -381,8 +381,13 @@ void play_idle_precise(u64 duration_ns, u64 latency_ns)
 	hrtimer_start(&it.timer, ns_to_ktime(duration_ns),
 		      HRTIMER_MODE_REL_PINNED_HARD);
 
-	while (!READ_ONCE(it.done))
+	while (!READ_ONCE(it.done)) {
+		rt_mutex_lock(&per_cpu(ksoftirq_lock, cpu));
+		__run_ksoftirqd(smp_processor_id());
+		rt_mutex_unlock(&per_cpu(ksoftirq_lock, cpu));
+
 		do_idle();
+	}
 
 	cpuidle_use_deepest_state(0);
 	current->flags &= ~PF_IDLE;

On Tue, Dec 20, 2022 at 09:51:09PM +0100, Peter Zijlstra wrote:
> On Mon, Dec 19, 2022 at 02:54:55PM -0800, srinivas pandruvada wrote:
>
> > But after ksoftirqd_run_end(), which will re-enable local irqs, this may
> > further cause more softirqs to be pending. Here the RCU softirq entry continues
>
> Right you are.. what about if we spell the idle.c thing like so instead?
>
> Then we repeat the softirq thing every time we drop out of the idle loop
> for a reschedule.

Uff, that obviously can't work because we already have preemption
disabled here; this needs a bit more work. I think it's possible to
re-arrange things a bit.

I'll try and have a look tomorrow, but the kids have their xmas play at
school, so who knows what I'll get done.

> diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
> index f26ab2675f7d..6dce49813bcc 100644
> --- a/kernel/sched/idle.c
> +++ b/kernel/sched/idle.c
> @@ -381,8 +381,13 @@ void play_idle_precise(u64 duration_ns, u64 latency_ns)
>  	hrtimer_start(&it.timer, ns_to_ktime(duration_ns),
>  		      HRTIMER_MODE_REL_PINNED_HARD);
>  
> -	while (!READ_ONCE(it.done))
> +	while (!READ_ONCE(it.done)) {
> +		rt_mutex_lock(&per_cpu(ksoftirq_lock, cpu));
> +		__run_ksoftirqd(smp_processor_id());
> +		rt_mutex_unlock(&per_cpu(ksoftirq_lock, cpu));
> +
>  		do_idle();
> +	}
>  
>  	cpuidle_use_deepest_state(0);
>  	current->flags &= ~PF_IDLE;

On Tue, 2022-12-20 at 22:18 +0100, Peter Zijlstra wrote:
> On Tue, Dec 20, 2022 at 09:51:09PM +0100, Peter Zijlstra wrote:
> > On Mon, Dec 19, 2022 at 02:54:55PM -0800, srinivas pandruvada wrote:
> >
> > > But after ksoftirqd_run_end(), which will re-enable local irqs, this
> > > may further cause more softirqs to be pending. Here the RCU softirq
> > > entry continues
> >
> > Right you are.. what about if we spell the idle.c thing like so
> > instead?
> >
> > Then we repeat the softirq thing every time we drop out of the idle
> > loop for a reschedule.
>
> Uff, that obviously can't work because we already have preemption
> disabled here; this needs a bit more work. I think it's possible to
> re-arrange things a bit.

That didn't work. Also, when __do_softirq() returns, softirqs can be
pending again. I think that if local_softirq_pending() is set, we can
break out of the do_idle() loop.

Thanks,
Srinivas

> I'll try and have a look tomorrow, but the kids have their xmas play at
> school, so who knows what I'll get done.
>
> > diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
> > index f26ab2675f7d..6dce49813bcc 100644
> > --- a/kernel/sched/idle.c
> > +++ b/kernel/sched/idle.c
> > @@ -381,8 +381,13 @@ void play_idle_precise(u64 duration_ns, u64 latency_ns)
> >  	hrtimer_start(&it.timer, ns_to_ktime(duration_ns),
> >  		      HRTIMER_MODE_REL_PINNED_HARD);
> >  
> > -	while (!READ_ONCE(it.done))
> > +	while (!READ_ONCE(it.done)) {
> > +		rt_mutex_lock(&per_cpu(ksoftirq_lock, cpu));
> > +		__run_ksoftirqd(smp_processor_id());
> > +		rt_mutex_unlock(&per_cpu(ksoftirq_lock, cpu));
> > +
> >  		do_idle();
> > +	}
> >  
> >  	cpuidle_use_deepest_state(0);
> >  	current->flags &= ~PF_IDLE;

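[Note: concretely, the suggestion above would make the loop-break
condition look something like this hypothetical sketch; the patch that
follows keys off task_is_running(ksoftirqd) instead:]

/* Hypothetical: leave the idle loop as soon as softirq work appears */
static bool idle_loop_should_break(void)
{
	return need_resched() || local_softirq_pending();
}
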
diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
index d17b0a5ce6ac..882c48441469 100644
--- a/kernel/sched/idle.c
+++ b/kernel/sched/idle.c
@@ -52,6 +52,12 @@ static int __init cpu_idle_nopoll_setup(char *__unused)
 __setup("hlt", cpu_idle_nopoll_setup);
 #endif
 
+/* FIXME: handle __cpuidle / instrumentation_begin()/end() */
+static bool idle_loop_should_break(void)
+{
+	return need_resched() || task_is_running(__this_cpu_read(ksoftirqd));
+}
+
 static noinline int __cpuidle cpu_idle_poll(void)
 {
 	trace_cpu_idle(0, smp_processor_id());
@@ -59,7 +65,7 @@ static noinline int __cpuidle cpu_idle_poll(void)
 	rcu_idle_enter();
 	local_irq_enable();
 
-	while (!tif_need_resched() &&
+	while (!idle_loop_should_break() &&
 	       (cpu_idle_force_poll || tick_check_broadcast_expired()))
 		cpu_relax();
 
@@ -177,7 +183,7 @@ static void cpuidle_idle_call(void)
 	 * Check if the idle task must be rescheduled. If it is the
 	 * case, exit the function after re-enabling the local irq.
 	 */
-	if (need_resched()) {
+	if (idle_loop_should_break()) {
 		local_irq_enable();
 		return;
 	}
@@ -279,7 +285,7 @@ static void do_idle(void)
 	__current_set_polling();
 	tick_nohz_idle_enter();
 
-	while (!need_resched()) {
+	while (!idle_loop_should_break()) {
 		rmb();
 
 		local_irq_disable();
@@ -373,25 +379,31 @@ void play_idle_precise(u64 duration_ns, u64 latency_ns)
 	WARN_ON_ONCE(!duration_ns);
 	WARN_ON_ONCE(current->mm);
 
-	rcu_sleep_check();
-	preempt_disable();
-	current->flags |= PF_IDLE;
-	cpuidle_use_deepest_state(latency_ns);
+	do {
+		rcu_sleep_check();
+		preempt_disable();
+		current->flags |= PF_IDLE;
+		cpuidle_use_deepest_state(latency_ns);
 
-	it.done = 0;
-	hrtimer_init_on_stack(&it.timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL_HARD);
-	it.timer.function = idle_inject_timer_fn;
-	hrtimer_start(&it.timer, ns_to_ktime(duration_ns),
-		      HRTIMER_MODE_REL_PINNED_HARD);
+		it.done = 0;
+		hrtimer_init_on_stack(&it.timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL_HARD);
+		it.timer.function = idle_inject_timer_fn;
+		hrtimer_start(&it.timer, ns_to_ktime(duration_ns),
+			      HRTIMER_MODE_REL_PINNED_HARD);
 
-	while (!READ_ONCE(it.done))
-		do_idle();
+		while (!READ_ONCE(it.done) && !task_is_running(__this_cpu_read(ksoftirqd)))
+			do_idle();
+
+		cpuidle_use_deepest_state(0);
+		current->flags &= ~PF_IDLE;
 
-	cpuidle_use_deepest_state(0);
-	current->flags &= ~PF_IDLE;
+		preempt_fold_need_resched();
+		preempt_enable();
 
-	preempt_fold_need_resched();
-	preempt_enable();
+		/* Give ksoftirqd 1 jiffy to get a chance to start its job */
+		if (!READ_ONCE(it.done) && task_is_running(__this_cpu_read(ksoftirqd)))
+			schedule_timeout(1);
+	} while (!READ_ONCE(it.done));
 }
 EXPORT_SYMBOL_GPL(play_idle_precise);
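
[Note: for context on how this path is exercised, play_idle_precise() is
driven by per-CPU FIFO injection threads, e.g. the powercap idle_inject
framework. A simplified, hypothetical caller for illustration; the real
framework uses smpboot per-CPU threads and a shared hrtimer:]

#include <linux/cpu.h>		/* play_idle_precise() */
#include <linux/jiffies.h>	/* nsecs_to_jiffies() */
#include <linux/kthread.h>
#include <linux/sched.h>
#include <linux/time64.h>

/*
 * Hypothetical injection thread: idle for idle_ns out of every period_ns.
 * The thread must be bound to a single CPU (kthread_bind()) before it
 * starts, since play_idle_precise() warns otherwise.
 */
static int idle_inject_fn(void *arg)
{
	u64 idle_ns = 10 * NSEC_PER_MSEC;	/* assumed knobs */
	u64 period_ns = 50 * NSEC_PER_MSEC;

	sched_set_fifo(current);		/* injection must run FIFO */

	while (!kthread_should_stop()) {
		play_idle_precise(idle_ns, U64_MAX);
		schedule_timeout_idle(nsecs_to_jiffies(period_ns - idle_ns));
	}
	return 0;
}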