Message ID | 20221219202910.3063036-3-elliott@hpe.com |
---|---|
State | New |
Headers | show |
Series | crypto/rcu: suppress unnecessary CPU stall warnings | expand |
On Mon, Dec 19, 2022 at 02:29:09PM -0600, Robert Elliott wrote: > Include the first CPU number in the first pr_err() call reporting > an expedited stall warning. > > Printing the CPU numbers with subsequent pr_cont() calls can > result in the prints being many lines away or being dropped entirely > in a busy system. This change ensures there is indication of at > least one of the CPUs with the problem with the original message. > > Before (if prints are interspersed with other prints): > rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { > 13-.... > } 32 jiffies s: 6685 root: 0x1/. > > After: > rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: {13-.... > } 32 jiffies s: 6685 root: 0x1/. > > Signed-off-by: Robert Elliott <elliott@hpe.com> A couple of questions below. > --- > kernel/rcu/tree_exp.h | 22 +++++++++++++++------- > 1 file changed, 15 insertions(+), 7 deletions(-) > > diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h > index ed6c3cce28f2..ade6a18e6c07 100644 > --- a/kernel/rcu/tree_exp.h > +++ b/kernel/rcu/tree_exp.h > @@ -624,8 +624,6 @@ static void synchronize_rcu_expedited_wait(void) > if (rcu_stall_is_suppressed()) > continue; > trace_rcu_stall_warning(rcu_state.name, TPS("ExpeditedStall")); > - pr_err("INFO: %s detected expedited stalls on CPUs/tasks: {", > - rcu_state.name); > ndetected = 0; > rcu_for_each_leaf_node(rnp) { > ndetected += rcu_print_task_exp_stall(rnp); > @@ -637,11 +635,21 @@ static void synchronize_rcu_expedited_wait(void) > continue; > ndetected++; > rdp = per_cpu_ptr(&rcu_data, cpu); > - pr_cont(" %d-%c%c%c%c", cpu, > - "O."[!!cpu_online(cpu)], > - "o."[!!(rdp->grpmask & rnp->expmaskinit)], > - "N."[!!(rdp->grpmask & rnp->expmaskinitnext)], > - "D."[!!(rdp->cpu_no_qs.b.exp)]); > + // print the prefix and the first CPU number together > + // under heavy load, the pr_cont prints can be far away or dropped > + if (ndetected == 1) Is the purpose here to print the header only on the first detected task? If so, what if there is more than one task blocking the first rcu_node structure? Wouldn't that omit the header entirely? > + pr_err("INFO: %s detected expedited stalls on CPUs/tasks: {%d-%c%c%c%c", We can of course get other console output interspersed at this point. This might be OK in practice, but the commit log should clearly spell out the reasons. > + rcu_state.name, cpu, > + "O."[!!cpu_online(cpu)], > + "o."[!!(rdp->grpmask & rnp->expmaskinit)], > + "N."[!!(rdp->grpmask & rnp->expmaskinitnext)], > + "D."[!!(rdp->cpu_no_qs.b.exp)]); > + else > + pr_cont(" %d-%c%c%c%c", cpu, > + "O."[!!cpu_online(cpu)], > + "o."[!!(rdp->grpmask & rnp->expmaskinit)], > + "N."[!!(rdp->grpmask & rnp->expmaskinitnext)], > + "D."[!!(rdp->cpu_no_qs.b.exp)]); > } > } > pr_cont(" } %lu jiffies s: %lu root: %#lx/%c\n", > -- > 2.38.1 >
diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h index ed6c3cce28f2..ade6a18e6c07 100644 --- a/kernel/rcu/tree_exp.h +++ b/kernel/rcu/tree_exp.h @@ -624,8 +624,6 @@ static void synchronize_rcu_expedited_wait(void) if (rcu_stall_is_suppressed()) continue; trace_rcu_stall_warning(rcu_state.name, TPS("ExpeditedStall")); - pr_err("INFO: %s detected expedited stalls on CPUs/tasks: {", - rcu_state.name); ndetected = 0; rcu_for_each_leaf_node(rnp) { ndetected += rcu_print_task_exp_stall(rnp); @@ -637,11 +635,21 @@ static void synchronize_rcu_expedited_wait(void) continue; ndetected++; rdp = per_cpu_ptr(&rcu_data, cpu); - pr_cont(" %d-%c%c%c%c", cpu, - "O."[!!cpu_online(cpu)], - "o."[!!(rdp->grpmask & rnp->expmaskinit)], - "N."[!!(rdp->grpmask & rnp->expmaskinitnext)], - "D."[!!(rdp->cpu_no_qs.b.exp)]); + // print the prefix and the first CPU number together + // under heavy load, the pr_cont prints can be far away or dropped + if (ndetected == 1) + pr_err("INFO: %s detected expedited stalls on CPUs/tasks: {%d-%c%c%c%c", + rcu_state.name, cpu, + "O."[!!cpu_online(cpu)], + "o."[!!(rdp->grpmask & rnp->expmaskinit)], + "N."[!!(rdp->grpmask & rnp->expmaskinitnext)], + "D."[!!(rdp->cpu_no_qs.b.exp)]); + else + pr_cont(" %d-%c%c%c%c", cpu, + "O."[!!cpu_online(cpu)], + "o."[!!(rdp->grpmask & rnp->expmaskinit)], + "N."[!!(rdp->grpmask & rnp->expmaskinitnext)], + "D."[!!(rdp->cpu_no_qs.b.exp)]); } } pr_cont(" } %lu jiffies s: %lu root: %#lx/%c\n",
Include the first CPU number in the first pr_err() call reporting an expedited stall warning. Printing the CPU numbers with subsequent pr_cont() calls can result in the prints being many lines away or being dropped entirely in a busy system. This change ensures there is indication of at least one of the CPUs with the problem with the original message. Before (if prints are interspersed with other prints): rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { 13-.... } 32 jiffies s: 6685 root: 0x1/. After: rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: {13-.... } 32 jiffies s: 6685 root: 0x1/. Signed-off-by: Robert Elliott <elliott@hpe.com> --- kernel/rcu/tree_exp.h | 22 +++++++++++++++------- 1 file changed, 15 insertions(+), 7 deletions(-)