Message ID | 1315332049-2604-55-git-send-email-paulmck@linux.vnet.ibm.com |
---|---|
State | New |
Headers | show |
On Tue, 2011-09-06 at 11:00 -0700, Paul E. McKenney wrote: > PowerPC LPAR's __trace_hcall_exit() can invoke event tracing at a > point where RCU has been told that the CPU is in dyntick-idle mode. > Because event tracing uses RCU, this can result in failures. > > A correct fix would arrange for RCU to be told about dyntick-idle > mode after tracing had completed, however, this will require some care > because it appears that __trace_hcall_exit() can also be called from > non-dyntick-idle mode. This obviously needs to be fixed properly. hcall tracing is very useful and if I understand your patch properly, it just comments it out :-) I'm not sure what the best approach is, maybe have the hcall tracing test for the dyntick-idle mode and skip tracing in that case ? Cheers, Ben. > Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> > Cc: anton@samba.org > Cc: benh@kernel.crashing.org > Cc: paulus@samba.org > --- > arch/powerpc/platforms/pseries/lpar.c | 6 ++++++ > 1 files changed, 6 insertions(+), 0 deletions(-) > > diff --git a/arch/powerpc/platforms/pseries/lpar.c b/arch/powerpc/platforms/pseries/lpar.c > index 39e6e0a..668f300 100644 > --- a/arch/powerpc/platforms/pseries/lpar.c > +++ b/arch/powerpc/platforms/pseries/lpar.c > @@ -715,12 +715,14 @@ EXPORT_SYMBOL(arch_free_page); > /* NB: reg/unreg are called while guarded with the tracepoints_mutex */ > extern long hcall_tracepoint_refcount; > > +#if 0 /* work around buggy use of RCU from dyntick-idle mode */ > /* > * Since the tracing code might execute hcalls we need to guard against > * recursion. One example of this are spinlocks calling H_YIELD on > * shared processor partitions. > */ > static DEFINE_PER_CPU(unsigned int, hcall_trace_depth); > +#endif /* #if 0 work around buggy use of RCU from dyntick-idle mode */ > > void hcall_tracepoint_regfunc(void) > { > @@ -734,6 +736,7 @@ void hcall_tracepoint_unregfunc(void) > > void __trace_hcall_entry(unsigned long opcode, unsigned long *args) > { > +#if 0 /* work around buggy use of RCU from dyntick-idle mode */ > unsigned long flags; > unsigned int *depth; > > @@ -750,11 +753,13 @@ void __trace_hcall_entry(unsigned long opcode, unsigned long *args) > > out: > local_irq_restore(flags); > +#endif /* #if 0 work around buggy use of RCU from dyntick-idle mode */ > } > > void __trace_hcall_exit(long opcode, unsigned long retval, > unsigned long *retbuf) > { > +#if 0 /* work around buggy use of RCU from dyntick-idle mode */ > unsigned long flags; > unsigned int *depth; > > @@ -771,6 +776,7 @@ void __trace_hcall_exit(long opcode, unsigned long retval, > > out: > local_irq_restore(flags); > +#endif /* #if 0 work around buggy use of RCU from dyntick-idle mode */ > } > #endif >
On Wed, Sep 07, 2011 at 07:00:22AM -0300, Benjamin Herrenschmidt wrote: > On Tue, 2011-09-06 at 11:00 -0700, Paul E. McKenney wrote: > > PowerPC LPAR's __trace_hcall_exit() can invoke event tracing at a > > point where RCU has been told that the CPU is in dyntick-idle mode. > > Because event tracing uses RCU, this can result in failures. > > > > A correct fix would arrange for RCU to be told about dyntick-idle > > mode after tracing had completed, however, this will require some care > > because it appears that __trace_hcall_exit() can also be called from > > non-dyntick-idle mode. > > This obviously needs to be fixed properly. hcall tracing is very useful > and if I understand your patch properly, it just comments it out :-) That is exactly what it does, and I completely agree that this patch is nothing but a short-term work-around to allow my RCU tests to find other bugs. > I'm not sure what the best approach is, maybe have the hcall tracing > test for the dyntick-idle mode and skip tracing in that case ? Another approach would be to update Frederic Weisbecker's patch at: https://lkml.org/lkml/2011/8/20/83 so that powerpc does tick_nohz_enter_idle(false), and then uses rcu_enter_nohz() explicitly just after doing the hcall tracing. If pseries is the only powerpc architecture requiring this, then the argument to tick_nohz_enter_idle() could depend on the powerpc sub-architecture. The same thing would be needed for tick_nohz_exit_idle() and rcu_exit_nohz(): powerpc would need to invoke rcu_exit_nohz() after gaining control from the hypervisor but before doing its first tracing, and then it would need the idle loop to to tick_nohz_exit_idle(false). Again, if pseries is the only powerpc architecture requiring this, the argument to tick_nohz_exit_idle() could depend on the architecture. Would this approach work? Thanx, Paul > Cheers, > Ben. > > > Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> > > Cc: anton@samba.org > > Cc: benh@kernel.crashing.org > > Cc: paulus@samba.org > > --- > > arch/powerpc/platforms/pseries/lpar.c | 6 ++++++ > > 1 files changed, 6 insertions(+), 0 deletions(-) > > > > diff --git a/arch/powerpc/platforms/pseries/lpar.c b/arch/powerpc/platforms/pseries/lpar.c > > index 39e6e0a..668f300 100644 > > --- a/arch/powerpc/platforms/pseries/lpar.c > > +++ b/arch/powerpc/platforms/pseries/lpar.c > > @@ -715,12 +715,14 @@ EXPORT_SYMBOL(arch_free_page); > > /* NB: reg/unreg are called while guarded with the tracepoints_mutex */ > > extern long hcall_tracepoint_refcount; > > > > +#if 0 /* work around buggy use of RCU from dyntick-idle mode */ > > /* > > * Since the tracing code might execute hcalls we need to guard against > > * recursion. One example of this are spinlocks calling H_YIELD on > > * shared processor partitions. > > */ > > static DEFINE_PER_CPU(unsigned int, hcall_trace_depth); > > +#endif /* #if 0 work around buggy use of RCU from dyntick-idle mode */ > > > > void hcall_tracepoint_regfunc(void) > > { > > @@ -734,6 +736,7 @@ void hcall_tracepoint_unregfunc(void) > > > > void __trace_hcall_entry(unsigned long opcode, unsigned long *args) > > { > > +#if 0 /* work around buggy use of RCU from dyntick-idle mode */ > > unsigned long flags; > > unsigned int *depth; > > > > @@ -750,11 +753,13 @@ void __trace_hcall_entry(unsigned long opcode, unsigned long *args) > > > > out: > > local_irq_restore(flags); > > +#endif /* #if 0 work around buggy use of RCU from dyntick-idle mode */ > > } > > > > void __trace_hcall_exit(long opcode, unsigned long retval, > > unsigned long *retbuf) > > { > > +#if 0 /* work around buggy use of RCU from dyntick-idle mode */ > > unsigned long flags; > > unsigned int *depth; > > > > @@ -771,6 +776,7 @@ void __trace_hcall_exit(long opcode, unsigned long retval, > > > > out: > > local_irq_restore(flags); > > +#endif /* #if 0 work around buggy use of RCU from dyntick-idle mode */ > > } > > #endif > > > >
On Wed, Sep 07, 2011 at 06:44:00AM -0700, Paul E. McKenney wrote: > On Wed, Sep 07, 2011 at 07:00:22AM -0300, Benjamin Herrenschmidt wrote: > > On Tue, 2011-09-06 at 11:00 -0700, Paul E. McKenney wrote: > > > PowerPC LPAR's __trace_hcall_exit() can invoke event tracing at a > > > point where RCU has been told that the CPU is in dyntick-idle mode. > > > Because event tracing uses RCU, this can result in failures. > > > > > > A correct fix would arrange for RCU to be told about dyntick-idle > > > mode after tracing had completed, however, this will require some care > > > because it appears that __trace_hcall_exit() can also be called from > > > non-dyntick-idle mode. > > > > This obviously needs to be fixed properly. hcall tracing is very useful > > and if I understand your patch properly, it just comments it out :-) > > That is exactly what it does, and I completely agree that this patch > is nothing but a short-term work-around to allow my RCU tests to find > other bugs. > > > I'm not sure what the best approach is, maybe have the hcall tracing > > test for the dyntick-idle mode and skip tracing in that case ? > > Another approach would be to update Frederic Weisbecker's patch at: > > https://lkml.org/lkml/2011/8/20/83 > > so that powerpc does tick_nohz_enter_idle(false), and then uses > rcu_enter_nohz() explicitly just after doing the hcall tracing. > If pseries is the only powerpc architecture requiring this, then > the argument to tick_nohz_enter_idle() could depend on the powerpc > sub-architecture. I'm trying to fix this but I need a bit of help to understand the pseries cpu sleeping. In pseries_dedicated_idle_sleep(), what is the function that does the real sleeping? Is it cede_processor()? > > The same thing would be needed for tick_nohz_exit_idle() and > rcu_exit_nohz(): powerpc would need to invoke rcu_exit_nohz() after > gaining control from the hypervisor but before doing its first tracing, > and then it would need the idle loop to to tick_nohz_exit_idle(false). > Again, if pseries is the only powerpc architecture requiring this, > the argument to tick_nohz_exit_idle() could depend on the architecture. > > Would this approach work? Sounds like we really need that. Thanks.
On Tue, Sep 13, 2011 at 09:13:21PM +0200, Frederic Weisbecker wrote: > On Wed, Sep 07, 2011 at 06:44:00AM -0700, Paul E. McKenney wrote: > > On Wed, Sep 07, 2011 at 07:00:22AM -0300, Benjamin Herrenschmidt wrote: > > > On Tue, 2011-09-06 at 11:00 -0700, Paul E. McKenney wrote: > > > > PowerPC LPAR's __trace_hcall_exit() can invoke event tracing at a > > > > point where RCU has been told that the CPU is in dyntick-idle mode. > > > > Because event tracing uses RCU, this can result in failures. > > > > > > > > A correct fix would arrange for RCU to be told about dyntick-idle > > > > mode after tracing had completed, however, this will require some care > > > > because it appears that __trace_hcall_exit() can also be called from > > > > non-dyntick-idle mode. > > > > > > This obviously needs to be fixed properly. hcall tracing is very useful > > > and if I understand your patch properly, it just comments it out :-) > > > > That is exactly what it does, and I completely agree that this patch > > is nothing but a short-term work-around to allow my RCU tests to find > > other bugs. > > > > > I'm not sure what the best approach is, maybe have the hcall tracing > > > test for the dyntick-idle mode and skip tracing in that case ? > > > > Another approach would be to update Frederic Weisbecker's patch at: > > > > https://lkml.org/lkml/2011/8/20/83 > > > > so that powerpc does tick_nohz_enter_idle(false), and then uses > > rcu_enter_nohz() explicitly just after doing the hcall tracing. > > If pseries is the only powerpc architecture requiring this, then > > the argument to tick_nohz_enter_idle() could depend on the powerpc > > sub-architecture. > > I'm trying to fix this but I need a bit of help to understand the > pseries cpu sleeping. > > In pseries_dedicated_idle_sleep(), what is the function that does > the real sleeping? Is it cede_processor()? As I understand it, cede_processor()'s call to plpar_hcall_norets() results in the hypervisor being invoked, and could give up the CPU. And yes, in this case, RCU needs to stop paying attention to this CPU. And pseries_shared_idle_sleep() also invokes cede_proceessor(). Gah... And there also appear to be some assembly-language functions that can be invoked via the ppc_md.power_save() call from cpu_idle(): ppc6xx_idle(), power4_idle(), idle_spin(), idle_doze(), and book3e_idle(). There is also a power7_idle(), but it does not appear to be used anywhere. Plus there are the C-language ppc44x_idle(), beat_power_save(), cbe_power_save(), ps3_power_save(), and cpm_idle(). > > The same thing would be needed for tick_nohz_exit_idle() and > > rcu_exit_nohz(): powerpc would need to invoke rcu_exit_nohz() after > > gaining control from the hypervisor but before doing its first tracing, > > and then it would need the idle loop to to tick_nohz_exit_idle(false). > > Again, if pseries is the only powerpc architecture requiring this, > > the argument to tick_nohz_exit_idle() could depend on the architecture. > > > > Would this approach work? > > Sounds like we really need that. Sounds like an arch-dependent config symbol that is defined for the pseries targets, but not for the other powerpc architectures. Not clear to me what to do about power4_idle(), though. Thanx, Paul
> As I understand it, cede_processor()'s call to plpar_hcall_norets() > results in the hypervisor being invoked, and could give up the CPU. > And yes, in this case, RCU needs to stop paying attention to this CPU. > And pseries_shared_idle_sleep() also invokes cede_proceessor(). > > Gah... And there also appear to be some assembly-language functions > that can be invoked via the ppc_md.power_save() call from cpu_idle(): > ppc6xx_idle(), power4_idle(), idle_spin(), idle_doze(), and book3e_idle(). > There is also a power7_idle(), but it does not appear to be used anywhere. > > Plus there are the C-language ppc44x_idle(), beat_power_save(), > cbe_power_save(), ps3_power_save(), and cpm_idle(). > > > > The same thing would be needed for tick_nohz_exit_idle() and > > > rcu_exit_nohz(): powerpc would need to invoke rcu_exit_nohz() after > > > gaining control from the hypervisor but before doing its first tracing, > > > and then it would need the idle loop to to tick_nohz_exit_idle(false). > > > Again, if pseries is the only powerpc architecture requiring this, > > > the argument to tick_nohz_exit_idle() could depend on the architecture. > > > > > > Would this approach work? > > > > Sounds like we really need that. > > Sounds like an arch-dependent config symbol that is defined for the > pseries targets, but not for the other powerpc architectures. > > Not clear to me what to do about power4_idle(), though. I don't totally follow, too many things to deal with right now, but keep in mind that we build multiplatform kernels, so you can have powermac, cell, pseries, etc... all in one kernel binary (including power7 idle). Shouldn't we instead change the plpar trace call to skip the tracing when not safe to do so ? Cheers, Ben.
On Tue, Sep 13, 2011 at 05:49:53PM -0300, Benjamin Herrenschmidt wrote: > > > As I understand it, cede_processor()'s call to plpar_hcall_norets() > > results in the hypervisor being invoked, and could give up the CPU. > > And yes, in this case, RCU needs to stop paying attention to this CPU. > > And pseries_shared_idle_sleep() also invokes cede_proceessor(). > > > > Gah... And there also appear to be some assembly-language functions > > that can be invoked via the ppc_md.power_save() call from cpu_idle(): > > ppc6xx_idle(), power4_idle(), idle_spin(), idle_doze(), and book3e_idle(). > > There is also a power7_idle(), but it does not appear to be used anywhere. > > > > Plus there are the C-language ppc44x_idle(), beat_power_save(), > > cbe_power_save(), ps3_power_save(), and cpm_idle(). > > > > > > The same thing would be needed for tick_nohz_exit_idle() and > > > > rcu_exit_nohz(): powerpc would need to invoke rcu_exit_nohz() after > > > > gaining control from the hypervisor but before doing its first tracing, > > > > and then it would need the idle loop to to tick_nohz_exit_idle(false). > > > > Again, if pseries is the only powerpc architecture requiring this, > > > > the argument to tick_nohz_exit_idle() could depend on the architecture. > > > > > > > > Would this approach work? > > > > > > Sounds like we really need that. > > > > Sounds like an arch-dependent config symbol that is defined for the > > pseries targets, but not for the other powerpc architectures. > > > > Not clear to me what to do about power4_idle(), though. > > I don't totally follow, too many things to deal with right now, but keep > in mind that we build multiplatform kernels, so you can have powermac, > cell, pseries, etc... all in one kernel binary (including power7 idle). > > Shouldn't we instead change the plpar trace call to skip the tracing > when not safe to do so ? Or may be we can have a plpar_hcall_norets_notrace() for this specific case? Lemme try something.
diff --git a/arch/powerpc/platforms/pseries/lpar.c b/arch/powerpc/platforms/pseries/lpar.c index 39e6e0a..668f300 100644 --- a/arch/powerpc/platforms/pseries/lpar.c +++ b/arch/powerpc/platforms/pseries/lpar.c @@ -715,12 +715,14 @@ EXPORT_SYMBOL(arch_free_page); /* NB: reg/unreg are called while guarded with the tracepoints_mutex */ extern long hcall_tracepoint_refcount; +#if 0 /* work around buggy use of RCU from dyntick-idle mode */ /* * Since the tracing code might execute hcalls we need to guard against * recursion. One example of this are spinlocks calling H_YIELD on * shared processor partitions. */ static DEFINE_PER_CPU(unsigned int, hcall_trace_depth); +#endif /* #if 0 work around buggy use of RCU from dyntick-idle mode */ void hcall_tracepoint_regfunc(void) { @@ -734,6 +736,7 @@ void hcall_tracepoint_unregfunc(void) void __trace_hcall_entry(unsigned long opcode, unsigned long *args) { +#if 0 /* work around buggy use of RCU from dyntick-idle mode */ unsigned long flags; unsigned int *depth; @@ -750,11 +753,13 @@ void __trace_hcall_entry(unsigned long opcode, unsigned long *args) out: local_irq_restore(flags); +#endif /* #if 0 work around buggy use of RCU from dyntick-idle mode */ } void __trace_hcall_exit(long opcode, unsigned long retval, unsigned long *retbuf) { +#if 0 /* work around buggy use of RCU from dyntick-idle mode */ unsigned long flags; unsigned int *depth; @@ -771,6 +776,7 @@ void __trace_hcall_exit(long opcode, unsigned long retval, out: local_irq_restore(flags); +#endif /* #if 0 work around buggy use of RCU from dyntick-idle mode */ } #endif
PowerPC LPAR's __trace_hcall_exit() can invoke event tracing at a point where RCU has been told that the CPU is in dyntick-idle mode. Because event tracing uses RCU, this can result in failures. A correct fix would arrange for RCU to be told about dyntick-idle mode after tracing had completed, however, this will require some care because it appears that __trace_hcall_exit() can also be called from non-dyntick-idle mode. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: anton@samba.org Cc: benh@kernel.crashing.org Cc: paulus@samba.org --- arch/powerpc/platforms/pseries/lpar.c | 6 ++++++ 1 files changed, 6 insertions(+), 0 deletions(-)