Message ID | 1599199797-25978-1-git-send-email-gkohli@codeaurora.org |
---|---|
State | Superseded |
Series | trace: Fix race in trace_open and buffer resize call |
Hi Steven,

Please let us know if the below change looks good, or let us know some other
way to solve this.

Thanks,
Gaurav

On 9/4/2020 11:39 AM, Gaurav Kohli wrote:
> The below race can occur if trace_open and a resize of the
> cpu buffer run in parallel on different cpus:
>
> CPUX                                    CPUY
>                                         ring_buffer_resize
>                                         atomic_read(&buffer->resize_disabled)
> tracing_open
> tracing_reset_online_cpus
> ring_buffer_reset_cpu
> rb_reset_cpu
>                                         rb_update_pages
>                                         remove/insert pages
> resetting pointer
>
> This race can cause a data abort or sometimes an infinite loop in
> rb_remove_pages and rb_insert_pages while checking pages
> for sanity.
>
> Take the ring buffer lock in trace_open to avoid resetting of the cpu buffer.
>
> Signed-off-by: Gaurav Kohli <gkohli@codeaurora.org>
>
> diff --git a/include/linux/ring_buffer.h b/include/linux/ring_buffer.h
> index 136ea09..55f9115 100644
> --- a/include/linux/ring_buffer.h
> +++ b/include/linux/ring_buffer.h
> @@ -163,6 +163,8 @@ bool ring_buffer_empty_cpu(struct trace_buffer *buffer, int cpu);
>
>  void ring_buffer_record_disable(struct trace_buffer *buffer);
>  void ring_buffer_record_enable(struct trace_buffer *buffer);
> +void ring_buffer_mutex_acquire(struct trace_buffer *buffer);
> +void ring_buffer_mutex_release(struct trace_buffer *buffer);
>  void ring_buffer_record_off(struct trace_buffer *buffer);
>  void ring_buffer_record_on(struct trace_buffer *buffer);
>  bool ring_buffer_record_is_on(struct trace_buffer *buffer);
> diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
> index 93ef0ab..638ec8f 100644
> --- a/kernel/trace/ring_buffer.c
> +++ b/kernel/trace/ring_buffer.c
> @@ -3632,6 +3632,25 @@ void ring_buffer_record_enable(struct trace_buffer *buffer)
>  EXPORT_SYMBOL_GPL(ring_buffer_record_enable);
>
>  /**
> + * ring_buffer_mutex_acquire - prevent resetting of buffer
> + * during resize
> + */
> +void ring_buffer_mutex_acquire(struct trace_buffer *buffer)
> +{
> +	mutex_lock(&buffer->mutex);
> +}
> +EXPORT_SYMBOL_GPL(ring_buffer_mutex_acquire);
> +
> +/**
> + * ring_buffer_mutex_release - prevent resetting of buffer
> + * during resize
> + */
> +void ring_buffer_mutex_release(struct trace_buffer *buffer)
> +{
> +	mutex_unlock(&buffer->mutex);
> +}
> +EXPORT_SYMBOL_GPL(ring_buffer_mutex_release);
> +/**
>   * ring_buffer_record_off - stop all writes into the buffer
>   * @buffer: The ring buffer to stop writes to.
>   *
> @@ -4918,6 +4937,8 @@ void ring_buffer_reset(struct trace_buffer *buffer)
>  	struct ring_buffer_per_cpu *cpu_buffer;
>  	int cpu;
>
> +	/* prevent another thread from changing buffer sizes */
> +	mutex_lock(&buffer->mutex);
>  	for_each_buffer_cpu(buffer, cpu) {
>  		cpu_buffer = buffer->buffers[cpu];
>
> @@ -4936,6 +4957,7 @@ void ring_buffer_reset(struct trace_buffer *buffer)
>  		atomic_dec(&cpu_buffer->record_disabled);
>  		atomic_dec(&cpu_buffer->resize_disabled);
>  	}
> +	mutex_unlock(&buffer->mutex);
>  }
>  EXPORT_SYMBOL_GPL(ring_buffer_reset);
>
> diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
> index f40d850..392e9aa 100644
> --- a/kernel/trace/trace.c
> +++ b/kernel/trace/trace.c
> @@ -2006,6 +2006,8 @@ void tracing_reset_online_cpus(struct array_buffer *buf)
>  	if (!buffer)
>  		return;
>
> +	ring_buffer_mutex_acquire(buffer);
> +
>  	ring_buffer_record_disable(buffer);
>
>  	/* Make sure all commits have finished */
> @@ -2016,6 +2018,8 @@ void tracing_reset_online_cpus(struct array_buffer *buf)
>  	ring_buffer_reset_online_cpus(buffer);
>
>  	ring_buffer_record_enable(buffer);
> +
> +	ring_buffer_mutex_release(buffer);
>  }
>
>  /* Must have trace_types_lock held */
On Mon, 14 Sep 2020 10:00:50 +0530
Gaurav Kohli <gkohli@codeaurora.org> wrote:

> Hi Steven,
>
> Please let us know if the below change looks good,
> or let us know some other way to solve this.
>
> Thanks,
> Gaurav
>

Hmm, for some reason, I don't see this in my INBOX, but it shows up in my
LKML folder. :-/

> On 9/4/2020 11:39 AM, Gaurav Kohli wrote:
> > The below race can occur if trace_open and a resize of the
> > cpu buffer run in parallel on different cpus:
> >
> > CPUX                                    CPUY
> >                                         ring_buffer_resize
> >                                         atomic_read(&buffer->resize_disabled)
> > tracing_open
> > tracing_reset_online_cpus
> > ring_buffer_reset_cpu
> > rb_reset_cpu
> >                                         rb_update_pages
> >                                         remove/insert pages
> > resetting pointer
> >
> > This race can cause a data abort or sometimes an infinite loop in
> > rb_remove_pages and rb_insert_pages while checking pages
> > for sanity.
> >
> > Take the ring buffer lock in trace_open to avoid resetting of the cpu buffer.
> >
> > Signed-off-by: Gaurav Kohli <gkohli@codeaurora.org>
> >
> > diff --git a/include/linux/ring_buffer.h b/include/linux/ring_buffer.h
> > index 136ea09..55f9115 100644
> > --- a/include/linux/ring_buffer.h
> > +++ b/include/linux/ring_buffer.h
> > @@ -163,6 +163,8 @@ bool ring_buffer_empty_cpu(struct trace_buffer *buffer, int cpu);
> >
> >  void ring_buffer_record_disable(struct trace_buffer *buffer);
> >  void ring_buffer_record_enable(struct trace_buffer *buffer);
> > +void ring_buffer_mutex_acquire(struct trace_buffer *buffer);
> > +void ring_buffer_mutex_release(struct trace_buffer *buffer);
> >  void ring_buffer_record_off(struct trace_buffer *buffer);
> >  void ring_buffer_record_on(struct trace_buffer *buffer);
> >  bool ring_buffer_record_is_on(struct trace_buffer *buffer);
> > diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
> > index 93ef0ab..638ec8f 100644
> > --- a/kernel/trace/ring_buffer.c
> > +++ b/kernel/trace/ring_buffer.c
> > @@ -3632,6 +3632,25 @@ void ring_buffer_record_enable(struct trace_buffer *buffer)
> >  EXPORT_SYMBOL_GPL(ring_buffer_record_enable);
> >
> >  /**
> > + * ring_buffer_mutex_acquire - prevent resetting of buffer
> > + * during resize
> > + */
> > +void ring_buffer_mutex_acquire(struct trace_buffer *buffer)
> > +{
> > +	mutex_lock(&buffer->mutex);
> > +}
> > +EXPORT_SYMBOL_GPL(ring_buffer_mutex_acquire);
> > +
> > +/**
> > + * ring_buffer_mutex_release - prevent resetting of buffer
> > + * during resize
> > + */
> > +void ring_buffer_mutex_release(struct trace_buffer *buffer)
> > +{
> > +	mutex_unlock(&buffer->mutex);
> > +}
> > +EXPORT_SYMBOL_GPL(ring_buffer_mutex_release);

I really do not like to export these.

> > +/**
> >   * ring_buffer_record_off - stop all writes into the buffer
> >   * @buffer: The ring buffer to stop writes to.
> >   *
> > @@ -4918,6 +4937,8 @@ void ring_buffer_reset(struct trace_buffer *buffer)
> >  	struct ring_buffer_per_cpu *cpu_buffer;
> >  	int cpu;
> >
> > +	/* prevent another thread from changing buffer sizes */
> > +	mutex_lock(&buffer->mutex);
> >  	for_each_buffer_cpu(buffer, cpu) {
> >  		cpu_buffer = buffer->buffers[cpu];
> >
> > @@ -4936,6 +4957,7 @@ void ring_buffer_reset(struct trace_buffer *buffer)
> >  		atomic_dec(&cpu_buffer->record_disabled);
> >  		atomic_dec(&cpu_buffer->resize_disabled);
> >  	}
> > +	mutex_unlock(&buffer->mutex);
> >  }
> >  EXPORT_SYMBOL_GPL(ring_buffer_reset);
> >
> > diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
> > index f40d850..392e9aa 100644
> > --- a/kernel/trace/trace.c
> > +++ b/kernel/trace/trace.c
> > @@ -2006,6 +2006,8 @@ void tracing_reset_online_cpus(struct array_buffer *buf)
> >  	if (!buffer)
> >  		return;
> >
> > +	ring_buffer_mutex_acquire(buffer);
> > +
> >  	ring_buffer_record_disable(buffer);

Hmm, why do we disable here as it gets disabled again in the call to
ring_buffer_reset_online_cpus()? Perhaps we don't need to disable the
buffer here. The only difference is that we have:

	buf->time_start = buffer_ftrace_now(buf, buf->cpu);

And that the above disables the entire buffer, whereas the reset only
resets individual ones. But I don't think that will make any difference.

-- Steve

> >
> >  	/* Make sure all commits have finished */
> > @@ -2016,6 +2018,8 @@ void tracing_reset_online_cpus(struct array_buffer *buf)
> >  	ring_buffer_reset_online_cpus(buffer);
> >
> >  	ring_buffer_record_enable(buffer);
> > +
> > +	ring_buffer_mutex_release(buffer);
> >  }
> >
> >  /* Must have trace_types_lock held */
> >
Hi Steven, thanks for the reply.

On 9/14/2020 9:49 PM, Steven Rostedt wrote:
> On Mon, 14 Sep 2020 10:00:50 +0530
> Gaurav Kohli <gkohli@codeaurora.org> wrote:
>
>> Hi Steven,
>>
>> Please let us know if the below change looks good,
>> or let us know some other way to solve this.
>>
>> Thanks,
>> Gaurav
>>
>
> Hmm, for some reason, I don't see this in my INBOX, but it shows up in my
> LKML folder. :-/
>
>>> +void ring_buffer_mutex_release(struct trace_buffer *buffer)
>>> +{
>>> +	mutex_unlock(&buffer->mutex);
>>> +}
>>> +EXPORT_SYMBOL_GPL(ring_buffer_mutex_release);
>
> I really do not like to export these.
>

The existing reader lock (&cpu_buffer->reader_lock) is not helping here,
so I took the ring buffer mutex lock to resolve this (this came on
4.19/5.4; in the latest tip it is the trace buffer lock). That is why I
have exported the API.

>>> +/**
>>>   * ring_buffer_record_off - stop all writes into the buffer
>>>   * @buffer: The ring buffer to stop writes to.
>>>   *
>>> @@ -4918,6 +4937,8 @@ void ring_buffer_reset(struct trace_buffer *buffer)
>>>  	struct ring_buffer_per_cpu *cpu_buffer;
>>>  	int cpu;
>>>
>>> +	/* prevent another thread from changing buffer sizes */
>>> +	mutex_lock(&buffer->mutex);
>>>  	for_each_buffer_cpu(buffer, cpu) {
>>>  		cpu_buffer = buffer->buffers[cpu];
>>>
>>> @@ -4936,6 +4957,7 @@ void ring_buffer_reset(struct trace_buffer *buffer)
>>>  		atomic_dec(&cpu_buffer->record_disabled);
>>>  		atomic_dec(&cpu_buffer->resize_disabled);
>>>  	}
>>> +	mutex_unlock(&buffer->mutex);
>>>  }
>>>  EXPORT_SYMBOL_GPL(ring_buffer_reset);
>>>
>>> diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
>>> index f40d850..392e9aa 100644
>>> --- a/kernel/trace/trace.c
>>> +++ b/kernel/trace/trace.c
>>> @@ -2006,6 +2006,8 @@ void tracing_reset_online_cpus(struct array_buffer *buf)
>>>  	if (!buffer)
>>>  		return;
>>>
>>> +	ring_buffer_mutex_acquire(buffer);
>>> +
>>>  	ring_buffer_record_disable(buffer);
>
> Hmm, why do we disable here as it gets disabled again in the call to
> ring_buffer_reset_online_cpus()? Perhaps we don't need to disable the

You mean cpu_buffer->reader_lock in reset_disabled_cpu_buffer?

The reader lock is already there, but it does not help if tracing_open and
ring_buffer_resize are running in parallel on different cpus. We are seeing
the below race, mainly during removal of extra pages:

ring_buffer_resize
//Below portion of code
//not under any lock
nr_pages_to_update < 0
init_list_head(new_pages)
rb_update_pages
ring_buffer_resize
tracing_open
tracing_reset_online_cpus
ring_buffer_reset_cpu
cpu_buffer_reset done
//now lock started
warning(nr_removed)

We are seeing cases where the cpu buffer gets reset by tracing_open in
another call, and we then hit the issue in rb_remove_pages. A similar case
can occur during rb_insert_pages as well:

rb_insert_pages(struct ring_buffer_per_cpu *cpu_buffer)
{
	struct list_head *pages = &cpu_buffer->new_pages;
	int retries, success;

	//Before the lock is taken, the cpu buffer may get reset on another
	//cpu, due to which we are seeing infinite loop cases, as the
	//new_pages pointer gets reset in rb_reset_cpu.
	raw_spin_lock_irq(&cpu_buffer->reader_lock);

> buffer here. The only difference is that we have:
>
> 	buf->time_start = buffer_ftrace_now(buf, buf->cpu);
>
> And that the above disables the entire buffer, whereas the reset only
> resets individual ones.
>
> But I don't think that will make any difference.
>
> -- Steve
>
On Tue, 15 Sep 2020 10:38:03 +0530
Gaurav Kohli <gkohli@codeaurora.org> wrote:

> >>> +void ring_buffer_mutex_release(struct trace_buffer *buffer)
> >>> +{
> >>> +	mutex_unlock(&buffer->mutex);
> >>> +}
> >>> +EXPORT_SYMBOL_GPL(ring_buffer_mutex_release);
> >
> > I really do not like to export these.
> >
>
> The existing reader lock (&cpu_buffer->reader_lock) is not helping here,
> so I took the ring buffer mutex lock to resolve this (this came on
> 4.19/5.4; in the latest tip it is the trace buffer lock). That is why I
> have exported the API.

I'm saying, why don't you take the buffer->mutex in the
ring_buffer_reset_online_cpus() function? And remove all the protection in
tracing_reset_online_cpus()?

void tracing_reset_online_cpus(struct array_buffer *buf)
{
	struct trace_buffer *buffer = buf->buffer;

	if (!buffer)
		return;

	buf->time_start = buffer_ftrace_now(buf, buf->cpu);

	ring_buffer_reset_online_cpus(buffer);
}

The reset_online_cpus() is already doing the synchronization, we don't need
to do it twice.

I believe commit b23d7a5f4a07 ("ring-buffer: speed up buffer resets by
avoiding synchronize_rcu for each CPU") made the synchronization in
tracing_reset_online_cpus() obsolete.

-- Steve
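For illustration, a rough sketch of what taking buffer->mutex inside
ring_buffer_reset_online_cpus() might look like, assuming the
post-b23d7a5f4a07 layout (the per-cpu resize_disabled counters and the
for_each_online_buffer_cpu()/reset_disabled_cpu_buffer() helpers come from
that commit); this is only a sketch, not a patch posted in the thread:

void ring_buffer_reset_online_cpus(struct trace_buffer *buffer)
{
	struct ring_buffer_per_cpu *cpu_buffer;
	int cpu;

	/* prevent another thread from changing buffer sizes */
	mutex_lock(&buffer->mutex);

	for_each_online_buffer_cpu(buffer, cpu) {
		cpu_buffer = buffer->buffers[cpu];

		atomic_inc(&cpu_buffer->resize_disabled);
		atomic_inc(&cpu_buffer->record_disabled);
	}

	/* Make sure all commits have finished */
	synchronize_rcu();

	for_each_online_buffer_cpu(buffer, cpu) {
		cpu_buffer = buffer->buffers[cpu];

		reset_disabled_cpu_buffer(cpu_buffer);

		atomic_dec(&cpu_buffer->record_disabled);
		atomic_dec(&cpu_buffer->resize_disabled);
	}

	mutex_unlock(&buffer->mutex);
}

With the mutex held here, a concurrent ring_buffer_resize() (which also
takes buffer->mutex) cannot change page lists while the per-cpu buffers are
being reset, and tracing_reset_online_cpus() no longer needs its own
disable/synchronize sequence.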
On 9/15/2020 6:53 PM, Steven Rostedt wrote:
> On Tue, 15 Sep 2020 10:38:03 +0530
> Gaurav Kohli <gkohli@codeaurora.org> wrote:
>
>>
>> >>> +void ring_buffer_mutex_release(struct trace_buffer *buffer)
>> >>> +{
>> >>> +	mutex_unlock(&buffer->mutex);
>> >>> +}
>> >>> +EXPORT_SYMBOL_GPL(ring_buffer_mutex_release);
>> >
>> > I really do not like to export these.
>> >
>>
>> The existing reader lock (&cpu_buffer->reader_lock) is not helping here,
>> so I took the ring buffer mutex lock to resolve this (this came on
>> 4.19/5.4; in the latest tip it is the trace buffer lock). That is why I
>> have exported the API.
>
> I'm saying, why don't you take the buffer->mutex in the
> ring_buffer_reset_online_cpus() function? And remove all the protection in
> tracing_reset_online_cpus()?

Yes, got your point, then we can avoid the export. Actually we are seeing
the issue on older kernels like 4.19/4.14/5.4, and there the below patch was
not present in the stable branches:

commit b23d7a5f4a07 ("ring-buffer: speed up buffer resets by
avoiding synchronize_rcu for each CPU")

Actually I had also thought of taking the mutex lock in ring_buffer_reset_cpu
while doing the individual cpu reset, but this could cause another problem:
different cpu buffers may be in different states, so I have taken the lock in
tracing_reset_online_cpus.

>
> void tracing_reset_online_cpus(struct array_buffer *buf)
> {
> 	struct trace_buffer *buffer = buf->buffer;
>
> 	if (!buffer)
> 		return;
>
> 	buf->time_start = buffer_ftrace_now(buf, buf->cpu);
>
> 	ring_buffer_reset_online_cpus(buffer);
> }
>
> The reset_online_cpus() is already doing the synchronization, we don't need
> to do it twice.
>
> I believe commit b23d7a5f4a07 ("ring-buffer: speed up buffer resets by
> avoiding synchronize_rcu for each CPU") made the synchronization in
> tracing_reset_online_cpus() obsolete.
>
> -- Steve
>

Yes, with the above patch there is no need to take the lock in
tracing_reset_online_cpus.
On Tue, 15 Sep 2020 22:53:32 +0530
Gaurav Kohli <gkohli@codeaurora.org> wrote:

> On 9/15/2020 6:53 PM, Steven Rostedt wrote:
> > On Tue, 15 Sep 2020 10:38:03 +0530
> > Gaurav Kohli <gkohli@codeaurora.org> wrote:
> >
> >>
> >> >>> +void ring_buffer_mutex_release(struct trace_buffer *buffer)
> >> >>> +{
> >> >>> +	mutex_unlock(&buffer->mutex);
> >> >>> +}
> >> >>> +EXPORT_SYMBOL_GPL(ring_buffer_mutex_release);
> >> >
> >> > I really do not like to export these.
> >> >
> >>
> >> The existing reader lock (&cpu_buffer->reader_lock) is not helping here,
> >> so I took the ring buffer mutex lock to resolve this (this came on
> >> 4.19/5.4; in the latest tip it is the trace buffer lock). That is why I
> >> have exported the API.
> >
> > I'm saying, why don't you take the buffer->mutex in the
> > ring_buffer_reset_online_cpus() function? And remove all the protection in
> > tracing_reset_online_cpus()?
>
> Yes, got your point, then we can avoid the export. Actually we are seeing
> the issue on older kernels like 4.19/4.14/5.4, and there the below patch was
> not present in the stable branches:
>
> commit b23d7a5f4a07 ("ring-buffer: speed up buffer resets by
> avoiding synchronize_rcu for each CPU")

If you mark this patch for stable, you can add:

Depends-on: b23d7a5f4a07 ("ring-buffer: speed up buffer resets by avoiding synchronize_rcu for each CPU")

>
> Actually I had also thought of taking the mutex lock in ring_buffer_reset_cpu
> while doing the individual cpu reset, but this could cause another problem:

Hmm, I think we should also take the buffer lock in the reset_cpu() call
too, and modify tracing_reset_cpu() the same way.

>
> different cpu buffers may be in different states, so I have taken the lock in
> tracing_reset_online_cpus.

Why would different states be an issue in synchronizing?

-- Steve

>
> >
> > void tracing_reset_online_cpus(struct array_buffer *buf)
> > {
> > 	struct trace_buffer *buffer = buf->buffer;
> >
> > 	if (!buffer)
> > 		return;
> >
> > 	buf->time_start = buffer_ftrace_now(buf, buf->cpu);
> >
> > 	ring_buffer_reset_online_cpus(buffer);
> > }
> >
> > The reset_online_cpus() is already doing the synchronization, we don't need
> > to do it twice.
> >
> > I believe commit b23d7a5f4a07 ("ring-buffer: speed up buffer resets by
> > avoiding synchronize_rcu for each CPU") made the synchronization in
> > tracing_reset_online_cpus() obsolete.
> >
> > -- Steve
> >
>
> Yes, with the above patch there is no need to take the lock in
> tracing_reset_online_cpus.
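Again for illustration only, the same idea applied to the per-cpu reset path
might look roughly like the sketch below (assuming the post-b23d7a5f4a07
layout with per-cpu counters and the reset_disabled_cpu_buffer() helper;
this is not a patch from the thread):

void ring_buffer_reset_cpu(struct trace_buffer *buffer, int cpu)
{
	struct ring_buffer_per_cpu *cpu_buffer = buffer->buffers[cpu];

	if (!cpumask_test_cpu(cpu, buffer->cpumask))
		return;

	/* prevent another thread from changing buffer sizes */
	mutex_lock(&buffer->mutex);

	atomic_inc(&cpu_buffer->resize_disabled);
	atomic_inc(&cpu_buffer->record_disabled);

	/* Make sure all commits have finished */
	synchronize_rcu();

	reset_disabled_cpu_buffer(cpu_buffer);

	atomic_dec(&cpu_buffer->record_disabled);
	atomic_dec(&cpu_buffer->resize_disabled);

	mutex_unlock(&buffer->mutex);
}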
On 9/15/2020 11:43 PM, Steven Rostedt wrote:
>>>> The existing reader lock (&cpu_buffer->reader_lock) is not helping here,
>>>> so I took the ring buffer mutex lock to resolve this (this came on
>>>> 4.19/5.4; in the latest tip it is the trace buffer lock). That is why I
>>>> have exported the API.
>>>
>>> I'm saying, why don't you take the buffer->mutex in the
>>> ring_buffer_reset_online_cpus() function? And remove all the protection in
>>> tracing_reset_online_cpus()?
>>
>> Yes, got your point, then we can avoid the export. Actually we are seeing
>> the issue on older kernels like 4.19/4.14/5.4, and there the below patch was
>> not present in the stable branches:
>>
>> commit b23d7a5f4a07 ("ring-buffer: speed up buffer resets by
>> avoiding synchronize_rcu for each CPU")
>
> If you mark this patch for stable, you can add:
>
> Depends-on: b23d7a5f4a07 ("ring-buffer: speed up buffer resets by avoiding synchronize_rcu for each CPU")
>

Thanks Steven, yes this needs to be backported. I have tried this on 5.4,
but it needs more patches, like:

13292494379f92f532de71b31a54018336adc589
tracing: Make struct ring_buffer less ambiguous

Instead of protecting the whole reset, can we do it individually like below:

+++ b/kernel/trace/ring_buffer.c
@@ -4838,7 +4838,9 @@ rb_reset_cpu(struct ring_buffer_per_cpu *cpu_buffer)
 static void reset_disabled_cpu_buffer(struct ring_buffer_per_cpu *cpu_buffer)
 {
 	unsigned long flags;
+	struct trace_buffer *buffer = cpu_buffer->buffer;

+	mutex_lock(&buffer->mutex);
 	raw_spin_lock_irqsave(&cpu_buffer->reader_lock, flags);

 	if (RB_WARN_ON(cpu_buffer, local_read(&cpu_buffer->committing)))
@@ -4852,6 +4854,7 @@ static void reset_disabled_cpu_buffer(struct ring_buffer_per_cpu *cpu_buffer)

 out:
 	raw_spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags);
+	mutex_unlock(&buffer->mutex);
 }

Please let me know if the above looks good; we will do testing with this.
And this we can use directly in older kernels as well, in
ring_buffer_reset_cpu.

>>
>> Actually I had also thought of taking the mutex lock in ring_buffer_reset_cpu
>> while doing the individual cpu reset, but this could cause another problem:
>
> Hmm, I think we should also take the buffer lock in the reset_cpu() call
> too, and modify tracing_reset_cpu() the same way.
>

If we take the above patch, then this is not required.
Please let me know which approach you prefer.

>>
>> Different cpu buffers may be in different states, so I have taken the lock in
>> tracing_reset_online_cpus.
>
> Why would different states be an issue in synchronizing?
>
> -- Steve
>

Yes, this should not be a problem.

>>>
>>> void tracing_reset_online_cpus(struct array_buffer *buf)
>>> {
>>> 	struct trace_buffer *buffer = buf->buffer;
>>>
>>> 	if (!buffer)
>>> 		return;
>>>
>>> 	buf->time_start = buffer_ftrace_now(buf, buf->cpu);
>>>
>>> 	ring_buffer_reset_online_cpus(buffer);
>>> }
>>>
>>> The reset_online_cpus() is already doing the synchronization, we don't need
>>> to do it twice.
>>>
>>> I believe commit b23d7a5f4a07 ("ring-buffer: speed up buffer resets by
>>> avoiding synchronize_rcu for each CPU") made the synchronization in
>>> tracing_reset_online_cpus() obsolete.
>>>
>>> -- Steve
>>>
>>
>> Yes, with the above patch there is no need to take the lock in
>> tracing_reset_online_cpus.
>
On 9/16/2020 12:02 PM, Gaurav Kohli wrote:
>>>
>>> Yes, got your point, then we can avoid the export. Actually we are seeing
>>> the issue on older kernels like 4.19/4.14/5.4, and there the below patch was
>>> not present in the stable branches:
>>>
>>> commit b23d7a5f4a07 ("ring-buffer: speed up buffer resets by
>>> avoiding synchronize_rcu for each CPU")
>>
>> If you mark this patch for stable, you can add:
>>
>> Depends-on: b23d7a5f4a07 ("ring-buffer: speed up buffer resets by
>> avoiding synchronize_rcu for each CPU")
>>
>
> Thanks Steven, yes this needs to be backported. I have tried this on 5.4,
> but it needs more patches, like:
>
> 13292494379f92f532de71b31a54018336adc589
> tracing: Make struct ring_buffer less ambiguous
>
> Instead of protecting the whole reset, can we do it individually like below:
>
> +++ b/kernel/trace/ring_buffer.c
> @@ -4838,7 +4838,9 @@ rb_reset_cpu(struct ring_buffer_per_cpu *cpu_buffer)
>  static void reset_disabled_cpu_buffer(struct ring_buffer_per_cpu *cpu_buffer)
>  {
>  	unsigned long flags;
> +	struct trace_buffer *buffer = cpu_buffer->buffer;
>
> +	mutex_lock(&buffer->mutex);
>  	raw_spin_lock_irqsave(&cpu_buffer->reader_lock, flags);
>
>  	if (RB_WARN_ON(cpu_buffer, local_read(&cpu_buffer->committing)))
> @@ -4852,6 +4854,7 @@ static void reset_disabled_cpu_buffer(struct ring_buffer_per_cpu *cpu_buffer)
>
>  out:
>  	raw_spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags);
> +	mutex_unlock(&buffer->mutex);
>  }
>

Hi Steven,

We are not seeing the issue with the above patch on 5.4. Please let me know
if the above approach looks good to you, and I will raise a patch for the
same. Otherwise we will raise a patch for the older approach, marking it as
depending on the below patch:

Depends-on: b23d7a5f4a07 ("ring-buffer: speed up buffer resets by
avoiding synchronize_rcu for each CPU")

Thanks,
Gaurav

> Please let me know if the above looks good; we will do testing with this.
> And this we can use directly in older kernels as well, in
> ring_buffer_reset_cpu.
>
>>>
>>> Actually I had also thought of taking the mutex lock in ring_buffer_reset_cpu
>>> while doing the individual cpu reset, but this could cause another problem:
>>
>> Hmm, I think we should also take the buffer lock in the reset_cpu() call
>> too, and modify tracing_reset_cpu() the same way.
>>
>
> If we take the above patch, then this is not required.
> Please let me know which approach you prefer.
>
>>>
>>> Different cpu buffers may be in different states, so I have taken the lock in
>>> tracing_reset_online_cpus.
>>
>> Why would different states be an issue in synchronizing?
>>
>> -- Steve
>>
>
> Yes, this should not be a problem.
Sorry for not replying sooner, my email is still rather full from my 10 day
vacation, and I'm still in "skimming" mode while looking at it.

On Wed, 16 Sep 2020 12:02:46 +0530
Gaurav Kohli <gkohli@codeaurora.org> wrote:

> >> Yes, got your point, then we can avoid the export. Actually we are seeing
> >> the issue on older kernels like 4.19/4.14/5.4, and there the below patch was
> >> not present in the stable branches:
> >>
> >> commit b23d7a5f4a07 ("ring-buffer: speed up buffer resets by
> >> avoiding synchronize_rcu for each CPU")
> >
> > If you mark this patch for stable, you can add:
> >
> > Depends-on: b23d7a5f4a07 ("ring-buffer: speed up buffer resets by avoiding synchronize_rcu for each CPU")
> >
>
> Thanks Steven, yes this needs to be backported. I have tried this on 5.4,
> but it needs more patches, like:
>
> 13292494379f92f532de71b31a54018336adc589
> tracing: Make struct ring_buffer less ambiguous

No, that is not needed. That's just a trivial renaming of structures. Use
the old structure. Dependency is if the algorithm depends on the change.
Not cosmetic.

>
> Instead of protecting the whole reset, can we do it individually like below:
>
> +++ b/kernel/trace/ring_buffer.c
> @@ -4838,7 +4838,9 @@ rb_reset_cpu(struct ring_buffer_per_cpu *cpu_buffer)
>  static void reset_disabled_cpu_buffer(struct ring_buffer_per_cpu *cpu_buffer)
>  {
>  	unsigned long flags;
> +	struct trace_buffer *buffer = cpu_buffer->buffer;
>
> +	mutex_lock(&buffer->mutex);
>  	raw_spin_lock_irqsave(&cpu_buffer->reader_lock, flags);
>
>  	if (RB_WARN_ON(cpu_buffer, local_read(&cpu_buffer->committing)))
> @@ -4852,6 +4854,7 @@ static void reset_disabled_cpu_buffer(struct ring_buffer_per_cpu *cpu_buffer)
>
>  out:
>  	raw_spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags);
> +	mutex_unlock(&buffer->mutex);
>  }
>
> Please let me know if the above looks good; we will do testing with this.
> And this we can use directly in older kernels as well, in
> ring_buffer_reset_cpu.

No that will not work. You need the lock around the disabling of the
buffers and the synchronizing with RCU.

-- Steve
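To illustrate that point for the older (pre-b23d7a5f4a07) kernels being
discussed, the lock would have to sit around the whole disable/sync/reset
sequence in ring_buffer_reset_cpu(), not inside the spinlocked reset. A
rough, abridged sketch only -- exact field names and placement (for example
whether resize_disabled is per-buffer or per-cpu) differ between stable
branches:

void ring_buffer_reset_cpu(struct ring_buffer *buffer, int cpu)
{
	struct ring_buffer_per_cpu *cpu_buffer = buffer->buffers[cpu];
	unsigned long flags;

	if (!cpumask_test_cpu(cpu, buffer->cpumask))
		return;

	/* must also cover the disables and the RCU sync below */
	mutex_lock(&buffer->mutex);

	atomic_inc(&buffer->resize_disabled);
	atomic_inc(&cpu_buffer->record_disabled);

	/* Make sure all commits have finished */
	synchronize_rcu();

	raw_spin_lock_irqsave(&cpu_buffer->reader_lock, flags);
	/* ... existing reset of the cpu buffer under the reader lock ... */
	rb_reset_cpu(cpu_buffer);
	raw_spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags);

	atomic_dec(&cpu_buffer->record_disabled);
	atomic_dec(&buffer->resize_disabled);

	mutex_unlock(&buffer->mutex);
}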
diff --git a/include/linux/ring_buffer.h b/include/linux/ring_buffer.h
index 136ea09..55f9115 100644
--- a/include/linux/ring_buffer.h
+++ b/include/linux/ring_buffer.h
@@ -163,6 +163,8 @@ bool ring_buffer_empty_cpu(struct trace_buffer *buffer, int cpu);
 
 void ring_buffer_record_disable(struct trace_buffer *buffer);
 void ring_buffer_record_enable(struct trace_buffer *buffer);
+void ring_buffer_mutex_acquire(struct trace_buffer *buffer);
+void ring_buffer_mutex_release(struct trace_buffer *buffer);
 void ring_buffer_record_off(struct trace_buffer *buffer);
 void ring_buffer_record_on(struct trace_buffer *buffer);
 bool ring_buffer_record_is_on(struct trace_buffer *buffer);
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 93ef0ab..638ec8f 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -3632,6 +3632,25 @@ void ring_buffer_record_enable(struct trace_buffer *buffer)
 EXPORT_SYMBOL_GPL(ring_buffer_record_enable);
 
 /**
+ * ring_buffer_mutex_acquire - prevent resetting of buffer
+ * during resize
+ */
+void ring_buffer_mutex_acquire(struct trace_buffer *buffer)
+{
+	mutex_lock(&buffer->mutex);
+}
+EXPORT_SYMBOL_GPL(ring_buffer_mutex_acquire);
+
+/**
+ * ring_buffer_mutex_release - prevent resetting of buffer
+ * during resize
+ */
+void ring_buffer_mutex_release(struct trace_buffer *buffer)
+{
+	mutex_unlock(&buffer->mutex);
+}
+EXPORT_SYMBOL_GPL(ring_buffer_mutex_release);
+/**
  * ring_buffer_record_off - stop all writes into the buffer
  * @buffer: The ring buffer to stop writes to.
  *
@@ -4918,6 +4937,8 @@ void ring_buffer_reset(struct trace_buffer *buffer)
 	struct ring_buffer_per_cpu *cpu_buffer;
 	int cpu;
 
+	/* prevent another thread from changing buffer sizes */
+	mutex_lock(&buffer->mutex);
 	for_each_buffer_cpu(buffer, cpu) {
 		cpu_buffer = buffer->buffers[cpu];
 
@@ -4936,6 +4957,7 @@ void ring_buffer_reset(struct trace_buffer *buffer)
 		atomic_dec(&cpu_buffer->record_disabled);
 		atomic_dec(&cpu_buffer->resize_disabled);
 	}
+	mutex_unlock(&buffer->mutex);
 }
 EXPORT_SYMBOL_GPL(ring_buffer_reset);
 
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index f40d850..392e9aa 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -2006,6 +2006,8 @@ void tracing_reset_online_cpus(struct array_buffer *buf)
 	if (!buffer)
 		return;
 
+	ring_buffer_mutex_acquire(buffer);
+
 	ring_buffer_record_disable(buffer);
 
 	/* Make sure all commits have finished */
@@ -2016,6 +2018,8 @@ void tracing_reset_online_cpus(struct array_buffer *buf)
 	ring_buffer_reset_online_cpus(buffer);
 
 	ring_buffer_record_enable(buffer);
+
+	ring_buffer_mutex_release(buffer);
 }
 
 /* Must have trace_types_lock held */
The below race can occur if trace_open and a resize of the
cpu buffer run in parallel on different cpus:

CPUX                                    CPUY
                                        ring_buffer_resize
                                        atomic_read(&buffer->resize_disabled)
tracing_open
tracing_reset_online_cpus
ring_buffer_reset_cpu
rb_reset_cpu
                                        rb_update_pages
                                        remove/insert pages
resetting pointer

This race can cause a data abort or sometimes an infinite loop in
rb_remove_pages and rb_insert_pages while checking pages
for sanity.

Take the ring buffer lock in trace_open to avoid resetting of the cpu buffer.

Signed-off-by: Gaurav Kohli <gkohli@codeaurora.org>