Message ID | 1442423929-12253-1-git-send-email-mark.rutland@arm.com |
---|---|
State | Superseded |
Headers | show |
Em Wed, Sep 16, 2015 at 06:18:49PM +0100, Mark Rutland escreveu: > This has been observed to result in an exit-time hang when counting > rare/unschedulable events with perf record, and can be triggered > artificially with the script below: > > ---- > #!/bin/sh > printf "REPRO: launching perf\n"; > ./perf record -e software/config=9/ sleep 1 & > PERF_PID=$!; > sleep 0.002; > kill -2 $PERF_PID; > printf "REPRO: waiting for perf (%d) to exit...\n" "$PERF_PID"; > wait $PERF_PID; > printf "REPRO: perf exited\n"; > ---- So, I run it here, without this patch, and get: [root@zoo ~]# time ./repro.sh REPRO: launching perf REPRO: waiting for perf (766) to exit... [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.015 MB perf.data ] REPRO: perf exited real 0m1.060s user 0m0.018s sys 0m0.037s [root@zoo ~]# perf report --stdio Error: The perf.data file has no samples! # To display the perf.data header info, please use # --header/--header-only options. # [root@zoo ~]# perf evlist -v software/config=9/: type: 1, size: 112, config: 0x9, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|PERIOD, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, enable_on_exec: 1, task: 1, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1 [root@zoo ~]# What am I doing wrong? Trying to reproduce this before even looking at the patch :-) - Arnaldo > To avoid this, have __perf_session__process_events bail out early when > the file has no data (i.e. it has no events). > > Signed-off-by: Mark Rutland <mark.rutland@arm.com> > Cc: Adrian Hunter <adrian.hunter@intel.com> > Cc: Arnaldo Carvalho de Melo <acme@kernel.org> > Cc: Ingo Molnar <mingo@redhat.com> > Cc: Jiri Olsa <jolsa@redhat.com> > Cc: Peter Zijlstra <peterz@infradead.org> > --- > tools/perf/util/session.c | 5 ++++- > 1 file changed, 4 insertions(+), 1 deletion(-) > > diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c > index 8a4537e..fc3f7c9 100644 > --- a/tools/perf/util/session.c > +++ b/tools/perf/util/session.c > @@ -1580,7 +1580,10 @@ static int __perf_session__process_events(struct perf_session *session, > file_offset = page_offset; > head = data_offset - page_offset; > > - if (data_size && (data_offset + data_size < file_size)) > + if (data_size == 0) > + goto out; > + > + if (data_offset + data_size < file_size) > file_size = data_offset + data_size; > > ui_progress__init(&prog, file_size, "Processing events..."); > -- > 1.9.1 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Hi, On Wed, Sep 16, 2015 at 09:54:54PM +0100, Arnaldo Carvalho de Melo wrote: > Em Wed, Sep 16, 2015 at 06:18:49PM +0100, Mark Rutland escreveu: > > This has been observed to result in an exit-time hang when counting > > rare/unschedulable events with perf record, and can be triggered > > artificially with the script below: > > > > ---- > > #!/bin/sh > > printf "REPRO: launching perf\n"; > > ./perf record -e software/config=9/ sleep 1 & > > PERF_PID=$!; > > sleep 0.002; > > kill -2 $PERF_PID; > > printf "REPRO: waiting for perf (%d) to exit...\n" "$PERF_PID"; > > wait $PERF_PID; > > printf "REPRO: perf exited\n"; > > ---- > > So, I run it here, without this patch, and get: > > [root@zoo ~]# time ./repro.sh > REPRO: launching perf > REPRO: waiting for perf (766) to exit... > [ perf record: Woken up 1 times to write data ] > [ perf record: Captured and wrote 0.015 MB perf.data ] > REPRO: perf exited > real 0m1.060s > user 0m0.018s > sys 0m0.037s [...] > What am I doing wrong? Trying to reproduce this before even looking at > the patch :-) I suspect you have a shinier computer than I do! ;) It's easier to trigger on slower machines -- decreasing the sleep time to 0.001 or below may help, though if it's set too low we exit early without exercising the failing path. It looks like to trigger the bug we need to successfully call record__mmap_read_all at least once (but not read any data), then break out of the loop because hits == rec->samples && done. That way we call process_buildids with no data having been written. I added some additional logging of rec->bytes_written immediately before and after the main loop. In all cases where we've written something we exit, and in all cases we didn't, we hang. I've left some examples (from an arm64 system) below. Thanks, Mark. mark@ribbensteg:~$ ./repro.sh REPRO: launching perf WARNING: Kernel address maps (/proc/{kallsyms,modules}) are restricted, check /proc/sys/kernel/kptr_restrict. Samples in kernel functions may not be resolved if a suitable vmlinux file is not found in the buildid cache or in the vmlinux path. Samples in kernel modules won't be resolved at all. If some relocation was applied (e.g. kexec) symbols may be misresolved even with a suitable vmlinux or kallsyms file. Waiting for perf (2104) to exit... Cannot read kernel map Couldn't record kernel reference relocation symbol Symbol resolution may be skewed if relocation was used (e.g. kexec). Check /proc/kallsyms permission or run as root. Pre-loop rec wrote 0 Post-loop rec wrote 2128 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.002 MB perf.data ] Perf exited mark@ribbensteg:~$ ./repro.sh REPRO: launching perf WARNING: Kernel address maps (/proc/{kallsyms,modules}) are restricted, check /proc/sys/kernel/kptr_restrict. Samples in kernel functions may not be resolved if a suitable vmlinux file is not found in the buildid cache or in the vmlinux path. Samples in kernel modules won't be resolved at all. If some relocation was applied (e.g. kexec) symbols may be misresolved even with a suitable vmlinux or kallsyms file. Waiting for perf (2099) to exit... Cannot read kernel map Couldn't record kernel reference relocation symbol Symbol resolution may be skewed if relocation was used (e.g. kexec). Check /proc/kallsyms permission or run as root. Pre-loop rec wrote 0 Post-loop rec wrote 0 [ perf record: Woken up 0 times to write data ] < hang here > mark@ribbensteg:~$ ./repro.sh REPRO: launching perf WARNING: Kernel address maps (/proc/{kallsyms,modules}) are restricted, check /proc/sys/kernel/kptr_restrict. Samples in kernel functions may not be resolved if a suitable vmlinux file is not found in the buildid cache or in the vmlinux path. Samples in kernel modules won't be resolved at all. If some relocation was applied (e.g. kexec) symbols may be misresolved even with a suitable vmlinux or kallsyms file. Cannot read kernel map Couldn't record kernel reference relocation symbol Symbol resolution may be skewed if relocation was used (e.g. kexec). Check /proc/kallsyms permission or run as root. Pre-loop rec wrote 0 Waiting for perf (2108) to exit... Post-loop rec wrote 0 [ perf record: Woken up 1 times to write data ] < hang here > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
On 17/09/15 18:41, Mark Rutland wrote: > Hi, > > On Wed, Sep 16, 2015 at 09:54:54PM +0100, Arnaldo Carvalho de Melo wrote: >> Em Wed, Sep 16, 2015 at 06:18:49PM +0100, Mark Rutland escreveu: >>> This has been observed to result in an exit-time hang when counting >>> rare/unschedulable events with perf record, and can be triggered >>> artificially with the script below: >>> >>> ---- >>> #!/bin/sh >>> printf "REPRO: launching perf\n"; >>> ./perf record -e software/config=9/ sleep 1 & >>> PERF_PID=$!; >>> sleep 0.002; >>> kill -2 $PERF_PID; >>> printf "REPRO: waiting for perf (%d) to exit...\n" "$PERF_PID"; >>> wait $PERF_PID; >>> printf "REPRO: perf exited\n"; >>> ---- >> >> So, I run it here, without this patch, and get: >> >> [root@zoo ~]# time ./repro.sh >> REPRO: launching perf >> REPRO: waiting for perf (766) to exit... >> [ perf record: Woken up 1 times to write data ] >> [ perf record: Captured and wrote 0.015 MB perf.data ] >> REPRO: perf exited >> real 0m1.060s >> user 0m0.018s >> sys 0m0.037s > > [...] > >> What am I doing wrong? Trying to reproduce this before even looking at >> the patch :-) > > I suspect you have a shinier computer than I do! ;) > I imagine you would also need to contrive not to write any synthesized MMAP or COMM events i.e. no access to /proc perhaps? -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
On Fri, Sep 18, 2015 at 07:09:16AM +0100, Adrian Hunter wrote: > On 17/09/15 18:41, Mark Rutland wrote: > > Hi, > > > > On Wed, Sep 16, 2015 at 09:54:54PM +0100, Arnaldo Carvalho de Melo wrote: > >> Em Wed, Sep 16, 2015 at 06:18:49PM +0100, Mark Rutland escreveu: > >>> This has been observed to result in an exit-time hang when counting > >>> rare/unschedulable events with perf record, and can be triggered > >>> artificially with the script below: > >>> > >>> ---- > >>> #!/bin/sh > >>> printf "REPRO: launching perf\n"; > >>> ./perf record -e software/config=9/ sleep 1 & > >>> PERF_PID=$!; > >>> sleep 0.002; > >>> kill -2 $PERF_PID; > >>> printf "REPRO: waiting for perf (%d) to exit...\n" "$PERF_PID"; > >>> wait $PERF_PID; > >>> printf "REPRO: perf exited\n"; > >>> ---- > >> > >> So, I run it here, without this patch, and get: > >> > >> [root@zoo ~]# time ./repro.sh > >> REPRO: launching perf > >> REPRO: waiting for perf (766) to exit... > >> [ perf record: Woken up 1 times to write data ] > >> [ perf record: Captured and wrote 0.015 MB perf.data ] > >> REPRO: perf exited > >> real 0m1.060s > >> user 0m0.018s > >> sys 0m0.037s > > > > [...] > > > >> What am I doing wrong? Trying to reproduce this before even looking at > >> the patch :-) > > > > I suspect you have a shinier computer than I do! ;) > > > > I imagine you would also need to contrive not to write any synthesized MMAP > or COMM events i.e. no access to /proc perhaps? Taking a look through __cmd_record: I'm not writing to a pipe, so nothing gets synthesized for attrs or tracepoints. I'm not using aux trace, so nothing gets synthesized for that. Running as a regular user I don't have synthesized events for the kernel mmap or modules: WARNING: Kernel address maps (/proc/{kallsyms,modules}) are restricted, check /proc/sys/kernel/kptr_restrict. Samples in kernel functions may not be resolved if a suitable vmlinux file is not found in the buildid cache or in the vmlinux path. Samples in kernel modules won't be resolved at all. If some relocation was applied (e.g. kexec) symbols may be misresolved even with a suitable vmlinux or kallsyms file. Cannot read kernel map Couldn't record kernel reference relocation symbol Symbol resolution may be skewed if relocation was used (e.g. kexec). Check /proc/kallsyms permission or run as root. I'm not running a guest, so I don't have any synthesized events for the guest os. __machine__synthesize_threads doesn't synthesize any task events (target__has_task(target) is false at this point, as is target__has_cpu(target)), because perf starts the workload later via perf_evlist__start_workload. So it looks like I shouldn't have any synthesized events. Have I missed anything? Thanks, Mark. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
On 18/09/15 12:51, Mark Rutland wrote: > On Fri, Sep 18, 2015 at 07:09:16AM +0100, Adrian Hunter wrote: >> On 17/09/15 18:41, Mark Rutland wrote: >>> Hi, >>> >>> On Wed, Sep 16, 2015 at 09:54:54PM +0100, Arnaldo Carvalho de Melo wrote: >>>> Em Wed, Sep 16, 2015 at 06:18:49PM +0100, Mark Rutland escreveu: >>>>> This has been observed to result in an exit-time hang when counting >>>>> rare/unschedulable events with perf record, and can be triggered >>>>> artificially with the script below: >>>>> >>>>> ---- >>>>> #!/bin/sh >>>>> printf "REPRO: launching perf\n"; >>>>> ./perf record -e software/config=9/ sleep 1 & >>>>> PERF_PID=$!; >>>>> sleep 0.002; >>>>> kill -2 $PERF_PID; >>>>> printf "REPRO: waiting for perf (%d) to exit...\n" "$PERF_PID"; >>>>> wait $PERF_PID; >>>>> printf "REPRO: perf exited\n"; >>>>> ---- >>>> >>>> So, I run it here, without this patch, and get: >>>> >>>> [root@zoo ~]# time ./repro.sh >>>> REPRO: launching perf >>>> REPRO: waiting for perf (766) to exit... >>>> [ perf record: Woken up 1 times to write data ] >>>> [ perf record: Captured and wrote 0.015 MB perf.data ] >>>> REPRO: perf exited >>>> real 0m1.060s >>>> user 0m0.018s >>>> sys 0m0.037s >>> >>> [...] >>> >>>> What am I doing wrong? Trying to reproduce this before even looking at >>>> the patch :-) >>> >>> I suspect you have a shinier computer than I do! ;) >>> >> >> I imagine you would also need to contrive not to write any synthesized MMAP >> or COMM events i.e. no access to /proc perhaps? > > Taking a look through __cmd_record: > > I'm not writing to a pipe, so nothing gets synthesized for attrs or tracepoints. > > I'm not using aux trace, so nothing gets synthesized for that. > > Running as a regular user I don't have synthesized events for the kernel mmap > or modules: > > WARNING: Kernel address maps (/proc/{kallsyms,modules}) are restricted, > check /proc/sys/kernel/kptr_restrict. > > Samples in kernel functions may not be resolved if a suitable vmlinux > file is not found in the buildid cache or in the vmlinux path. > > Samples in kernel modules won't be resolved at all. > > If some relocation was applied (e.g. kexec) symbols may be misresolved > even with a suitable vmlinux or kallsyms file. > > Cannot read kernel map > Couldn't record kernel reference relocation symbol > Symbol resolution may be skewed if relocation was used (e.g. kexec). > Check /proc/kallsyms permission or run as root. > > I'm not running a guest, so I don't have any synthesized events for the guest > os. > > __machine__synthesize_threads doesn't synthesize any task events > (target__has_task(target) is false at this point, as is > target__has_cpu(target)), because perf starts the workload later via > perf_evlist__start_workload. > > So it looks like I shouldn't have any synthesized events. Have I missed > anything? Yes, you are right. But you are not getting the COMM and MMAP events from the exec which means you are killing perf before it execs the workload. Perf writes through a pipe to its forked child to do the exec, so to reproduce it you just need to put everything on the same cpu and play around with the sleep number taskset -c 0 tools/perf/perf record -- sleep 1 & sleep 0.005 ; kill -2 $! reproduces the problem for me. Of course, since the workload doesn't get exec'ed it doesn't matter if it is bogus i.e. taskset -c 0 tools/perf/perf record -- sdfgsdgdg & sleep 0.005 ; kill -2 $! also reproduces the problem. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
> > So it looks like I shouldn't have any synthesized events. Have I missed > > anything? > > Yes, you are right. But you are not getting the COMM and MMAP events from > the exec which means you are killing perf before it execs the workload. Oh, I see. > Perf writes through a pipe to its forked child to do the exec, so > to reproduce it you just need to put everything on the same cpu and play around > with the sleep number > > taskset -c 0 tools/perf/perf record -- sleep 1 & sleep 0.005 ; kill -2 $! > > reproduces the problem for me. Of course, since the workload doesn't get exec'ed > it doesn't matter if it is bogus i.e. > > taskset -c 0 tools/perf/perf record -- sdfgsdgdg & sleep 0.005 ; kill -2 $! > > also reproduces the problem. Thanks for confirming! Mark. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
On Fri, Sep 18, 2015 at 04:00:18PM +0100, Arnaldo Carvalho de Melo wrote: > I tried what you suggested, but couldn't reproduce it, I only managed > when I applied the patch below _and_ executed; > > # echo 1 > /proc/sys/kernel/kptr_restrict > > To be on the same page as Mark. Then, yes, I get stuck in: > > #0 0x00007f746a1d8aba in mmap64 () at ../sysdeps/unix/syscall-template.S:81 > #1 0x00000000004be66e in __perf_session__process_events (file_size=232, data_size=<optimized out>, data_offset=<optimized out>, session=0x1ba8590) > at util/session.c:1604 > #2 perf_session__process_events (session=0x1ba8590) at util/session.c:1685 > #3 0x000000000042d97b in process_buildids (rec=0x869ac0 <record>) at builtin-record.c:364 > #4 __cmd_record (rec=0x869ac0 <record>, argv=<optimized out>, argc=<optimized out>) at builtin-record.c:734 > #5 cmd_record (argc=<optimized out>, argv=<optimized out>, prefix=<optimized out>) at builtin-record.c:1199 > #6 0x0000000000479123 in run_builtin (p=p@entry=0x874a90 <commands+144>, argc=argc@entry=3, argv=argv@entry=0x7ffd92eb5ee0) at perf.c:370 > #7 0x0000000000420aba in handle_internal_command (argv=0x7ffd92eb5ee0, argc=3) at perf.c:429 > #8 run_argv (argv=0x7ffd92eb5c70, argcp=0x7ffd92eb5c7c) at perf.c:473 > #9 main (argc=3, argv=0x7ffd92eb5ee0) at perf.c:588 > (gdb) > > After applying your patch it works, had to tweak the commit log to avoid > starting lines with ---, breaks git scripts, also added a commiter log, > check it, patch is below, after the one I used to not process any > samples. Sorry for the '---' problem, I'll bear that in mind in future. Your commit log looks fine, though I'm slightly confused by the Reported-by line -- did you mean to add that? Mark. > commit dd486ec4aa33cfca2fd912ef501d49909005de79 > Author: Mark Rutland <mark.rutland@arm.com> > Date: Wed Sep 16 18:18:49 2015 +0100 > > perf record: Avoid infinite loop at buildid processing with no samples > > If a session contains no events, we can get stuck in an infinite loop in > __perf_session__process_events, with a non-zero file_size and data_offset, but > a zero data_size. > > In this case, we can mmap the entirety of the file (consisting of the file and > attribute headers), and fetch_mmaped_event will correctly refuse to read any > (unmapped and non-existent) event headers. This causes > __perf_session__process_events to unmap the file and retry with the exact same > parameters, getting stuck in an infinite loop. > > This has been observed to result in an exit-time hang when counting > rare/unschedulable events with perf record, and can be triggered artificially > with the script below: > > ---- > #!/bin/sh > printf "REPRO: launching perf\n"; > ./perf record -e software/config=9/ sleep 1 & > PERF_PID=$!; > sleep 0.002; > kill -2 $PERF_PID; > printf "REPRO: waiting for perf (%d) to exit...\n" "$PERF_PID"; > wait $PERF_PID; > printf "REPRO: perf exited\n"; > ---- > > To avoid this, have __perf_session__process_events bail out early when > the file has no data (i.e. it has no events). > > Commiter note: > > I only managed to reproduce this when setting > /proc/sys/kernel/kptr_restrict to '1' and changing the code to > purposefully not process any samples and no synthesized samples, i.e. > kptr_restrict prevents 'record' from synthesizing the kernel mmaps for > vmlinux + modules and since it is a workload started from perf, we don't > synthesize mmap/comm records for existing threads. > > Adrian Hunter managed to reproduce it in his environment tho. > > Reported-by: Arnaldo Carvalho de Melo <acme@kernel.org> > Signed-off-by: Mark Rutland <mark.rutland@arm.com> > Cc: Adrian Hunter <adrian.hunter@intel.com> > Cc: Jiri Olsa <jolsa@redhat.com> > Cc: Peter Zijlstra <peterz@infradead.org> > Link: http://lkml.kernel.org/r/1442423929-12253-1-git-send-email-mark.rutland@arm.com > Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> > > diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c > index 8a4537ee9bc3..fc3f7c922f99 100644 > --- a/tools/perf/util/session.c > +++ b/tools/perf/util/session.c > @@ -1580,7 +1580,10 @@ static int __perf_session__process_events(struct perf_session *session, > file_offset = page_offset; > head = data_offset - page_offset; > > - if (data_size && (data_offset + data_size < file_size)) > + if (data_size == 0) > + goto out; > + > + if (data_offset + data_size < file_size) > file_size = data_offset + data_size; > > ui_progress__init(&prog, file_size, "Processing events..."); > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Em Fri, Sep 18, 2015 at 04:18:51PM +0100, Mark Rutland escreveu: > On Fri, Sep 18, 2015 at 04:00:18PM +0100, Arnaldo Carvalho de Melo wrote: > > After applying your patch it works, had to tweak the commit log to avoid > > starting lines with ---, breaks git scripts, also added a commiter log, > > check it, patch is below, after the one I used to not process any > > samples. > Sorry for the '---' problem, I'll bear that in mind in future. > Your commit log looks fine, though I'm slightly confused by the > Reported-by line -- did you mean to add that? Sorry, I mean Tested-by:, will replace, guess i can replace the one for Adrian from Cc: to Tested-by too, right? - Arnaldo > Mark. > > > commit dd486ec4aa33cfca2fd912ef501d49909005de79 > > Author: Mark Rutland <mark.rutland@arm.com> > > Date: Wed Sep 16 18:18:49 2015 +0100 > > > > perf record: Avoid infinite loop at buildid processing with no samples > > > > If a session contains no events, we can get stuck in an infinite loop in > > __perf_session__process_events, with a non-zero file_size and data_offset, but > > a zero data_size. > > > > In this case, we can mmap the entirety of the file (consisting of the file and > > attribute headers), and fetch_mmaped_event will correctly refuse to read any > > (unmapped and non-existent) event headers. This causes > > __perf_session__process_events to unmap the file and retry with the exact same > > parameters, getting stuck in an infinite loop. > > > > This has been observed to result in an exit-time hang when counting > > rare/unschedulable events with perf record, and can be triggered artificially > > with the script below: > > > > ---- > > #!/bin/sh > > printf "REPRO: launching perf\n"; > > ./perf record -e software/config=9/ sleep 1 & > > PERF_PID=$!; > > sleep 0.002; > > kill -2 $PERF_PID; > > printf "REPRO: waiting for perf (%d) to exit...\n" "$PERF_PID"; > > wait $PERF_PID; > > printf "REPRO: perf exited\n"; > > ---- > > > > To avoid this, have __perf_session__process_events bail out early when > > the file has no data (i.e. it has no events). > > > > Commiter note: > > > > I only managed to reproduce this when setting > > /proc/sys/kernel/kptr_restrict to '1' and changing the code to > > purposefully not process any samples and no synthesized samples, i.e. > > kptr_restrict prevents 'record' from synthesizing the kernel mmaps for > > vmlinux + modules and since it is a workload started from perf, we don't > > synthesize mmap/comm records for existing threads. > > > > Adrian Hunter managed to reproduce it in his environment tho. > > > > Reported-by: Arnaldo Carvalho de Melo <acme@kernel.org> > > Signed-off-by: Mark Rutland <mark.rutland@arm.com> > > Cc: Adrian Hunter <adrian.hunter@intel.com> > > Cc: Jiri Olsa <jolsa@redhat.com> > > Cc: Peter Zijlstra <peterz@infradead.org> > > Link: http://lkml.kernel.org/r/1442423929-12253-1-git-send-email-mark.rutland@arm.com > > Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> > > > > diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c > > index 8a4537ee9bc3..fc3f7c922f99 100644 > > --- a/tools/perf/util/session.c > > +++ b/tools/perf/util/session.c > > @@ -1580,7 +1580,10 @@ static int __perf_session__process_events(struct perf_session *session, > > file_offset = page_offset; > > head = data_offset - page_offset; > > > > - if (data_size && (data_offset + data_size < file_size)) > > + if (data_size == 0) > > + goto out; > > + > > + if (data_offset + data_size < file_size) > > file_size = data_offset + data_size; > > > > ui_progress__init(&prog, file_size, "Processing events..."); > > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
On 18/09/15 18:29, Arnaldo Carvalho de Melo wrote: > Em Fri, Sep 18, 2015 at 04:18:51PM +0100, Mark Rutland escreveu: >> On Fri, Sep 18, 2015 at 04:00:18PM +0100, Arnaldo Carvalho de Melo wrote: >>> After applying your patch it works, had to tweak the commit log to avoid >>> starting lines with ---, breaks git scripts, also added a commiter log, >>> check it, patch is below, after the one I used to not process any >>> samples. > >> Sorry for the '---' problem, I'll bear that in mind in future. > >> Your commit log looks fine, though I'm slightly confused by the >> Reported-by line -- did you mean to add that? > > Sorry, I mean Tested-by:, will replace, guess i can replace the one for > Adrian from Cc: to Tested-by too, right? Yes > > - Arnaldo > >> Mark. >> >>> commit dd486ec4aa33cfca2fd912ef501d49909005de79 >>> Author: Mark Rutland <mark.rutland@arm.com> >>> Date: Wed Sep 16 18:18:49 2015 +0100 >>> >>> perf record: Avoid infinite loop at buildid processing with no samples >>> >>> If a session contains no events, we can get stuck in an infinite loop in >>> __perf_session__process_events, with a non-zero file_size and data_offset, but >>> a zero data_size. >>> >>> In this case, we can mmap the entirety of the file (consisting of the file and >>> attribute headers), and fetch_mmaped_event will correctly refuse to read any >>> (unmapped and non-existent) event headers. This causes >>> __perf_session__process_events to unmap the file and retry with the exact same >>> parameters, getting stuck in an infinite loop. >>> >>> This has been observed to result in an exit-time hang when counting >>> rare/unschedulable events with perf record, and can be triggered artificially >>> with the script below: >>> >>> ---- >>> #!/bin/sh >>> printf "REPRO: launching perf\n"; >>> ./perf record -e software/config=9/ sleep 1 & >>> PERF_PID=$!; >>> sleep 0.002; >>> kill -2 $PERF_PID; >>> printf "REPRO: waiting for perf (%d) to exit...\n" "$PERF_PID"; >>> wait $PERF_PID; >>> printf "REPRO: perf exited\n"; >>> ---- >>> >>> To avoid this, have __perf_session__process_events bail out early when >>> the file has no data (i.e. it has no events). >>> >>> Commiter note: >>> >>> I only managed to reproduce this when setting >>> /proc/sys/kernel/kptr_restrict to '1' and changing the code to >>> purposefully not process any samples and no synthesized samples, i.e. >>> kptr_restrict prevents 'record' from synthesizing the kernel mmaps for >>> vmlinux + modules and since it is a workload started from perf, we don't >>> synthesize mmap/comm records for existing threads. >>> >>> Adrian Hunter managed to reproduce it in his environment tho. >>> >>> Reported-by: Arnaldo Carvalho de Melo <acme@kernel.org> >>> Signed-off-by: Mark Rutland <mark.rutland@arm.com> >>> Cc: Adrian Hunter <adrian.hunter@intel.com> >>> Cc: Jiri Olsa <jolsa@redhat.com> >>> Cc: Peter Zijlstra <peterz@infradead.org> >>> Link: http://lkml.kernel.org/r/1442423929-12253-1-git-send-email-mark.rutland@arm.com >>> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> >>> >>> diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c >>> index 8a4537ee9bc3..fc3f7c922f99 100644 >>> --- a/tools/perf/util/session.c >>> +++ b/tools/perf/util/session.c >>> @@ -1580,7 +1580,10 @@ static int __perf_session__process_events(struct perf_session *session, >>> file_offset = page_offset; >>> head = data_offset - page_offset; >>> >>> - if (data_size && (data_offset + data_size < file_size)) >>> + if (data_size == 0) >>> + goto out; >>> + >>> + if (data_offset + data_size < file_size) >>> file_size = data_offset + data_size; >>> >>> ui_progress__init(&prog, file_size, "Processing events..."); >>> > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c index 8a4537e..fc3f7c9 100644 --- a/tools/perf/util/session.c +++ b/tools/perf/util/session.c @@ -1580,7 +1580,10 @@ static int __perf_session__process_events(struct perf_session *session, file_offset = page_offset; head = data_offset - page_offset; - if (data_size && (data_offset + data_size < file_size)) + if (data_size == 0) + goto out; + + if (data_offset + data_size < file_size) file_size = data_offset + data_size; ui_progress__init(&prog, file_size, "Processing events...");
If a session contains no events, we can get stuck in an infinite loop in __perf_session__process_events, with a non-zero file_size and data_offset, but a zero data_size. In this case, we can mmap the entirety of the file (consisting of the file and attribute headers), and fetch_mmaped_event will correctly refuse to read any (unmapped and non-existent) event headers. This causes __perf_session__process_events to unmap the file and retry with the exact same parameters, getting stuck in an infinite loop. This has been observed to result in an exit-time hang when counting rare/unschedulable events with perf record, and can be triggered artificially with the script below: ---- #!/bin/sh printf "REPRO: launching perf\n"; ./perf record -e software/config=9/ sleep 1 & PERF_PID=$!; sleep 0.002; kill -2 $PERF_PID; printf "REPRO: waiting for perf (%d) to exit...\n" "$PERF_PID"; wait $PERF_PID; printf "REPRO: perf exited\n"; ---- To avoid this, have __perf_session__process_events bail out early when the file has no data (i.e. it has no events). Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> --- tools/perf/util/session.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)