Message ID: 1496500976-18362-1-git-send-email-leo.yan@linaro.org
Series: coresight: support panic dump functionality
On 03/06/17 15:42, Leo Yan wrote:
> ### Introduction ###
>
> Embedded Trace Buffer (ETB) provides on-chip storage of trace data,
> usually with a buffer size of 2KB to 8KB. This data has been used for
> profiling, which is already well implemented in the coresight driver.
>
> This patch set explores using ETB RAM data for postmortem debugging.
> ETB RAM data is especially useful for postmortem debugging when the
> hardware design has a local ETB buffer (ARM DDI 0461B, chapter 1.2.7,
> 'Local ETF'): with this kind of design every CPU has a dedicated ETB
> RAM, so a CPU that is still alive can dump the ETB RAM of a hung CPU.
> We can then quickly find the exact execution flow before the hang.
>
> Because the ETB RAM buffer is small, if all CPUs share one ETB buffer
> then the trace data leading up to the error is easily overwritten by
> other PEs; even so, we sometimes still have a chance to go through the
> trace data to help debug panic issues.
>
> ### Implementation ###
>
> First, we need to provide unified APIs for the panic dump functionality,
> so it can easily be extended to multiple drivers. This is done by
> patch 0001: it registers a panic notifier and provides the general APIs
> {coresight_add_panic_cb|coresight_del_panic_cb} as helper functions, so
> any coresight device can add itself to the dump list or remove itself
> as needed.
>
> Generally, all the panic-dump-specific work is related to sink devices,
> so this initial version only supports sink devices; patch 0002 adds and
> removes the panic callback for sink devices.
>
> Patches 0003 and 0004 add panic callback functions to the tmc and etb10
> drivers, so these two drivers can save their trace data when a panic
> happens.
>
> NOTE: patch 0003 (tmc driver panic callback) has been verified on the
> Hikey board. Patch 0004 (etb10) has not been tested due to lack of
> hardware.
>

> - After the kernel panics, kdump launches the dump-capture kernel, so
>   we need to save the kernel's dump file on the target:
>     cp /proc/vmcore ./vmcore
>
>   After downloading the vmcore file from the Hikey board to the host PC,
>   we can use the 'crash' tool to check the coresight dump info and
>   extract the trace data:
>     crash vmlinux vmcore
>     crash> log
>     [   37.559337] coresight f6402000.etf: invoke panic dump...
>     [   37.565460] coresight-tmc f6402000.etf: Dump ETB buffer 0x2000@0xffff80003b8da180
>     crash> rd 0xffff80003b8da180 0x2000 -r cs_etb_trace.bin
>

Have you explored appending the above information as a vmcoreinfo parameter
via vmcoreinfo_append_str()? That would make it easier to list all the
information above and, if needed, we may be able to extend makedumpfile to
extract the ETB dumps from a given vmcore.

Suzuki
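The Implementation notes quoted above describe patch 0001 as registering a panic notifier and exposing coresight_add_panic_cb/coresight_del_panic_cb so that sinks can join or leave a dump list. The patch code itself is not part of this thread, so what follows is only a user-space C model of that idea, not the patch: the model_* names, the panic_dump_fn signature and the fake_sink example are hypothetical, and a plain function call stands in for the real panic notifier.

```c
#include <stdlib.h>

/* Hypothetical callback type; the real patch's signature may differ. */
typedef void (*panic_dump_fn)(void *data);

/* One registered dumper: a sink's private data plus its dump routine. */
struct panic_dump_node {
	void *data;
	panic_dump_fn cb;
	struct panic_dump_node *next;
};

static struct panic_dump_node *panic_dump_list;

/* Model of coresight_add_panic_cb(): returns 0, or -1 if allocation fails. */
int model_add_panic_cb(void *data, panic_dump_fn cb)
{
	struct panic_dump_node *n = malloc(sizeof(*n));

	if (!n)
		return -1;
	n->data = data;
	n->cb = cb;
	n->next = panic_dump_list;
	panic_dump_list = n;
	return 0;
}

/* Model of coresight_del_panic_cb(): unlink every entry owned by data and
 * return how many entries were removed. */
int model_del_panic_cb(void *data)
{
	struct panic_dump_node **pp = &panic_dump_list;
	int removed = 0;

	while (*pp) {
		if ((*pp)->data == data) {
			struct panic_dump_node *victim = *pp;

			*pp = victim->next;
			free(victim);
			removed++;
		} else {
			pp = &(*pp)->next;
		}
	}
	return removed;
}

/* Stand-in for the panic notifier: let every registered sink dump its
 * buffer and return the number of callbacks invoked. */
int model_panic_notify(void)
{
	int calls = 0;

	for (struct panic_dump_node *n = panic_dump_list; n; n = n->next) {
		n->cb(n->data);
		calls++;
	}
	return calls;
}

/* Example sink (hypothetical): counts how often its dump callback ran. */
struct fake_sink {
	const char *name;
	int dumps;
};

void fake_sink_dump(void *data)
{
	struct fake_sink *sink = data;

	sink->dumps++;
}
```

In the kernel, the dispatcher would be hooked up with atomic_notifier_chain_register(&panic_notifier_list, ...), and the list would need locking against concurrent registration and removal; the model leaves both out so the add/remove/dispatch logic stays visible.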
On Mon, Jun 05, 2017 at 09:57:39AM +0100, Suzuki K Poulose wrote:
> Have you explored appending the above information as a vmcoreinfo parameter via
> vmcoreinfo_append_str()? That would make it easier to list all the information
> above and, if needed, we may be able to extend makedumpfile to extract the ETB
> dumps from a given vmcore.

Thanks for the good suggestion, Suzuki. Since you pointed out
vmcoreinfo_append_str(), I have taken a quick look at it; I will add it in
the next version and verify it on Hikey.

Thanks,
Leo Yan
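To illustrate Suzuki's vmcoreinfo_append_str() suggestion, here is a user-space sketch of both ends: a stand-in that appends formatted lines to a text buffer in the KEY=VALUE-per-line style vmcoreinfo uses, and a lookup such as a makedumpfile extension might perform. The CORESIGHT_DUMP(<dev>)=<addr>,<size> entry format, the buffer sizes and all model_* names are assumptions made for this example; the thread does not define a format.

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Assumed buffer size for the model; not the kernel's real limit. */
#define MODEL_VMCOREINFO_BYTES 4096

static char model_vmcoreinfo[MODEL_VMCOREINFO_BYTES];
static size_t model_vmcoreinfo_size;

/* User-space stand-in for vmcoreinfo_append_str(): format one entry and
 * append it, truncating silently once the buffer is full. */
void model_vmcoreinfo_append_str(const char *fmt, ...)
{
	char line[256];
	va_list args;
	size_t len, room;
	int r;

	va_start(args, fmt);
	r = vsnprintf(line, sizeof(line), fmt, args);
	va_end(args);
	if (r < 0)
		return;

	len = (size_t)r < sizeof(line) ? (size_t)r : sizeof(line) - 1;
	room = sizeof(model_vmcoreinfo) - 1 - model_vmcoreinfo_size;
	if (len > room)
		len = room;
	memcpy(model_vmcoreinfo + model_vmcoreinfo_size, line, len);
	model_vmcoreinfo_size += len;
	model_vmcoreinfo[model_vmcoreinfo_size] = '\0';
}

/* Consumer side: look up the hypothetical
 * "CORESIGHT_DUMP(<dev>)=<addr>,<size>" entry for one sink.
 * Returns 0 on success, -1 if the entry is missing or malformed. */
int model_find_dump(const char *dev, unsigned long long *addr,
		    unsigned long *size)
{
	char key[128];
	const char *p = model_vmcoreinfo;
	int n = snprintf(key, sizeof(key), "CORESIGHT_DUMP(%s)=", dev);

	if (n < 0 || (size_t)n >= sizeof(key))
		return -1;

	/* Only accept a match at the start of a line. */
	while ((p = strstr(p, key)) != NULL) {
		if (p == model_vmcoreinfo || p[-1] == '\n')
			break;
		p++;
	}
	if (!p)
		return -1;
	if (sscanf(p + n, "%llx,%lx", addr, size) != 2)
		return -1;
	return 0;
}
```

A producer would append one such line per sink, for example model_vmcoreinfo_append_str("CORESIGHT_DUMP(%s)=%llx,%lx\n", "f6402000.etf", 0xffff80003b8da180ULL, 0x2000UL). Whether the real entry would be emitted when the sink buffer is allocated or from the panic callback itself is not settled in this thread.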
On 5 June 2017 at 02:57, Suzuki K Poulose <Suzuki.Poulose@arm.com> wrote:
> Have you explored appending the above information as a vmcoreinfo parameter
> via vmcoreinfo_append_str()? That would make it easier to list all the
> information above and, if needed, we may be able to extend makedumpfile to
> extract the ETB dumps from a given vmcore.

One thing this patchset doesn't address is the tracer configuration
(metadata). I'm thinking we can use the same mechanism to store the
relevant information in memory, in the same format as is done for perf.
On 3 June 2017 at 08:42, Leo Yan <leo.yan@linaro.org> wrote:
> ### Introduction ###

Good day Leo,

> ### Usage ###

On top of my comments in the patches, I think this section is interesting
and worth its own text file under Documentation. We already have
coresight.txt and coresight-cpu-debug.txt... As such, I suggest you add a
new "coresight" directory under Documentation/trace and move coresight.txt
and coresight-cpu-debug.txt there. Once that is done you can add
coresight-panic-dump.txt there.

>
> Below is an example of how to use the panic dump functionality on the
> 96boards Hikey. The brief flow is: when a panic happens, the ETB panic
> callback function saves the trace data into memory, then relies on kdump
> to boot the recovery kernel and save the DDR content as a kernel core dump
> file. After transferring the core dump file from the board to the host PC,
> we use the 'crash' tool to extract the coresight ETB trace data; finally, a
> Python script generates a perf-compatible file and 'perf' outputs the
> readable execution flow.
>
> - Save trace data into memory with kdump on Hikey:
>
>   ARM64's kdump supports using the same kernel image for both the main
>   kernel and the dump-capture kernel, so we can simply load the
>   dump-capture kernel with the command below:
>     ./kexec -p vmlinux --dtb=hi6220-hikey.dtb --append="root=/dev/mmcblk0p9 rw maxcpus=1 reset_devices earlycon=pl011,0xf7113000 nohlt initcall_debug console=tty0 console=ttyAMA3,115200 clk_ignore_unused"
>
>   Enable the coresight path for the ETB device:
>     echo 1 > /sys/bus/coresight/devices/f6402000.etf/enable_sink
>     echo 1 > /sys/bus/coresight/devices/f659c000.etm/enable_source
>     echo 1 > /sys/bus/coresight/devices/f659d000.etm/enable_source
>     echo 1 > /sys/bus/coresight/devices/f659e000.etm/enable_source
>     echo 1 > /sys/bus/coresight/devices/f659f000.etm/enable_source
>     echo 1 > /sys/bus/coresight/devices/f65dc000.etm/enable_source
>     echo 1 > /sys/bus/coresight/devices/f65dd000.etm/enable_source
>     echo 1 > /sys/bus/coresight/devices/f65de000.etm/enable_source
>     echo 1 > /sys/bus/coresight/devices/f65df000.etm/enable_source
>
> - After the kernel panics, kdump launches the dump-capture kernel, so
>   we need to save the kernel's dump file on the target:
>     cp /proc/vmcore ./vmcore
>
>   After downloading the vmcore file from the Hikey board to the host PC,
>   we can use the 'crash' tool to check the coresight dump info and
>   extract the trace data:
>     crash vmlinux vmcore
>     crash> log
>     [   37.559337] coresight f6402000.etf: invoke panic dump...
>     [   37.565460] coresight-tmc f6402000.etf: Dump ETB buffer 0x2000@0xffff80003b8da180
>     crash> rd 0xffff80003b8da180 0x2000 -r cs_etb_trace.bin
>
> - Use the Python script perf_cs_dump_wrapper.py to wrap the trace data
>   into a perf-compatible file, and finally use perf to output the CPU
>   execution flow:
>
>   On the host PC, run the Python script. Note that the script is not yet
>   flexible enough to support all kinds of coresight topologies; it still
>   hard-codes information specific to the Hikey coresight topology:
>     python perf_cs_dump_wrapper.py -i cs_etb_trace.bin -o perf.data

I'm not sure what we'll do with "perf_cs_dump_wrapper.py" yet... I suspect
OpenCSD on GitHub will be a good place for it, but let's see about that later.

Regards,
Mathieu

>
>   On the Hikey board:
>     ./perf script -v -F cpu,event,ip,sym,symoff --kallsyms ksymbol -i perf.data -k vmlinux
>
>     [002] instructions: ffff0000087d1d60 psci_cpu_suspend_enter+0x48
>     [002] instructions: ffff000008093400 cpu_suspend+0x0
>     [002] instructions: ffff000008093210 __cpu_suspend_enter+0x0
>     [002] instructions: ffff000008099970 cpu_do_suspend+0x0
>     [002] instructions: ffff000008093294 __cpu_suspend_enter+0x84
>     [002] instructions: ffff000008093428 cpu_suspend+0x28
>     [002] instructions: ffff00000809342c cpu_suspend+0x2c
>     [002] instructions: ffff0000087d1968 psci_suspend_finisher+0x0
>     [002] instructions: ffff0000087d1768 psci_cpu_suspend+0x0
>     [002] instructions: ffff0000087d19f0 __invoke_psci_fn_smc+0x0
>
> The related tools have been uploaded to:
> http://people.linaro.org/~leo.yan/debug/coresight_dump/
>
> Changes from RFC:
> * Following Mathieu's suggestion, use a general framework to support the
>   dump functionality.
> * Changed to use perf to analyse the trace data.
>
> Leo Yan (4):
>   coresight: support panic dump functionality
>   coresight: add and remove panic callback for sink
>   coresight: tmc: hook panic callback for ETB/ETF
>   coresight: etb10: hook panic callback
>
>  drivers/hwtracing/coresight/Kconfig                |  10 ++
>  drivers/hwtracing/coresight/Makefile               |   1 +
>  drivers/hwtracing/coresight/coresight-etb10.c      |  16 +++
>  drivers/hwtracing/coresight/coresight-panic-dump.c | 130 +++++++++++++++++++++
>  drivers/hwtracing/coresight/coresight-priv.h       |  10 ++
>  drivers/hwtracing/coresight/coresight-tmc-etf.c    |  26 +++++
>  drivers/hwtracing/coresight/coresight.c            |  11 ++
>  include/linux/coresight.h                          |   2 +
>  8 files changed, 206 insertions(+)
>  create mode 100644 drivers/hwtracing/coresight/coresight-panic-dump.c
>
> --
> 2.7.4
>
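In the usage notes above, the size and address passed to crash's rd command are copied by hand from the "Dump ETB buffer <size>@<address>" kernel log line. A small C helper can automate that step. parse_etb_dump_line() and build_rd_command() are hypothetical names, and the parser assumes the log message keeps exactly the format printed in the example.

```c
#include <stdio.h>
#include <string.h>

/* Parse "... Dump ETB buffer <size>@<addr>" as printed in the example log.
 * Returns 0 and fills size/addr on success, -1 if the line does not match. */
int parse_etb_dump_line(const char *line, unsigned long *size,
			unsigned long long *addr)
{
	static const char marker[] = "Dump ETB buffer ";
	const char *p = strstr(line, marker);

	if (!p)
		return -1;
	p += sizeof(marker) - 1;
	/* %lx and %llx accept the optional "0x" prefix used in the log. */
	if (sscanf(p, "%lx@%llx", size, addr) != 2)
		return -1;
	return 0;
}

/* Build the crash "rd ... -r <file>" command used in the usage notes.
 * Returns 0 on success, -1 if the output buffer is too small. */
int build_rd_command(char *out, size_t outlen, unsigned long size,
		     unsigned long long addr, const char *file)
{
	int n = snprintf(out, outlen, "rd 0x%llx 0x%lx -r %s",
			 addr, size, file);

	return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

Fed the log line from the example, the helper yields the same "rd 0xffff80003b8da180 0x2000 -r cs_etb_trace.bin" command used in the usage notes; lines without the marker, such as "invoke panic dump...", are rejected.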