Message ID | 20240117144704.602-14-graf@amazon.com |
---|---|
State | New |
Series | kexec: Allow preservation of ftrace buffers |
Hi Alexander,

kernel test robot noticed the following build warnings:

[auto build test WARNING on linus/master]
[cannot apply to tip/x86/core arm64/for-next/core akpm-mm/mm-everything v6.7 next-20240117]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Alexander-Graf/mm-memblock-Add-support-for-scratch-memory/20240117-225136
base:   linus/master
patch link:    https://lore.kernel.org/r/20240117144704.602-14-graf%40amazon.com
patch subject: [PATCH v3 13/17] tracing: Recover trace buffers from kexec handover
config: arc-defconfig (https://download.01.org/0day-ci/archive/20240118/202401181457.dzB2femp-lkp@intel.com/config)
compiler: arc-elf-gcc (GCC) 13.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240118/202401181457.dzB2femp-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202401181457.dzB2femp-lkp@intel.com/

All warnings (new ones prefixed by >>):

   kernel/trace/ring_buffer.c: In function 'rb_kho_replace_buffers':
>> kernel/trace/ring_buffer.c:5936:66: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
    5936 |                         cpu_buffer->head_page->list.prev->next = (struct list_head *)
         |                                                                  ^
   kernel/trace/ring_buffer.c:5939:44: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
    5939 |                         bpage->list.next = (struct list_head *)((ulong)new_lhead | rb_page_head);
         |                                            ^
--
>> kernel/trace/ring_buffer.c:1783: warning: Function parameter or struct member 'tr_off' not described in '__ring_buffer_alloc'


vim +5936 kernel/trace/ring_buffer.c

  5896	
  5897	static int rb_kho_replace_buffers(struct ring_buffer_per_cpu *cpu_buffer,
  5898					  struct rb_kho_cpu *kho)
  5899	{
  5900		bool first_loop = true;
  5901		struct list_head *tmp;
  5902		int err = 0;
  5903		int i = 0;
  5904	
  5905		if (!IS_ENABLED(CONFIG_FTRACE_KHO))
  5906			return -EINVAL;
  5907	
  5908		if (kho->nr_mems != cpu_buffer->nr_pages * 2)
  5909			return -EINVAL;
  5910	
  5911		for (tmp = rb_list_head(cpu_buffer->pages);
  5912		     tmp != rb_list_head(cpu_buffer->pages) || first_loop;
  5913		     tmp = rb_list_head(tmp->next), first_loop = false) {
  5914			struct buffer_page *bpage = (struct buffer_page *)tmp;
  5915			const struct kho_mem *mem_bpage = &kho->mem[i++];
  5916			const struct kho_mem *mem_page = &kho->mem[i++];
  5917			const uint64_t rb_page_head = 1;
  5918			struct buffer_page *old_bpage;
  5919			void *old_page;
  5920	
  5921			old_bpage = __va(mem_bpage->addr);
  5922			if (!bpage)
  5923				goto out;
  5924	
  5925			if ((ulong)old_bpage->list.next & rb_page_head) {
  5926				struct list_head *new_lhead;
  5927				struct buffer_page *new_head;
  5928	
  5929				new_lhead = rb_list_head(bpage->list.next);
  5930				new_head = (struct buffer_page *)new_lhead;
  5931	
  5932				/* Assume the buffer is completely full */
  5933				cpu_buffer->tail_page = bpage;
  5934				cpu_buffer->commit_page = bpage;
  5935				/* Set the head pointers to what they were before */
> 5936				cpu_buffer->head_page->list.prev->next = (struct list_head *)
  5937					((ulong)cpu_buffer->head_page->list.prev->next & ~rb_page_head);
  5938				cpu_buffer->head_page = new_head;
  5939				bpage->list.next = (struct list_head *)((ulong)new_lhead | rb_page_head);
  5940			}
  5941	
  5942			if (rb_page_entries(old_bpage) || rb_page_write(old_bpage)) {
  5943				/*
  5944				 * We want to recycle the pre-kho page, it contains
  5945				 * trace data. To do so, we unreserve it and swap the
  5946				 * current data page with the pre-kho one
  5947				 */
  5948				old_page = kho_claim_mem(mem_page);
  5949	
  5950				/* Recycle the old page, it contains data */
  5951				free_page((ulong)bpage->page);
  5952				bpage->page = old_page;
  5953	
  5954				bpage->write = old_bpage->write;
  5955				bpage->entries = old_bpage->entries;
  5956				bpage->real_end = old_bpage->real_end;
  5957	
  5958				local_inc(&cpu_buffer->pages_touched);
  5959			} else {
  5960				kho_return_mem(mem_page);
  5961			}
  5962	
  5963			kho_return_mem(mem_bpage);
  5964		}
  5965	
  5966	out:
  5967		return err;
  5968	}
  5969	
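A note on the -Wint-to-pointer-cast warnings above: rb_page_head is declared as uint64_t, so on a 32-bit target the expressions (ulong)ptr | rb_page_head and (ulong)ptr & ~rb_page_head are evaluated in 64 bits and then cast back to a 32-bit pointer. One possible way to avoid that is to keep the flag pointer-sized. The following is only a user-space sketch of that idea, not the fix that was eventually applied; PAGE_HEAD_FLAG and the three helpers are hypothetical names, and kernel code would more likely use unsigned long than uintptr_t.

```c
#include <assert.h>
#include <stdint.h>

/* Minimal stand-in for the kernel's struct list_head (illustration only). */
struct list_head {
	struct list_head *next, *prev;
};

/* Assumption of this sketch: pointers and uintptr_t have the same width. */
static_assert(sizeof(uintptr_t) == sizeof(void *),
	      "flag arithmetic must stay pointer-sized");

/*
 * Hypothetical flag kept in bit 0 of a ->next pointer. Because it is
 * pointer-sized, OR-ing or masking it never widens the value, so the
 * cast back to a pointer preserves the size on 32-bit and 64-bit targets.
 */
#define PAGE_HEAD_FLAG ((uintptr_t)1)

static inline struct list_head *set_head_flag(struct list_head *p)
{
	return (struct list_head *)((uintptr_t)p | PAGE_HEAD_FLAG);
}

static inline struct list_head *clear_head_flag(struct list_head *p)
{
	return (struct list_head *)((uintptr_t)p & ~PAGE_HEAD_FLAG);
}

static inline int has_head_flag(const struct list_head *p)
{
	return ((uintptr_t)p & PAGE_HEAD_FLAG) != 0;
}
```

With these helpers, tagging a pointer and clearing the tag again yields the original pointer, which is the round trip the two flagged statements rely on.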
Hi Alexander,

kernel test robot noticed the following build warnings:

[auto build test WARNING on linus/master]
[cannot apply to tip/x86/core arm64/for-next/core akpm-mm/mm-everything v6.7 next-20240118]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Alexander-Graf/mm-memblock-Add-support-for-scratch-memory/20240117-225136
base:   linus/master
patch link:    https://lore.kernel.org/r/20240117144704.602-14-graf%40amazon.com
patch subject: [PATCH v3 13/17] tracing: Recover trace buffers from kexec handover
config: i386-randconfig-061-20240118 (https://download.01.org/0day-ci/archive/20240118/202401182318.vEGddOt1-lkp@intel.com/config)
compiler: ClangBuiltLinux clang version 17.0.6 (https://github.com/llvm/llvm-project 6009708b4367171ccdbf4b5905cb6a803753fe18)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240118/202401182318.vEGddOt1-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202401182318.vEGddOt1-lkp@intel.com/

sparse warnings: (new ones prefixed by >>)
   kernel/trace/ring_buffer.c:1105:32: sparse: sparse: incorrect type in return expression (different base types) @@ expected restricted __poll_t @@ got int @@
   kernel/trace/ring_buffer.c:1105:32: sparse:     expected restricted __poll_t
   kernel/trace/ring_buffer.c:1105:32: sparse:     got int
   kernel/trace/ring_buffer.c:4955:9: sparse: sparse: context imbalance in 'ring_buffer_peek' - different lock contexts for basic block
   kernel/trace/ring_buffer.c:5041:9: sparse: sparse: context imbalance in 'ring_buffer_consume' - different lock contexts for basic block
   kernel/trace/ring_buffer.c:5421:17: sparse: sparse: context imbalance in 'ring_buffer_empty' - different lock contexts for basic block
   kernel/trace/ring_buffer.c:5451:9: sparse: sparse: context imbalance in 'ring_buffer_empty_cpu' - different lock contexts for basic block
>> kernel/trace/ring_buffer.c:5937:82: sparse: sparse: non size-preserving integer to pointer cast
   kernel/trace/ring_buffer.c:5939:84: sparse: sparse: non size-preserving integer to pointer cast

vim +5937 kernel/trace/ring_buffer.c

  5896	
  5897	static int rb_kho_replace_buffers(struct ring_buffer_per_cpu *cpu_buffer,
  5898					  struct rb_kho_cpu *kho)
  5899	{
  5900		bool first_loop = true;
  5901		struct list_head *tmp;
  5902		int err = 0;
  5903		int i = 0;
  5904	
  5905		if (!IS_ENABLED(CONFIG_FTRACE_KHO))
  5906			return -EINVAL;
  5907	
  5908		if (kho->nr_mems != cpu_buffer->nr_pages * 2)
  5909			return -EINVAL;
  5910	
  5911		for (tmp = rb_list_head(cpu_buffer->pages);
  5912		     tmp != rb_list_head(cpu_buffer->pages) || first_loop;
  5913		     tmp = rb_list_head(tmp->next), first_loop = false) {
  5914			struct buffer_page *bpage = (struct buffer_page *)tmp;
  5915			const struct kho_mem *mem_bpage = &kho->mem[i++];
  5916			const struct kho_mem *mem_page = &kho->mem[i++];
  5917			const uint64_t rb_page_head = 1;
  5918			struct buffer_page *old_bpage;
  5919			void *old_page;
  5920	
  5921			old_bpage = __va(mem_bpage->addr);
  5922			if (!bpage)
  5923				goto out;
  5924	
  5925			if ((ulong)old_bpage->list.next & rb_page_head) {
  5926				struct list_head *new_lhead;
  5927				struct buffer_page *new_head;
  5928	
  5929				new_lhead = rb_list_head(bpage->list.next);
  5930				new_head = (struct buffer_page *)new_lhead;
  5931	
  5932				/* Assume the buffer is completely full */
  5933				cpu_buffer->tail_page = bpage;
  5934				cpu_buffer->commit_page = bpage;
  5935				/* Set the head pointers to what they were before */
  5936				cpu_buffer->head_page->list.prev->next = (struct list_head *)
> 5937					((ulong)cpu_buffer->head_page->list.prev->next & ~rb_page_head);
  5938				cpu_buffer->head_page = new_head;
  5939				bpage->list.next = (struct list_head *)((ulong)new_lhead | rb_page_head);
  5940			}
  5941	
  5942			if (rb_page_entries(old_bpage) || rb_page_write(old_bpage)) {
  5943				/*
  5944				 * We want to recycle the pre-kho page, it contains
  5945				 * trace data. To do so, we unreserve it and swap the
  5946				 * current data page with the pre-kho one
  5947				 */
  5948				old_page = kho_claim_mem(mem_page);
  5949	
  5950				/* Recycle the old page, it contains data */
  5951				free_page((ulong)bpage->page);
  5952				bpage->page = old_page;
  5953	
  5954				bpage->write = old_bpage->write;
  5955				bpage->entries = old_bpage->entries;
  5956				bpage->real_end = old_bpage->real_end;
  5957	
  5958				local_inc(&cpu_buffer->pages_touched);
  5959			} else {
  5960				kho_return_mem(mem_page);
  5961			}
  5962	
  5963			kho_return_mem(mem_bpage);
  5964		}
  5965	
  5966	out:
  5967		return err;
  5968	}
  5969	
diff --git a/include/linux/ring_buffer.h b/include/linux/ring_buffer.h
index 1c5eb33f0cb5..f6d6ce441890 100644
--- a/include/linux/ring_buffer.h
+++ b/include/linux/ring_buffer.h
@@ -84,20 +84,23 @@ void ring_buffer_discard_commit(struct trace_buffer *buffer,
 /*
  * size is in bytes for each per CPU buffer.
  */
-struct trace_buffer *
-__ring_buffer_alloc(unsigned long size, unsigned flags, struct lock_class_key *key);
+struct trace_buffer *__ring_buffer_alloc(unsigned long size, unsigned flags,
+					 struct lock_class_key *key,
+					 int tr_off);
 
 /*
  * Because the ring buffer is generic, if other users of the ring buffer get
  * traced by ftrace, it can produce lockdep warnings. We need to keep each
  * ring buffer's lock class separate.
  */
-#define ring_buffer_alloc(size, flags)			\
-({							\
-	static struct lock_class_key __key;		\
-	__ring_buffer_alloc((size), (flags), &__key);	\
+#define ring_buffer_alloc_kho(size, flags, tr_off)		\
+({								\
+	static struct lock_class_key __key;			\
+	__ring_buffer_alloc((size), (flags), &__key, tr_off);	\
 })
 
+#define ring_buffer_alloc(size, flags) ring_buffer_alloc_kho(size, flags, 0)
+
 int ring_buffer_wait(struct trace_buffer *buffer, int cpu, int full);
 __poll_t ring_buffer_poll_wait(struct trace_buffer *buffer, int cpu,
 			  struct file *filp, poll_table *poll_table, int full);
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 33b41013cda9..49da2e54126b 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -558,6 +558,7 @@ struct trace_buffer {
 
 	struct rb_irq_work		irq_work;
 	bool				time_stamp_abs;
+	int				tr_off;
 };
 
 struct ring_buffer_iter {
@@ -574,6 +575,15 @@ struct ring_buffer_iter {
 	int				missed_events;
 };
 
+struct rb_kho_cpu {
+	const struct kho_mem *mem;
+	uint32_t nr_mems;
+};
+
+static int rb_kho_replace_buffers(struct ring_buffer_per_cpu *cpu_buffer,
+				  struct rb_kho_cpu *kho);
+static int rb_kho_read_cpu(int tr_off, int cpu, struct rb_kho_cpu *kho);
+
 #ifdef RB_TIME_32
 
 /*
@@ -1768,12 +1778,15 @@ static void rb_free_cpu_buffer(struct ring_buffer_per_cpu *cpu_buffer)
  * drop data when the tail hits the head.
  */
 struct trace_buffer *__ring_buffer_alloc(unsigned long size, unsigned flags,
-					struct lock_class_key *key)
+					struct lock_class_key *key,
+					int tr_off)
 {
+	int cpu = raw_smp_processor_id();
+	struct rb_kho_cpu kho = {};
 	struct trace_buffer *buffer;
+	bool use_kho = false;
 	long nr_pages;
 	int bsize;
-	int cpu;
 	int ret;
 
 	/* keep it in its own cache line */
@@ -1786,9 +1799,16 @@ struct trace_buffer *__ring_buffer_alloc(unsigned long size, unsigned flags,
 		goto fail_free_buffer;
 
 	nr_pages = DIV_ROUND_UP(size, BUF_PAGE_SIZE);
+	if (!rb_kho_read_cpu(tr_off, cpu, &kho) && kho.nr_mems > 4) {
+		nr_pages = kho.nr_mems / 2;
+		use_kho = true;
+		pr_debug("Using kho on CPU [%03d]", cpu);
+	}
+
 	buffer->flags = flags;
 	buffer->clock = trace_clock_local;
 	buffer->reader_lock_key = key;
+	buffer->tr_off = tr_off;
 
 	init_irq_work(&buffer->irq_work.work, rb_wake_up_waiters);
 	init_waitqueue_head(&buffer->irq_work.waiters);
@@ -1805,12 +1825,14 @@ struct trace_buffer *__ring_buffer_alloc(unsigned long size, unsigned flags,
 	if (!buffer->buffers)
 		goto fail_free_cpumask;
 
-	cpu = raw_smp_processor_id();
 	cpumask_set_cpu(cpu, buffer->cpumask);
 	buffer->buffers[cpu] = rb_allocate_cpu_buffer(buffer, nr_pages, cpu);
 	if (!buffer->buffers[cpu])
 		goto fail_free_buffers;
 
+	if (use_kho && rb_kho_replace_buffers(buffer->buffers[cpu], &kho))
+		pr_warn("Could not revive all previous trace data");
+
 	ret = cpuhp_state_add_instance(CPUHP_TRACE_RB_PREPARE, &buffer->node);
 	if (ret < 0)
 		goto fail_free_buffers;
@@ -5824,7 +5846,9 @@ EXPORT_SYMBOL_GPL(ring_buffer_read_page);
  */
 int trace_rb_cpu_prepare(unsigned int cpu, struct hlist_node *node)
 {
+	struct rb_kho_cpu kho = {};
 	struct trace_buffer *buffer;
+	bool use_kho = false;
 	long nr_pages_same;
 	int cpu_i;
 	unsigned long nr_pages;
@@ -5848,6 +5872,12 @@ int trace_rb_cpu_prepare(unsigned int cpu, struct hlist_node *node)
 	/* allocate minimum pages, user can later expand it */
 	if (!nr_pages_same)
 		nr_pages = 2;
+
+	if (!rb_kho_read_cpu(buffer->tr_off, cpu, &kho) && kho.nr_mems > 4) {
+		nr_pages = kho.nr_mems / 2;
+		use_kho = true;
+	}
+
 	buffer->buffers[cpu] =
 		rb_allocate_cpu_buffer(buffer, nr_pages, cpu);
 	if (!buffer->buffers[cpu]) {
@@ -5855,13 +5885,143 @@ int trace_rb_cpu_prepare(unsigned int cpu, struct hlist_node *node)
 		     cpu);
 		return -ENOMEM;
 	}
+
+	if (use_kho && rb_kho_replace_buffers(buffer->buffers[cpu], &kho))
+		pr_warn("Could not revive all previous trace data");
+
 	smp_wmb();
 	cpumask_set_cpu(cpu, buffer->cpumask);
 	return 0;
 }
 
-#ifdef CONFIG_FTRACE_KHO
-static int rb_kho_write_cpu(void *fdt, struct trace_buffer *buffer, int cpu)
+static int rb_kho_replace_buffers(struct ring_buffer_per_cpu *cpu_buffer,
+				  struct rb_kho_cpu *kho)
+{
+	bool first_loop = true;
+	struct list_head *tmp;
+	int err = 0;
+	int i = 0;
+
+	if (!IS_ENABLED(CONFIG_FTRACE_KHO))
+		return -EINVAL;
+
+	if (kho->nr_mems != cpu_buffer->nr_pages * 2)
+		return -EINVAL;
+
+	for (tmp = rb_list_head(cpu_buffer->pages);
+	     tmp != rb_list_head(cpu_buffer->pages) || first_loop;
+	     tmp = rb_list_head(tmp->next), first_loop = false) {
+		struct buffer_page *bpage = (struct buffer_page *)tmp;
+		const struct kho_mem *mem_bpage = &kho->mem[i++];
+		const struct kho_mem *mem_page = &kho->mem[i++];
+		const uint64_t rb_page_head = 1;
+		struct buffer_page *old_bpage;
+		void *old_page;
+
+		old_bpage = __va(mem_bpage->addr);
+		if (!bpage)
+			goto out;
+
+		if ((ulong)old_bpage->list.next & rb_page_head) {
+			struct list_head *new_lhead;
+			struct buffer_page *new_head;
+
+			new_lhead = rb_list_head(bpage->list.next);
+			new_head = (struct buffer_page *)new_lhead;
+
+			/* Assume the buffer is completely full */
+			cpu_buffer->tail_page = bpage;
+			cpu_buffer->commit_page = bpage;
+			/* Set the head pointers to what they were before */
+			cpu_buffer->head_page->list.prev->next = (struct list_head *)
+				((ulong)cpu_buffer->head_page->list.prev->next & ~rb_page_head);
+			cpu_buffer->head_page = new_head;
+			bpage->list.next = (struct list_head *)((ulong)new_lhead | rb_page_head);
+		}
+
+		if (rb_page_entries(old_bpage) || rb_page_write(old_bpage)) {
+			/*
+			 * We want to recycle the pre-kho page, it contains
+			 * trace data. To do so, we unreserve it and swap the
+			 * current data page with the pre-kho one
+			 */
+			old_page = kho_claim_mem(mem_page);
+
+			/* Recycle the old page, it contains data */
+			free_page((ulong)bpage->page);
+			bpage->page = old_page;
+
+			bpage->write = old_bpage->write;
+			bpage->entries = old_bpage->entries;
+			bpage->real_end = old_bpage->real_end;
+
+			local_inc(&cpu_buffer->pages_touched);
+		} else {
+			kho_return_mem(mem_page);
+		}
+
+		kho_return_mem(mem_bpage);
+	}
+
+out:
+	return err;
+}
+
+static int rb_kho_read_cpu(int tr_off, int cpu, struct rb_kho_cpu *kho)
+{
+	const void *fdt = kho_get_fdt();
+	int mem_len;
+	int err = 0;
+	char *path;
+	int off;
+
+	if (!IS_ENABLED(CONFIG_FTRACE_KHO))
+		return -EINVAL;
+
+	if (!tr_off || !fdt || !kho)
+		return -EINVAL;
+
+	path = kasprintf(GFP_KERNEL, "cpu%x", cpu);
+	if (!path)
+		return -ENOMEM;
+
+	pr_debug("Trying to revive trace cpu '%s'", path);
+
+	off = fdt_subnode_offset(fdt, tr_off, path);
+	if (off < 0) {
+		pr_debug("Could not find '%s' in DT", path);
+		err = -ENOENT;
+		goto out;
+	}
+
+	err = fdt_node_check_compatible(fdt, off, "ftrace,cpu-v1");
+	if (err) {
+		pr_warn("Node '%s' has invalid compatible", path);
+		err = -EINVAL;
+		goto out;
+	}
+
+	kho->mem = fdt_getprop(fdt, off, "mem", &mem_len);
+	if (!kho->mem) {
+		pr_warn("Node '%s' has invalid mem property", path);
+		err = -EINVAL;
+		goto out;
+	}
+
+	kho->nr_mems = mem_len / sizeof(*kho->mem);
+
+	/* Should follow "bpage 0, page 0, bpage 1, page 1, ..." pattern */
+	if ((kho->nr_mems & 1)) {
+		err = -EINVAL;
+		goto out;
+	}
+
+out:
+	kfree(path);
+	return err;
+}
+
+static int __maybe_unused rb_kho_write_cpu(void *fdt, struct trace_buffer *buffer, int cpu)
 {
 	int i = 0;
 	int err = 0;
@@ -5921,6 +6081,7 @@ static int rb_kho_write_cpu(void *fdt, struct trace_buffer *buffer, int cpu)
 	return err;
 }
 
+#ifdef CONFIG_FTRACE_KHO
 int ring_buffer_kho_write(void *fdt, struct trace_buffer *buffer)
 {
 	int err, i;
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 9505a929a726..a5d7f5b4c19f 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -9362,16 +9362,46 @@ static struct dentry *trace_instance_dir;
 
 static void
 init_tracer_tracefs(struct trace_array *tr, struct dentry *d_tracer);
 
+static int trace_kho_off_tr(struct trace_array *tr)
+{
+	const char *name = tr->name ? tr->name : "global_trace";
+	const void *fdt = kho_get_fdt();
+	char *path;
+	int off;
+
+	if (!IS_ENABLED(CONFIG_FTRACE_KHO))
+		return 0;
+
+	if (!fdt)
+		return 0;
+
+	path = kasprintf(GFP_KERNEL, "/ftrace/%s", name);
+	if (!path)
+		return -ENOMEM;
+
+	pr_debug("Trying to revive trace buffer '%s'", path);
+
+	off = fdt_path_offset(fdt, path);
+	if (off < 0) {
+		pr_debug("Could not find '%s' in DT", path);
+		off = 0;
+	}
+
+	kfree(path);
+	return off;
+}
+
 static int
 allocate_trace_buffer(struct trace_array *tr, struct array_buffer *buf, int size)
 {
+	int tr_off = trace_kho_off_tr(tr);
 	enum ring_buffer_flags rb_flags;
 
 	rb_flags = tr->trace_flags & TRACE_ITER_OVERWRITE ? RB_FL_OVERWRITE : 0;
 
 	buf->tr = tr;
 
-	buf->buffer = ring_buffer_alloc(size, rb_flags);
+	buf->buffer = ring_buffer_alloc_kho(size, rb_flags, tr_off);
 	if (!buf->buffer)
 		return -ENOMEM;
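For readers following the sizing logic in the diff above: the per-CPU "mem" property is treated as a flat array of (bpage, page) pairs, so the number of ring-buffer pages is half the number of entries, an odd count is rejected, and the callers only take the handover path when there are more than four entries. A rough user-space sketch of just that arithmetic follows. The struct layout and the helper name pages_from_mem_prop() are assumptions made for illustration; only the addr field of struct kho_mem is visible in this patch.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the series' struct kho_mem. The patch only
 * shows an 'addr' member; 'len' is assumed here for illustration.
 */
struct kho_mem_sketch {
	uint64_t addr;
	uint64_t len;
};

/*
 * Given the byte length of a "mem" property, return how many ring-buffer
 * pages it describes, or -1 if the handover data should not be used.
 * Mirrors the checks in the patch: entries come in (bpage, page) pairs,
 * and more than four entries are required.
 */
static inline long pages_from_mem_prop(size_t prop_len)
{
	size_t nr_mems = prop_len / sizeof(struct kho_mem_sketch);

	if (nr_mems & 1)	/* must be "bpage 0, page 0, bpage 1, ..." */
		return -1;
	if (nr_mems <= 4)	/* callers require nr_mems > 4 */
		return -1;

	return (long)(nr_mems / 2);
}
```

Note that the patch additionally requires nr_mems to equal cpu_buffer->nr_pages * 2 once the new buffer is allocated; the sketch does not model that second check.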
When kexec handover is in place, we now know the location of all
previous buffers for ftrace rings. With this patch applied, ftrace
reassembles any new trace buffer that carries the same name as a
previous one with the same data pages that the previous buffer had.

That way, a buffer that we had in place before kexec becomes readable
after kexec again as soon as it gets initialized with the same name.

Signed-off-by: Alexander Graf <graf@amazon.com>

---

v1 -> v2:

  - Move from names to fdt offsets. That way, trace.c can find the trace
    array offset and then the ring buffer code only needs to read out
    its per-CPU data. That way it can stay oblivious to its name.
  - Make kho_get_fdt() const
  - Remove ifdefs
---
 include/linux/ring_buffer.h |  15 ++--
 kernel/trace/ring_buffer.c  | 171 ++++++++++++++++++++++++++++++++++--
 kernel/trace/trace.c        |  32 ++++++-
 3 files changed, 206 insertions(+), 12 deletions(-)
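To make the name matching concrete: per the diff, trace.c looks up the node "/ftrace/<name>" (falling back to "global_trace" for the unnamed top-level array), and the ring buffer code then looks for a "cpu%x" subnode, so CPU numbers appear in hexadecimal. The snippet below is only a user-space sketch of the two format strings; kho_trace_path() and kho_cpu_node() are hypothetical helpers, not functions from the series.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Build the FDT path for a trace array, as the patch does with
 * kasprintf(GFP_KERNEL, "/ftrace/%s", name). A NULL name stands for the
 * top-level trace array, which the patch calls "global_trace".
 */
static inline int kho_trace_path(char *buf, size_t len, const char *name)
{
	return snprintf(buf, len, "/ftrace/%s", name ? name : "global_trace");
}

/*
 * Build the per-CPU subnode name. The patch formats the CPU number with
 * %x, so CPU 10 becomes "cpua" rather than "cpu10".
 */
static inline int kho_cpu_node(char *buf, size_t len, int cpu)
{
	return snprintf(buf, len, "cpu%x", (unsigned int)cpu);
}
```

The hexadecimal CPU suffix is easy to overlook when inspecting the handover tree by hand, for example "cpu1f" for CPU 31.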