From patchwork Thu Aug 22 14:13:20 2013
X-Patchwork-Submitter: Robert Richter
X-Patchwork-Id: 19411
From: Robert Richter
To: Peter Zijlstra
Cc: Ingo Molnar, Arnaldo Carvalho de Melo, Borislav Petkov, Jiri Olsa,
    linux-kernel@vger.kernel.org, Robert Richter, Fengguang Wu
Subject: [PATCH v3 05/12] perf: Add persistent events
Date: Thu, 22 Aug 2013 16:13:20 +0200
Message-Id: <1377180807-12758-6-git-send-email-rric@kernel.org>
In-Reply-To: <1377180807-12758-1-git-send-email-rric@kernel.org>
References: <1377180807-12758-1-git-send-email-rric@kernel.org>

From: Robert Richter

Add the needed pieces for persistent events which make them
process-agnostic. Also, make their buffers read-only when mmapping them
from userspace.

Add a barebones implementation for registering persistent events with
perf. For that, we don't destroy the buffers when they're unmapped;
also, we map them read-only so that multiple agents can access them.
Also, we allocate the event buffers at event init time and not at mmap
time, so that we can log samples into them regardless of whether there
are readers in userspace or not.

Multiple events from different cpus may map to a single persistent
event entry which has a unique identifier. The identifier allows
accessing the persistent event with the perf_event_open() syscall. For
this, the new event type PERF_TYPE_PERSISTENT must be set, with its id
specified in attr.config. Currently there is only support for per-cpu
events; root access is also required.

Since the buffers are shared, the set_output ioctl may not be used in
conjunction with persistent events.

This patch only supports tracepoints; support for all event types is
implemented in a later patch.

Based on a patch set from Borislav Petkov.

Cc: Borislav Petkov
Cc: Fengguang Wu
Cc: Jiri Olsa
Signed-off-by: Robert Richter
Signed-off-by: Robert Richter
---
 include/linux/perf_event.h      |  12 ++-
 include/uapi/linux/perf_event.h |   4 +-
 kernel/events/Makefile          |   2 +-
 kernel/events/core.c            |  37 +++++--
 kernel/events/internal.h        |   2 +
 kernel/events/persistent.c      | 221 ++++++++++++++++++++++++++++++++++++++++
 6 files changed, 266 insertions(+), 12 deletions(-)
 create mode 100644 kernel/events/persistent.c

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index c43f6ea..1a62a25 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -436,6 +436,8 @@ struct perf_event {
 	struct perf_cgroup		*cgrp; /* cgroup event is attach to */
 	int				cgrp_defer_enabled;
 #endif
+	struct list_head		pevent_entry;	/* persistent event */
+	int				pevent_id;
 #endif /* CONFIG_PERF_EVENTS */
 };
 
@@ -765,7 +767,7 @@ extern void perf_event_enable(struct perf_event *event);
 extern void perf_event_disable(struct perf_event *event);
 extern int __perf_event_disable(void *info);
 extern void perf_event_task_tick(void);
-#else
+#else /* !CONFIG_PERF_EVENTS */
 static inline void
 perf_event_task_sched_in(struct task_struct *prev, struct task_struct *task) { }
@@ -805,7 +807,7 @@ static inline void perf_event_enable(struct perf_event *event) { }
 static inline void perf_event_disable(struct perf_event *event) { }
 static inline int __perf_event_disable(void *info) { return -1; }
 static inline void perf_event_task_tick(void) { }
-#endif
+#endif /* !CONFIG_PERF_EVENTS */
 
 #if defined(CONFIG_PERF_EVENTS) && defined(CONFIG_NO_HZ_FULL)
 extern bool perf_event_can_stop_tick(void);
@@ -819,6 +821,12 @@ extern void perf_restore_debug_store(void);
 static inline void perf_restore_debug_store(void) { }
 #endif
 
+#if defined(CONFIG_PERF_EVENTS) && defined(CONFIG_EVENT_TRACING)
+extern int perf_add_persistent_tp(struct ftrace_event_call *tp);
+#else
+static inline int perf_add_persistent_tp(void *tp) { return -ENOENT; }
+#endif
+
 #define perf_output_put(handle, x) perf_output_copy((handle), &(x), sizeof(x))
 
 /*
diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index 62c25a2..2b84b97 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -32,6 +32,7 @@ enum perf_type_id {
 	PERF_TYPE_HW_CACHE			= 3,
 	PERF_TYPE_RAW				= 4,
 	PERF_TYPE_BREAKPOINT			= 5,
+	PERF_TYPE_PERSISTENT			= 6,
 
 	PERF_TYPE_MAX,				/* non-ABI */
 };
@@ -275,8 +276,9 @@ struct perf_event_attr {
 				exclude_callchain_kernel : 1, /* exclude kernel callchains */
 				exclude_callchain_user   : 1, /* exclude user callchains */
+				persistent     :  1, /* always-on event */
 
-				__reserved_1   : 41;
+				__reserved_1   : 40;
 
 	union {
 		__u32		wakeup_events;	  /* wakeup every n events */
diff --git a/kernel/events/Makefile b/kernel/events/Makefile
index 103f5d1..70990d5 100644
--- a/kernel/events/Makefile
+++ b/kernel/events/Makefile
@@ -2,7 +2,7 @@ ifdef CONFIG_FUNCTION_TRACER
 CFLAGS_REMOVE_core.o = -pg
 endif
 
-obj-y := core.o ring_buffer.o callchain.o
+obj-y := core.o ring_buffer.o callchain.o persistent.o
 
 obj-$(CONFIG_HAVE_HW_BREAKPOINT) += hw_breakpoint.o
 obj-$(CONFIG_UPROBES) += uprobes.o
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 932acc6..d9d6e67 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -3982,6 +3982,9 @@ static int perf_mmap(struct file *file, struct vm_area_struct *vma)
 	if (!(vma->vm_flags & VM_SHARED))
 		return -EINVAL;
 
+	if (event->attr.persistent && (vma->vm_flags & VM_WRITE))
+		return -EACCES;
+
 	vma_size = vma->vm_end - vma->vm_start;
 
 	nr_pages = (vma_size / PAGE_SIZE) - 1;
@@ -4007,6 +4010,11 @@ static int perf_mmap(struct file *file, struct vm_area_struct *vma)
 			goto unlock;
 		}
 
+		if (!event->rb->overwrite && vma->vm_flags & VM_WRITE) {
+			ret = -EACCES;
+			goto unlock;
+		}
+
 		if (!atomic_inc_not_zero(&event->rb->mmap_count)) {
 			/*
 			 * Raced against perf_mmap_close() through
@@ -5845,7 +5853,7 @@ static struct pmu perf_tracepoint = {
 	.event_idx	= perf_swevent_event_idx,
 };
 
-static inline void perf_tp_register(void)
+static inline void perf_register_tp(void)
 {
 	perf_pmu_register(&perf_tracepoint, "tracepoint", PERF_TYPE_TRACEPOINT);
 }
@@ -5875,18 +5883,14 @@ static void perf_event_free_filter(struct perf_event *event)
 
 #else
 
-static inline void perf_tp_register(void)
-{
-}
+static inline void perf_register_tp(void) { }
 
 static int perf_event_set_filter(struct perf_event *event, void __user *arg)
 {
 	return -ENOENT;
 }
 
-static void perf_event_free_filter(struct perf_event *event)
-{
-}
+static void perf_event_free_filter(struct perf_event *event) { }
 
 #endif /* CONFIG_EVENT_TRACING */
 
@@ -6574,6 +6578,7 @@ perf_event_alloc(struct perf_event_attr *attr, int cpu,
 	INIT_LIST_HEAD(&event->event_entry);
 	INIT_LIST_HEAD(&event->sibling_list);
 	INIT_LIST_HEAD(&event->rb_entry);
+	INIT_LIST_HEAD(&event->pevent_entry);
 	init_waitqueue_head(&event->waitq);
 	init_irq_work(&event->pending, perf_pending_event);
 
@@ -6831,6 +6836,13 @@ perf_event_set_output(struct perf_event *event, struct perf_event *output_event)
 		goto unlock;
 	}
 
+	/* Don't redirect read-only (persistent) events. */
+	ret = -EACCES;
+	if (old_rb && !old_rb->overwrite)
+		goto unlock;
+	if (rb && !rb->overwrite)
+		goto unlock;
+
 	if (old_rb)
 		ring_buffer_detach(event, old_rb);
 
@@ -6888,6 +6900,14 @@ SYSCALL_DEFINE5(perf_event_open,
 	if (err)
 		return err;
 
+	/* return fd for an existing persistent event */
+	if (attr.type == PERF_TYPE_PERSISTENT)
+		return perf_get_persistent_event_fd(cpu, attr.config);
+
+	/* put event into persistent state (not yet supported) */
+	if (attr.persistent)
+		return -EOPNOTSUPP;
+
 	if (!attr.exclude_kernel) {
 		if (perf_paranoid_kernel() && !capable(CAP_SYS_ADMIN))
 			return -EACCES;
@@ -7828,7 +7848,8 @@ void __init perf_event_init(void)
 	perf_pmu_register(&perf_swevent, "software", PERF_TYPE_SOFTWARE);
 	perf_pmu_register(&perf_cpu_clock, NULL, -1);
 	perf_pmu_register(&perf_task_clock, NULL, -1);
-	perf_tp_register();
+	perf_register_tp();
+	perf_register_persistent();
 
 	perf_cpu_notifier(perf_cpu_notify);
 	register_reboot_notifier(&perf_reboot_notifier);
diff --git a/kernel/events/internal.h b/kernel/events/internal.h
index d8708aa..94c3f73 100644
--- a/kernel/events/internal.h
+++ b/kernel/events/internal.h
@@ -193,5 +193,7 @@ static inline void put_event(struct perf_event *event)
 extern int perf_alloc_rb(struct perf_event *event, int nr_pages, int flags);
 extern void perf_free_rb(struct perf_event *event);
 extern int perf_get_fd(struct perf_event *event);
+extern int perf_get_persistent_event_fd(int cpu, int id);
+extern void __init perf_register_persistent(void);
 
 #endif /* _KERNEL_EVENTS_INTERNAL_H */
diff --git a/kernel/events/persistent.c b/kernel/events/persistent.c
new file mode 100644
index 0000000..926654f
--- /dev/null
+++ b/kernel/events/persistent.c
@@ -0,0 +1,221 @@
+#include
+#include
+#include
+
+#include "internal.h"
+
+/* 512 kiB: default perf tools memory size, see perf_evlist__mmap() */
+#define CPU_BUFFER_NR_PAGES	((512 * 1024) / PAGE_SIZE)
+
+struct pevent {
+	char		*name;
+	int		id;
+};
+
+static DEFINE_PER_CPU(struct list_head, pevents);
+static DEFINE_PER_CPU(struct mutex, pevents_lock);
+
+/* Must be protected with pevents_lock. */
+static struct perf_event *__pevent_find(int cpu, int id)
+{
+	struct perf_event *event;
+
+	list_for_each_entry(event, &per_cpu(pevents, cpu), pevent_entry) {
+		if (event->pevent_id == id)
+			return event;
+	}
+
+	return NULL;
+}
+
+static int pevent_add(struct pevent *pevent, struct perf_event *event)
+{
+	int ret = -EEXIST;
+	int cpu = event->cpu;
+
+	mutex_lock(&per_cpu(pevents_lock, cpu));
+
+	if (__pevent_find(cpu, pevent->id))
+		goto unlock;
+
+	if (event->pevent_id)
+		goto unlock;
+
+	ret = 0;
+	event->pevent_id = pevent->id;
+	list_add_tail(&event->pevent_entry, &per_cpu(pevents, cpu));
+unlock:
+	mutex_unlock(&per_cpu(pevents_lock, cpu));
+
+	return ret;
+}
+
+static struct perf_event *pevent_del(struct pevent *pevent, int cpu)
+{
+	struct perf_event *event;
+
+	mutex_lock(&per_cpu(pevents_lock, cpu));
+
+	event = __pevent_find(cpu, pevent->id);
+	if (event) {
+		list_del(&event->pevent_entry);
+		event->pevent_id = 0;
+	}
+
+	mutex_unlock(&per_cpu(pevents_lock, cpu));
+
+	return event;
+}
+
+static void persistent_event_release(struct perf_event *event)
+{
+	/*
+	 * Safe since we hold &event->mmap_count. The ringbuffer is
+	 * released with put_event() if there are no other references.
+	 * In this case there are also no other mmaps.
+	 */
+	atomic_dec(&event->rb->mmap_count);
+	atomic_dec(&event->mmap_count);
+	put_event(event);
+}
+
+static int persistent_event_open(int cpu, struct pevent *pevent,
+				 struct perf_event_attr *attr, int nr_pages)
+{
+	struct perf_event *event;
+	int ret;
+
+	event = perf_event_create_kernel_counter(attr, cpu, NULL, NULL, NULL);
+	if (IS_ERR(event))
+		return PTR_ERR(event);
+
+	if (nr_pages < 0)
+		nr_pages = CPU_BUFFER_NR_PAGES;
+
+	ret = perf_alloc_rb(event, nr_pages, 0);
+	if (ret)
+		goto fail;
+
+	ret = pevent_add(pevent, event);
+	if (ret)
+		goto fail;
+
+	atomic_inc(&event->mmap_count);
+
+	/* All workie, enable event now */
+	perf_event_enable(event);
+
+	return ret;
+fail:
+	perf_event_release_kernel(event);
+	return ret;
+}
+
+static void persistent_event_close(int cpu, struct pevent *pevent)
+{
+	struct perf_event *event = pevent_del(pevent, cpu);
+
+	if (event)
+		persistent_event_release(event);
+}
+
+static int __maybe_unused
+persistent_open(char *name, struct perf_event_attr *attr, int nr_pages)
+{
+	struct pevent *pevent;
+	char id_buf[32];
+	int cpu;
+	int ret = 0;
+
+	pevent = kzalloc(sizeof(*pevent), GFP_KERNEL);
+	if (!pevent)
+		return -ENOMEM;
+
+	pevent->id = attr->config;
+
+	if (!name) {
+		snprintf(id_buf, sizeof(id_buf), "%d", pevent->id);
+		name = id_buf;
+	}
+
+	pevent->name = kstrdup(name, GFP_KERNEL);
+	if (!pevent->name) {
+		ret = -ENOMEM;
+		goto fail;
+	}
+
+	for_each_possible_cpu(cpu) {
+		ret = persistent_event_open(cpu, pevent, attr, nr_pages);
+		if (ret)
+			goto fail;
+	}
+
+	return 0;
+fail:
+	for_each_possible_cpu(cpu)
+		persistent_event_close(cpu, pevent);
+	kfree(pevent->name);
+	kfree(pevent);
+
+	pr_err("%s: Error adding persistent event: %d\n",
+	       __func__, ret);
+
+	return ret;
+}
+
+#ifdef CONFIG_EVENT_TRACING
+
+int perf_add_persistent_tp(struct ftrace_event_call *tp)
+{
+	struct perf_event_attr attr;
+
+	memset(&attr, 0, sizeof(attr));
+	attr.sample_period	= 1;
+	attr.wakeup_events	= 1;
+	attr.sample_type	= PERF_SAMPLE_RAW;
+	attr.persistent		= 1;
+	attr.config		= tp->event.type;
+	attr.type		= PERF_TYPE_TRACEPOINT;
+	attr.size		= sizeof(attr);
+
+	return persistent_open(tp->name, &attr, -1);
+}
+
+#endif /* CONFIG_EVENT_TRACING */
+
+int perf_get_persistent_event_fd(int cpu, int id)
+{
+	struct perf_event *event;
+	int event_fd = 0;
+
+	if ((unsigned)cpu >= nr_cpu_ids)
+		return -EINVAL;
+
+	/* Must be root for persistent events */
+	if (perf_paranoid_cpu() && !capable(CAP_SYS_ADMIN))
+		return -EACCES;
+
+	mutex_lock(&per_cpu(pevents_lock, cpu));
+	event = __pevent_find(cpu, id);
+	if (!event || !try_get_event(event))
+		event_fd = -ENOENT;
+	mutex_unlock(&per_cpu(pevents_lock, cpu));
+
+	if (event_fd)
+		return event_fd;
+
+	event_fd = perf_get_fd(event);
+	if (event_fd < 0)
+		put_event(event);
+
+	return event_fd;
+}
+
+void __init perf_register_persistent(void)
+{
+	int cpu;
+
+	for_each_possible_cpu(cpu) {
+		INIT_LIST_HEAD(&per_cpu(pevents, cpu));
+		mutex_init(&per_cpu(pevents_lock, cpu));
+	}
+}