Message ID | 1415292718-19785-2-git-send-email-pawel.moll@arm.com |
---|---|
State | New |
On Thu, Feb 12, 2015 at 11:38 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Thu, Feb 12, 2015 at 11:28:14AM +0100, Peter Zijlstra wrote:
>> > and you would have to check the clocksource is TSC.
>>
>> It implicitly does that; it has that sched_clock_stable() thing, but
>> yeah I suppose someone could change the clocksource even though the tsc
>> is stable.
>>
>> Not using TSC when its available is quite crazy though.. but sure.
>
> Something like this on top then.. it might have a few header issues, the
> whole asm/tsc.h vs clocksource.h thing looks like pain.
>
> I haven't tried to compile it, maybe we can move cycle_t into types and
> fwd declare struct clocksource or whatnot.
>
> Of course, all this is quite horrible on the timekeeping side; it might
> be tglx and/or jstutlz are having spasms just reading it :-)

Oof.. Yea, this exposes all sorts of timekeeping internals out to the
rest of the kernel that I'd rather not have out there.

> --- a/arch/x86/kernel/cpu/perf_event.c
> +++ b/arch/x86/kernel/cpu/perf_event.c
> @@ -1967,17 +1967,19 @@ static void local_clock_user_time(struct
>         cyc2ns_read_end(data);
>  }
>
> -extern void notrace __ktime_get_mono_fast(u64 *offset, u32 *mult, u16 *shift);
> +extern bool notrace __ktime_get_mono_fast(cycle_t (*read)(struct clocksource *cs),
> +                                          u64 *offset, u32 *mult, u16 *shift);
>
>  static void ktime_fast_mono_user_time(struct perf_event_mmap_page *userpg, u64 now)
>  {
> +       if (!__ktime_get_mono_fast(read_tsc, &userpg->time_zero,
> +                                  &userpg->time_mult,
> +                                  &userpg->time_shift))

Soo.. instead of hard-coding read_tsc here, can we instead use a
clocksource flag that we can check that the current clocksource is valid
for this sort of use?

thanks
-john
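To make the flag suggestion above concrete, here is a rough userspace model of the idea, not kernel code. Everything prefixed `demo_` or `DEMO_` is hypothetical, including the flag itself: no such clocksource flag is defined in the thread. The point is only that the exporter asks the current clocksource whether it is suitable, instead of comparing its read function against read_tsc.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* HYPOTHETICAL flag: marks a clocksource whose counter userspace can read. */
#define DEMO_CLOCK_SOURCE_USER_READABLE (1u << 0)

/* Simplified stand-in for struct clocksource (assumption, not the real layout). */
struct demo_clocksource {
	const char *name;
	uint32_t mult;
	uint32_t shift;
	unsigned int flags;
};

/* Stand-in for the time_zero/time_mult/time_shift fields of the mmap page. */
struct demo_user_time {
	uint64_t time_zero;
	uint32_t time_mult;
	uint16_t time_shift;
};

/*
 * Fill in the conversion parameters only if the current clocksource
 * carries the flag. Returns false (and leaves *out untouched) otherwise,
 * so the caller can fall back to not advertising user time at all.
 */
bool demo_export_mono_fast(const struct demo_clocksource *cs,
			   uint64_t offset, struct demo_user_time *out)
{
	assert(out != NULL);

	if (cs == NULL || !(cs->flags & DEMO_CLOCK_SOURCE_USER_READABLE))
		return false;

	out->time_zero = offset;
	out->time_mult = cs->mult;
	out->time_shift = (uint16_t)cs->shift;
	return true;
}
```

With this shape the perf code would not need to know about read_tsc or any other timekeeping internals; whether a real flag would carry exactly this meaning is left open in the thread.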
diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 4c81a86..8ead8d8 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -91,6 +91,7 @@ parameter is applicable:
         NUMA    NUMA support is enabled.
         NFS     Appropriate NFS support is enabled.
         OSS     OSS sound support is enabled.
+        PERF    Performance events and counters support is enabled.
         PV_OPS  A paravirtualized kernel is enabled.
         PARIDE  The ParIDE (parallel port IDE) subsystem is enabled.
         PARISC  The PA-RISC architecture is enabled.
@@ -2763,6 +2764,14 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
                         allocator.  This parameter is primarily for debugging
                         and performance comparison.
 
+        perf_use_local_clock
+                        [PERF]
+                        Use local_clock() as a source for perf timestamps
+                        generation. This used to be the default behaviour and
+                        this parameter can be used to maintain backward
+                        compatibility or on older hardware with expensive
+                        monotonic clock source.
+
         pf.             [PARIDE]
                         See Documentation/blockdev/paride.txt.
 
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 2b02c9f..5d0aa03 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -42,6 +42,7 @@
 #include <linux/module.h>
 #include <linux/mman.h>
 #include <linux/compat.h>
+#include <linux/sysctl.h>
 
 #include "internal.h"
 
@@ -322,8 +323,41 @@ extern __weak const char *perf_pmu_name(void)
         return "pmu";
 }
 
+static bool perf_use_local_clock;
+static int __init perf_use_local_clock_setup(char *__unused)
+{
+        perf_use_local_clock = true;
+        return 1;
+}
+__setup("perf_use_local_clock", perf_use_local_clock_setup);
+
+static int sysctl_perf_sample_time_clk_id = CLOCK_MONOTONIC;
+
+static struct ctl_table perf_sample_time_kern_table[] = {
+        {
+                .procname       = "perf_sample_time_clk_id",
+                .data           = &sysctl_perf_sample_time_clk_id,
+                .maxlen         = sizeof(int),
+                .mode           = 0444,
+                .proc_handler   = proc_dointvec,
+        },
+        {}
+};
+
+static struct ctl_table perf_sample_time_root_table[] = {
+        {
+                .procname       = "kernel",
+                .mode           = 0555,
+                .child          = perf_sample_time_kern_table,
+        },
+        {}
+};
+
 static inline u64 perf_clock(void)
 {
+        if (likely(!perf_use_local_clock))
+                return ktime_get_mono_fast_ns();
+
         return local_clock();
 }
 
@@ -8230,6 +8264,9 @@ void __init perf_event_init(void)
          */
         BUILD_BUG_ON((offsetof(struct perf_event_mmap_page, data_head))
                      != 1024);
+
+        if (!perf_use_local_clock)
+                register_sysctl_table(perf_sample_time_root_table);
 }
 
 static int __init perf_event_sysfs_init(void)
Until now, the perf framework never defined the meaning of the timestamps
captured as PERF_SAMPLE_TIME sample type. The values were obtained
from the local (sched) clock, which is unavailable in userspace. This made
it impossible to correlate perf data with any other events. Other tracing
solutions have the source configurable (ftrace) or just share a common
time domain between kernel and userspace (LTTng).

Follow the trend by using the monotonic clock, which is readily available
as POSIX CLOCK_MONOTONIC.

Also add a sysctl "perf_sample_time_clk_id" attribute which can be used
by the user to obtain the clk_id to be used with the POSIX clock API (e.g.
clock_gettime()) to obtain a time value comparable with perf samples.

The old behaviour can be restored by using the "perf_use_local_clock"
kernel parameter.

Signed-off-by: Pawel Moll <pawel.moll@arm.com>
---
Ingo, I remember your comments about this approach in the past, but
during discussions at LPC Thomas was convinced that it's the right
thing to do - see cover letter for the series...

Changes since v3:

- Added "perf_use_local_clock" parameter...

- ... and creating the sysctl value only when the parameter is not given
  (turned out that negative clk_ids are not invalid - they are used
  to describe dynamic clocks)

- Had to keep sysctl_perf_sample_time_clk_id non-const, because
  struct ctl_table.data is non-const

 Documentation/kernel-parameters.txt |  9 +++++++++
 kernel/events/core.c                | 37 +++++++++++++++++++++++++++++++++++++
 2 files changed, 46 insertions(+)
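For illustration, a userspace consumer of the sysctl described above might look like the sketch below. The path is inferred from the "kernel" table and procname in the patch and exists only on a kernel carrying this patch; the helper names are made up for this example. Since the patch registers the sysctl only when perf_use_local_clock is not given, a missing file is treated as "no comparable clock" rather than silently assuming CLOCK_MONOTONIC.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <time.h>

/* Assumed location of the sysctl added by this patch. */
#define PERF_CLK_ID_SYSCTL "/proc/sys/kernel/perf_sample_time_clk_id"

/*
 * Read the clock id published by the kernel.
 * Returns 0 on success, -1 if the file is missing or unparsable
 * (for example when booted with perf_use_local_clock).
 */
int read_perf_sample_clk_id(const char *path, clockid_t *out)
{
	FILE *f;
	int id;

	assert(out != NULL);

	f = fopen(path, "r");
	if (f == NULL)
		return -1;
	if (fscanf(f, "%d", &id) != 1) {
		fclose(f);
		return -1;
	}
	fclose(f);

	*out = (clockid_t)id;
	return 0;
}

/*
 * Sample "now" in nanoseconds in the same time domain as perf samples.
 * Returns 0 on success, -1 if no comparable clock is available.
 */
int perf_comparable_now_ns(const char *path, unsigned long long *ns)
{
	clockid_t id;
	struct timespec ts;

	assert(ns != NULL);

	if (read_perf_sample_clk_id(path, &id) != 0)
		return -1;
	if (clock_gettime(id, &ts) != 0)
		return -1;

	*ns = (unsigned long long)ts.tv_sec * 1000000000ULL
	      + (unsigned long long)ts.tv_nsec;
	return 0;
}
```

A tracer would call `perf_comparable_now_ns(PERF_CLK_ID_SYSCTL, &ns)` when logging its own events and could then merge them with PERF_SAMPLE_TIME values by timestamp.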