From patchwork Fri Aug 7 21:29:10 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jonathan Adams
X-Patchwork-Id: 262642
Date: Fri, 7 Aug 2020 14:29:10 -0700
In-Reply-To: <20200807212916.2883031-1-jwadams@google.com>
Message-Id: <20200807212916.2883031-2-jwadams@google.com>
References: <20200807212916.2883031-1-jwadams@google.com>
X-Mailer: git-send-email 2.28.0.236.gb10cc79966-goog
Subject: [RFC PATCH 1/7] core/metricfs: Create metricfs, standardized files under debugfs.
From: Jonathan Adams
To: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org
Cc: netdev@vger.kernel.org, kvm@vger.kernel.org, Paolo Bonzini, Greg KH, Jim Mattson, David Rientjes, Jonathan Adams, Justin TerAvest
Sender: netdev-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: netdev@vger.kernel.org

From: Justin TerAvest

Metricfs is a standardized set of files and directories under debugfs,
with a kernel API designed to be simpler than exporting new files under
sysfs. Type and field information is reported so that a userspace daemon
can easily process the information.

The statistics live under debugfs, in a tree rooted at:

	/sys/kernel/debug/metricfs

Each metric is a directory, with four files in it. This patch includes
a single "metricfs_presence" metric, whose files look like:

/sys/kernel/debug/metricfs:
 metricfs_presence/annotations
  DESCRIPTION A\ basic\ presence\ metric.
 metricfs_presence/fields
  value
  int
 metricfs_presence/values
  1
 metricfs_presence/version
  1

Statistics can have zero, one, or two 'fields', which are keys for the
table of metric values. With no fields, you have a simple statistic as
above, with one field you have a 1-dimensional table of string -> value,
and with two fields you have a 2-dimensional table of
{string, string} -> value.

When a statistic's 'values' file is opened, we pre-allocate a 64k buffer
and call the statistic's callback to fill it with data, truncating if
the buffer overflows.

Statistic creators can create a hierarchy for their statistics using
metricfs_create_subsys().

Signed-off-by: Justin TerAvest
[jwadams@google.com: Forward ported to v5.8, cleaned up and modernized
	code significantly]
Signed-off-by: Jonathan Adams
---

notes:
* To go upstream, this will need documentation and a MAINTAINERS update.
* It's not clear what the "version" file is for; it's vestigial and
  should probably be removed.
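
For anyone wiring up a userspace consumer, the escaping used in the
"annotations" and "values" files (backslash before spaces and backslashes,
"\n" for a newline) can be undone with something like the sketch below.
This is not part of the patch: metricfs_split_line() and its selftest are
hypothetical names invented for illustration, and the code assumes the
escaping rules stay exactly as described in this series.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical userspace helper (NOT part of this patch): split one line of
 * a metricfs "annotations" or "values" file into space-separated tokens and
 * undo the kernel-side escaping in place.
 *
 * tok[] receives pointers into 'line'. Returns the number of tokens, or -1
 * if the line holds more than 'max_tok' tokens.
 */
static int metricfs_split_line(char *line, char **tok, int max_tok)
{
	char *rd = line;	/* read cursor */
	char *wr = line;	/* write cursor, never ahead of rd while looping */
	int n = 0;

	while (*rd != '\0' && *rd != '\n') {
		char sep;

		if (n == max_tok)
			return -1;
		tok[n++] = wr;

		/* Copy one token, collapsing "\x" escapes as we go. */
		while (*rd != '\0' && *rd != ' ' && *rd != '\n') {
			if (*rd == '\\' && rd[1] != '\0') {
				rd++;
				*wr++ = (*rd == 'n') ? '\n' : *rd;
				rd++;
			} else {
				*wr++ = *rd++;
			}
		}

		/* Remember the separator before it can be overwritten. */
		sep = *rd;
		if (sep != '\0')
			rd++;
		*wr++ = '\0';
		if (sep != ' ')
			break;
	}
	return n;
}

/* Tiny self-check using the presence metric's annotation text. */
static void metricfs_split_selftest(void)
{
	char line[] = "DESCRIPTION A\\ basic\\ presence\\ metric.\n";
	char *tok[2];

	assert(metricfs_split_line(line, tok, 2) == 2);
	assert(strcmp(tok[0], "DESCRIPTION") == 0);
	assert(strcmp(tok[1], "A basic presence metric.") == 0);
}
```

A reader would call metricfs_split_line() once per line with max_tok set to
3 (two fields plus the value); the last token is the value and any earlier
tokens are the field keys.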

jwadams@google.com: Forward ported to v5.8, removed some google-isms and
cleaned up some anachronisms (atomic->refcount, moving to kvmalloc(),
using POISON_POINTER_DELTA, made more functions static, made 'emitter_fn'
into an explicit union instead of a void *), renamed 'struct emitter ->
metric_emitter' and renamed some funcs for consistency.
---
 include/linux/metricfs.h   | 103 ++++++
 kernel/Makefile            |   2 +
 kernel/metricfs.c          | 727 +++++++++++++++++++++++++++++++++++++
 kernel/metricfs_examples.c | 151 ++++++++
 lib/Kconfig.debug          |  18 +
 5 files changed, 1001 insertions(+)
 create mode 100644 include/linux/metricfs.h
 create mode 100644 kernel/metricfs.c
 create mode 100644 kernel/metricfs_examples.c

diff --git a/include/linux/metricfs.h b/include/linux/metricfs.h
new file mode 100644
index 000000000000..65a1baa8e8c1
--- /dev/null
+++ b/include/linux/metricfs.h
@@ -0,0 +1,103 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _METRICFS_H_
+#define _METRICFS_H_
+
+#include
+#include
+#include
+
+struct metric;
+struct metricfs_subsys;
+
+#define METRIC_EXPORT_GENERIC(name, desc, fname0, fname1, fn, is_str, cumulative) \
+static struct metric *metric_##name; \
+void metric_init_##name(struct metricfs_subsys *parent) \
+{ \
+	metric_##name = metric_register(__stringify(name), (parent), (desc), \
+					(fname0), (fname1), (fn), (is_str), \
+					(cumulative), THIS_MODULE); \
+} \
+void metric_exit_##name(void) \
+{ \
+	metric_unregister(metric_##name); \
+}
+
+/*
+ * Metricfs only deals with two types: int64_t and const char*.
+ *
+ * If a metric has fewer than two fields, pass NULL for the field name
+ * arguments.
+ *
+ * The metric does not take ownership of any of the strings passed in.
+ *
+ * See kernel/metricfs_examples.c for a set of example metrics, with
+ * corresponding output.
+ *
+ * METRIC_EXPORT_INT - An integer-valued metric.
+ * METRIC_EXPORT_COUNTER - An integer-valued cumulative metric.
+ * METRIC_EXPORT_STR - A string-valued metric.
+ */
+#define METRIC_EXPORT_INT(name, desc, fname0, fname1, fn) \
+	METRIC_EXPORT_GENERIC(name, (desc), (fname0), (fname1), (fn), \
+			      false, false)
+#define METRIC_EXPORT_COUNTER(name, desc, fname0, fname1, fn) \
+	METRIC_EXPORT_GENERIC(name, (desc), (fname0), (fname1), (fn), \
+			      false, true)
+#define METRIC_EXPORT_STR(name, desc, fname0, fname1, fn) \
+	METRIC_EXPORT_GENERIC(name, (desc), (fname0), (fname1), (fn), \
+			      true, false)
+
+/* Subsystem support. */
+/* Pass NULL as 'parent' to create a new top-level subsystem. */
+struct metricfs_subsys *metricfs_create_subsys(const char *name,
+					       struct metricfs_subsys *parent);
+void metricfs_destroy_subsys(struct metricfs_subsys *d);
+
+/*
+ * An opaque struct that metric emit functions use to keep our internal
+ * state.
+ */
+struct metric_emitter;
+
+/* The number of non-NULL arguments passed to EMIT macros must match the number
+ * of arguments passed to the EXPORT macro for a given metric.
+ *
+ * Failure to do so will cause data to be mangled (or dropped) by userspace or
+ * Monarch.
+ */
+#define METRIC_EMIT_INT(e, v, f0, f1) \
+	metric_emit_int_value((e), (v), (f0), (f1))
+#define METRIC_EMIT_STR(e, v, f0, f1) \
+	metric_emit_str_value((e), (v), (f0), (f1))
+
+/* Users don't have to call any functions below;
+ * use the macro definitions above instead.
+ */
+void metric_emit_int_value(struct metric_emitter *e,
+			   int64_t v, const char *f0, const char *f1);
+void metric_emit_str_value(struct metric_emitter *e,
+			   const char *v, const char *f0, const char *f1);
+
+struct metric *metric_register(const char *name,
+			       struct metricfs_subsys *parent,
+			       const char *description,
+			       const char *fname0, const char *fname1,
+			       void (*fn)(struct metric_emitter *e),
+			       bool is_string,
+			       bool is_cumulative,
+			       struct module *owner);
+
+struct metric *metric_register_parm(const char *name,
+				    struct metricfs_subsys *parent,
+				    const char *description,
+				    const char *fname0, const char *fname1,
+				    void (*fn)(struct metric_emitter *e,
+					       void *parm),
+				    void *parm,
+				    bool is_string,
+				    bool is_cumulative,
+				    struct module *owner);
+
+void metric_unregister(struct metric *m);
+
+#endif /* _METRICFS_H_ */
diff --git a/kernel/Makefile b/kernel/Makefile
index f3218bc5ec69..0edf790935b0 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -109,6 +109,8 @@ obj-$(CONFIG_CPU_PM) += cpu_pm.o
 obj-$(CONFIG_BPF) += bpf/
 obj-$(CONFIG_KCSAN) += kcsan/
 obj-$(CONFIG_SHADOW_CALL_STACK) += scs.o
+obj-$(CONFIG_METRICFS) += metricfs.o
+obj-$(CONFIG_METRICFS_EXAMPLES) += metricfs_examples.o

 obj-$(CONFIG_PERF_EVENTS) += events/

diff --git a/kernel/metricfs.c b/kernel/metricfs.c
new file mode 100644
index 000000000000..676b7b04aa2b
--- /dev/null
+++ b/kernel/metricfs.c
@@ -0,0 +1,727 @@
+// SPDX-License-Identifier: GPL-2.0
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+/*
+ * Metricfs: A mechanism for exporting metrics from the kernel.
+ *
+ * Kernel code must provide:
+ *   - A description of the metric
+ *   - The subsystem for the metric (NULL is ok)
+ *   - Type information about the metric, and
+ *   - A callback function which supplies metric values.
+ *
+ * In return, metricfs provides files in debugfs at:
+ *   /sys/kernel/debug/metricfs///
+ * The files are:
+ *   - annotations, which provides streamz "annotations"-- the description, and
+ *     other metadata (e.g. if it's constant, deprecated, etc.)
+ *   - fields, which provides type information about the metric and its fields.
+ *   - values, which contains the actual metric value data.
+ *   - version, which is kept around for future-proofing.
+ *
+ * Metrics only support a limited subset of types-- for fields, they only
+ * support strings, integers, and boolean types. For simplicity, we only support
+ * strings and integers and strictly control how the data is formatted when
+ * displayed from debugfs.
+ *
+ * See kernel/metricfs_examples.c for example code.
+ *
+ * Limitations:
+ *   - "values" files are at MOST 64K. We truncate the file at that point.
+ *   - The list of fields and types is at most 1K.
+ *   - Metrics may have at most 2 fields.
+ *
+ * Best Practices:
+ *   - Emit the most important data first! Once the 64K per-metric buffer
+ *     is full, the emit* functions won't do anything.
+ *   - In userspace, open(), read(), and close() the file quickly! The kernel
+ *     allocation for the metric is alive as long as the file is open. This
+ *     permits users to seek around the contents of the file, while permitting
+ *     an atomic view of the data.
+ *
+ * FAQ:
+ *   - Why is memory allocated for file data at open()?
+ *     Snapshots of data provided by the kernel should be as "atomic" as
+ *     possible. If userspace code performs read()s smaller than the total
+ *     amount of data, we'd like for that tool to still work, while providing a
+ *     consistent view of the file.
+ *
+ * Questions:
+ *   - Would it be simpler if we escaped spaces instead of wrapping strings in
+ *     quotes?
+ */
+struct metric {
+	const char *name;
+	const char *description;
+
+	/* Metric field names (optional, NULL if unused) */
+	const char *fname0;
+	const char *fname1;
+
+	union {
+		void (*emit_noparm)(struct metric_emitter *e); /* !has_parm */
+		void (*emit_parm)(struct metric_emitter *e,
+				  void *parm); /* has_parm */
+	} emit_fn;
+	void *eparm;
+	bool is_string;
+	bool is_cumulative;
+	bool has_parm;
+
+	/* dentry for the directory that contains the metric */
+	struct dentry *dentry;
+
+	struct module *owner;
+
+	refcount_t refcnt;
+
+	/* Inodes that have references to our metric, protected under
+	 * big_mutex.
+	 */
+	struct inode *inodes[4];
+};
+
+/* Returns true if the refcount was successfully incremented for the metric */
+static int metric_module_get(struct metric *m)
+{
+	if (!try_module_get(m->owner))
+		return 0;
+
+	if (!refcount_inc_not_zero(&m->refcnt)) {
+		module_put(m->owner);
+		return 0;
+	}
+
+	return 1;
+}
+
+/* Returns true if the last reference was put. */
+static bool metric_put(struct metric *m)
+{
+	bool rc = refcount_dec_and_test(&m->refcnt);
+
+	if (rc)
+		kfree(m);
+	return rc;
+}
+
+static void metric_module_put(struct metric *m)
+{
+	struct module *owner = m->owner;
+
+	metric_put(m);
+	module_put(owner);
+}
+
+struct metric_emitter {
+	char *buf;
+	char *orig_buf; /* To calculate total written. */
+	int size; /* Size of underlying buffer. */
+	struct metric *metric; /* For type checking. */
+};
+
+#define METRICFS_ANNOTATIONS_BUF_SIZE (1 * 1024)
+#define METRICFS_FIELDS_BUF_SIZE (1 * 1024)
+#define METRICFS_VALUES_BUF_SIZE (64 * 1024)
+#define METRICFS_VERSION_BUF_SIZE (8)
+
+/* Maximum length for fields. They're truncated at this point. */
+#define METRICFS_MAX_FIELD_LEN (100)
+
+static int emit_bytes_left(const struct metric_emitter *e)
+{
+	WARN_ON(e->orig_buf > e->buf);
+	return e->size - (e->buf - e->orig_buf);
+}
+
+struct char_tracker {
+	char *dest;
+	int size;
+	int pos;
+};
+
+static void add_char(struct char_tracker *t, char c)
+{
+	if (t->pos < t->size)
+		t->dest[t->pos] = c;
+	/* Increment pos even if we don't print, so we know how many
+	 * characters we'd print if we had room.
+	 */
+	t->pos++;
+}
+
+/* Escape backslashes, spaces, and newlines in string "s",
+ * copying to "dest", to a maximum of "size" characters.
+ *
+ * examples:
+ *   [Hi\ , "there"] -> [Hi\\\ ,\ "there"]
+ *   [foo
+ *    bar] -> [foo\nbar]
+ *
+ * Returns the number of characters that would be copied, if enough space
+ * was available. Doesn't emit a trailing zero.
+ */
+static int escape_string(char *dest, const char *s, int size)
+{
+	struct char_tracker tracker = {
+		.dest = dest,
+		.size = size,
+		.pos = 0,
+	};
+
+	/* We have to process the entire source string to ensure that
+	 * we return a useful value for the total possible emitted length.
+	 */
+	while (*s != 0) {
+		/* escape newlines */
+		if (*s == '\n') {
+			add_char(&tracker, '\\');
+			add_char(&tracker, 'n');
+			s++;
+			continue;
+		}
+
+		/* escape spaces and backslashes. */
+		if (*s == ' ' || *s == '\\')
+			add_char(&tracker, '\\');
+		add_char(&tracker, *s);
+		s++;
+	}
+
+	return tracker.pos;
+}
+
+/* Emits a string into the emitter buffer, no escaping */
+static bool emit_string(struct metric_emitter *e, const char *s)
+{
+	int bytes_left = emit_bytes_left(e);
+	int rc = snprintf(e->buf, bytes_left, "%s", s);
+
+	e->buf += min(rc, bytes_left);
+	return rc < bytes_left;
+}
+
+/* Emits a string, escaping backslashes, spaces, and newlines. */
+static bool emit_quoted_string(struct metric_emitter *e, const char *s)
+{
+	int bytes_left = emit_bytes_left(e);
+	int rc = escape_string(e->buf, s, bytes_left);
+
+	e->buf += min(rc, bytes_left);
+	return rc < bytes_left;
+}
+
+/* Emits an int into the emitter buffer */
+static bool emit_int(struct metric_emitter *e, int64_t i)
+{
+	int bytes_left = emit_bytes_left(e);
+	int rc = snprintf(e->buf, bytes_left, "%lld", i);
+
+	e->buf += min(rc, bytes_left);
+	return rc < bytes_left;
+}
+
+static void check_field_mismatch(struct metric *m, const char *f0,
+				 const char *f1)
+{
+	WARN_ON(m->fname0 && !f0);
+	WARN_ON(!m->fname0 && f0);
+	WARN_ON(m->fname1 && !f1);
+	WARN_ON(!m->fname1 && f1);
+}
+
+void metric_emit_int_value(struct metric_emitter *e, int64_t v,
+			   const char *f0, const char *f1)
+{
+	char *ckpt = e->buf;
+	bool ok = true;
+
+	WARN_ON_ONCE(e->metric->is_string);
+	check_field_mismatch(e->metric, f0, f1);
+	if (f0) {
+		ok &= emit_quoted_string(e, f0);
+		ok &= emit_string(e, " ");
+		if (f1) {
+			ok &= emit_quoted_string(e, f1);
+			ok &= emit_string(e, " ");
+		}
+	}
+	ok &= emit_int(e, v);
+	ok &= emit_string(e, "\n");
+	if (!ok)
+		e->buf = ckpt;
+}
+EXPORT_SYMBOL(metric_emit_int_value);
+
+void metric_emit_str_value(struct metric_emitter *e, const char *v,
+			   const char *f0, const char *f1)
+{
+	char *ckpt = e->buf;
+	bool ok = true;
+
+	WARN_ON_ONCE(!e->metric->is_string);
+	check_field_mismatch(e->metric, f0, f1);
+	if (f0) {
+		ok &= emit_quoted_string(e, f0);
+		ok &= emit_string(e, " ");
+		if (f1) {
+			ok &= emit_quoted_string(e, f1);
+			ok &= emit_string(e, " ");
+		}
+	}
+	ok &= emit_quoted_string(e, v);
+	ok &= emit_string(e, "\n");
+	if (!ok)
+		e->buf = ckpt;
+}
+EXPORT_SYMBOL(metric_emit_str_value);
+
+/* Contains file data generated at open() */
+struct metricfs_file_private {
+	size_t bytes_written;
+	char buf[0];
+};
+
+/* A mutex to prevent races involving the pointer to the inode stored in
+ * inode->i_private. We'll remove this if we can get a callback at inode
+ * deletion in debugfs.
+ */
+static DEFINE_MUTEX(big_mutex);
+
+/* Returns 1 on success, <0 otherwise. */
+static int metric_open_helper(struct inode *inode, struct file *filp,
+			      int buf_size,
+			      struct metric **m,
+			      struct metricfs_file_private **p)
+{
+	int size;
+
+	mutex_lock(&big_mutex);
+	/* Debugfs stores the "data" parameter from debugfs_create_file in
+	 * inode->i_private.
+	 */
+	*m = (struct metric *)inode->i_private;
+	if (!(*m) || !metric_module_get(*m)) {
+		mutex_unlock(&big_mutex);
+		return -ENXIO;
+	}
+	mutex_unlock(&big_mutex);
+
+	size = sizeof(struct metricfs_file_private) + buf_size;
+	*p = kvmalloc(size, GFP_KERNEL);
+	if (!*p) {
+		metric_module_put(*m);
+		return -ENOMEM;
+	}
+	filp->private_data = *p;
+	return 1;
+}
+
+static int metricfs_generic_release(struct inode *inode, struct file *filp)
+{
+	struct metricfs_file_private *p =
+		(struct metricfs_file_private *)filp->private_data;
+	kvfree(p);
+
+	filp->private_data = (void *)(0xDEADBEEFul + POISON_POINTER_DELTA);
+	/* FIXME here too? */
+	metric_module_put((struct metric *)inode->i_private);
+	return 0;
+}
+
+static int metricfs_annotations_open(struct inode *inode, struct file *filp)
+{
+	struct metric_emitter e;
+	struct metric *m;
+	struct metricfs_file_private *p;
+	bool ok = true;
+
+	int rc = metric_open_helper(inode, filp, METRICFS_ANNOTATIONS_BUF_SIZE,
+				    &m, &p);
+	if (rc < 0)
+		return rc;
+
+	e.buf = p->buf;
+	e.orig_buf = p->buf;
+	e.size = METRICFS_ANNOTATIONS_BUF_SIZE;
+	ok &= emit_string(&e, "DESCRIPTION ");
+	ok &= emit_quoted_string(&e, m->description);
+	ok &= emit_string(&e, "\n");
+	if (m->is_cumulative)
+		ok &= emit_string(&e, "CUMULATIVE\n");
+
+	/* Emit all or nothing. */
+	if (ok) {
+		p->bytes_written = e.buf - e.orig_buf;
+	} else {
+		metricfs_generic_release(inode, filp);
+		return -ENOMEM;
+	}
+
+	return 0;
+}
+
+static int metricfs_fields_open(struct inode *inode, struct file *filp)
+{
+	struct metric_emitter e;
+	struct metric *m;
+	struct metricfs_file_private *p;
+	bool ok = true;
+
+	int rc = metric_open_helper(inode, filp, METRICFS_FIELDS_BUF_SIZE,
+				    &m, &p);
+	if (rc < 0)
+		return rc;
+
+	e.buf = p->buf;
+	e.orig_buf = p->buf;
+	e.size = METRICFS_FIELDS_BUF_SIZE;
+	e.metric = m;
+
+	/* We don't have to do string escaping on fields, as quotes aren't
+	 * permitted in field names.
+	 */
+	if (m->fname0) {
+		ok &= emit_string(&e, m->fname0);
+		ok &= emit_string(&e, " ");
+	}
+	if (m->fname1) {
+		ok &= emit_string(&e, m->fname1);
+		ok &= emit_string(&e, " ");
+	}
+	ok &= emit_string(&e, "value\n");
+
+	if (m->fname0)
+		ok &= emit_string(&e, "str ");
+	if (m->fname1)
+		ok &= emit_string(&e, "str ");
+	ok &= emit_string(&e, (m->is_string) ? "str\n" : "int\n");
+
+	/* Emit all or nothing. */
+	if (ok) {
+		p->bytes_written = e.buf - e.orig_buf;
+	} else {
+		metricfs_generic_release(inode, filp);
+		return -ENOMEM;
+	}
+
+	return 0;
+}
+
+static int metricfs_version_open(struct inode *inode, struct file *filp)
+{
+	struct metric *m;
+	struct metricfs_file_private *p;
+	int rc = metric_open_helper(inode, filp, METRICFS_VERSION_BUF_SIZE,
+				    &m, &p);
+	if (rc < 0)
+		return rc;
+
+	p->bytes_written = snprintf(p->buf, METRICFS_VERSION_BUF_SIZE,
+				    "1\n");
+
+	if (p->bytes_written >= METRICFS_VERSION_BUF_SIZE) {
+		metricfs_generic_release(inode, filp);
+		return -ENOMEM;
+	}
+
+	return 0;
+}
+
+static int metricfs_values_open(struct inode *inode, struct file *filp)
+{
+	struct metric_emitter e;
+
+	struct metric *m;
+	struct metricfs_file_private *p;
+	int rc = metric_open_helper(inode, filp, METRICFS_VALUES_BUF_SIZE,
+				    &m, &p);
+	if (rc < 0)
+		return rc;
+
+	e.buf = p->buf;
+	e.orig_buf = p->buf;
+	e.size = METRICFS_VALUES_BUF_SIZE;
+	e.metric = m;
+
+	if (m->has_parm) {
+		if (m->emit_fn.emit_parm)
+			(m->emit_fn.emit_parm)(&e, m->eparm);
+	} else {
+		if (m->emit_fn.emit_noparm)
+			(m->emit_fn.emit_noparm)(&e);
+	}
+	p->bytes_written = e.buf - e.orig_buf;
+	return 0;
+}
+
+static ssize_t metricfs_generic_read(struct file *filp, char __user *ubuf,
+				     size_t cnt, loff_t *ppos)
+{
+	struct metricfs_file_private *p =
+		(struct metricfs_file_private *)filp->private_data;
+	return simple_read_from_buffer(ubuf, cnt, ppos, p->buf,
+				       p->bytes_written);
+}
+
+static const struct file_operations metricfs_annotations_ops = {
+	.open = metricfs_annotations_open,
+	.read = metricfs_generic_read,
+	.release = metricfs_generic_release,
+};
+
+static const struct file_operations metricfs_fields_ops = {
+	.open = metricfs_fields_open,
+	.read = metricfs_generic_read,
+	.release = metricfs_generic_release,
+};
+
+static const struct file_operations metricfs_values_ops = {
+	.open = metricfs_values_open,
+	.read = metricfs_generic_read,
+	.release = metricfs_generic_release,
+};
+
+static const struct file_operations metricfs_version_ops = {
+	.open = metricfs_version_open,
+	.read = metricfs_generic_read,
+	.release = metricfs_generic_release,
+};
+
+static struct dentry *d_metricfs;
+
+static struct dentry *metricfs_init_dentry(void)
+{
+	static int once;
+
+	if (d_metricfs)
+		return d_metricfs;
+
+	if (!debugfs_initialized())
+		return NULL;
+
+	d_metricfs = debugfs_create_dir("metricfs", NULL);
+
+	if (!d_metricfs && !once) {
+		once = 1;
+		pr_warn("Could not create debugfs directory 'metricfs'\n");
+		return NULL;
+	}
+
+	return d_metricfs;
+}
+
+/* We always cast in and out to struct dentry. */
+struct metricfs_subsys {
+	struct dentry dentry;
+};
+
+static struct dentry *metricfs_create_file(const char *name,
+					   mode_t mode,
+					   struct dentry *parent,
+					   void *data,
+					   const struct file_operations *fops)
+{
+	struct dentry *ret;
+
+	ret = debugfs_create_file(name, mode, parent, data, fops);
+	if (!ret)
+		pr_warn("Could not create debugfs '%s' entry\n", name);
+
+	return ret;
+}
+
+static struct dentry *metricfs_create_dir(const char *name,
+					  struct metricfs_subsys *s)
+{
+	struct dentry *d;
+
+	if (!s)
+		d = d_metricfs;
+	else
+		d = &s->dentry;
+
+	if (!d) {
+		pr_warn("Couldn't create %s, subsys doesn't exist.", name);
+		return NULL;
+	}
+	return debugfs_create_dir(name, d);
+}
+
+static int metricfs_initialized;
+
+struct metric *metric_register(const char *name,
+			       struct metricfs_subsys *parent,
+			       const char *description,
+			       const char *fname0,
+			       const char *fname1,
+			       void (*fn)(struct metric_emitter *e),
+			       bool is_string,
+			       bool is_cumulative,
+			       struct module *owner)
+{
+	struct metric *m;
+	struct dentry *d, *t;
+
+	if (!metricfs_initialized) {
+		pr_warn("Could not create metric before initing metricfs\n");
+		return NULL;
+	}
+
+	m = kzalloc(sizeof(*m), GFP_KERNEL);
+	if (!m)
+		return NULL;
+
+	d = metricfs_create_dir(name, parent);
+	if (!d) {
+		pr_warn("Could not create dir '%s' in metricfs.\n", name);
+		kfree(m);
+		return NULL;
+	}
+	m->description = description;
+	m->fname0 = fname0;
+	m->fname1 = fname1;
+	m->has_parm = false;
+	m->emit_fn.emit_noparm = fn;
+	m->eparm = NULL;
+	m->is_string = is_string;
+	m->is_cumulative = is_cumulative;
+	refcount_set(&m->refcnt, 1);
+	m->owner = owner;
+	m->dentry = d;
+
+
+	mutex_lock(&big_mutex);
+	t = metricfs_create_file("annotations", 0444, d, m,
+				 &metricfs_annotations_ops);
+	if (!t)
+		goto done;
+	m->inodes[0] = t->d_inode;
+
+	t = metricfs_create_file("fields", 0444, d, m,
+				 &metricfs_fields_ops);
+	if (!t)
+		goto done;
+	m->inodes[1] = t->d_inode;
+
+	t = metricfs_create_file("values", 0444, d, m,
+				 &metricfs_values_ops);
+	if (!t)
+		goto done;
+	m->inodes[2] = t->d_inode;
+
+	t = metricfs_create_file("version", 0444, d, m,
+				 &metricfs_version_ops);
+	if (!t)
+		goto done;
+	m->inodes[3] = t->d_inode;
+
+done:
+	/* Unregister the metric before anyone calls open() if we had any
+	 * errors on file creation.
+	 */
+	if (!t) {
+		metric_unregister(m);
+		m = NULL;
+	}
+	mutex_unlock(&big_mutex);
+
+	return m;
+}
+EXPORT_SYMBOL(metric_register);
+
+struct metric *metric_register_parm(const char *name,
+				    struct metricfs_subsys *parent,
+				    const char *description,
+				    const char *fname0,
+				    const char *fname1,
+				    void (*fn)(struct metric_emitter *e,
+					       void *parm),
+				    void *eparm,
+				    bool is_string,
+				    bool is_cumulative,
+				    struct module *owner)
+{
+	struct metric *metric =
+		metric_register(name, parent, description,
+				fname0, fname1,
+				(void (*)(struct metric_emitter *))NULL,
+				is_string,
+				is_cumulative, owner);
+	if (metric) {
+		metric->has_parm = true;
+		metric->emit_fn.emit_parm = fn;
+		metric->eparm = eparm;
+	}
+	return metric;
+}
+EXPORT_SYMBOL(metric_register_parm);
+
+void metric_unregister(struct metric *m)
+{
+	/* We have to NULL out the i_private pointers here so that no other
+	 * callers come into open, getting a pointer to the metric that we
+	 * freed.
+	 */
+	mutex_lock(&big_mutex);
+	m->inodes[0]->i_private = NULL;
+	m->inodes[1]->i_private = NULL;
+	m->inodes[2]->i_private = NULL;
+	m->inodes[3]->i_private = NULL;
+	mutex_unlock(&big_mutex);
+
+	debugfs_remove_recursive(m->dentry);
+	metric_put(m);
+}
+EXPORT_SYMBOL(metric_unregister);
+
+struct metricfs_subsys *metricfs_create_subsys(const char *name,
+					       struct metricfs_subsys *parent)
+{
+	struct dentry *d = metricfs_create_dir(name, parent);
+
+	return container_of(d, struct metricfs_subsys, dentry);
+}
+EXPORT_SYMBOL(metricfs_create_subsys);
+
+void metricfs_destroy_subsys(struct metricfs_subsys *s)
+{
+	if (s)
+		debugfs_remove(&s->dentry);
+}
+EXPORT_SYMBOL(metricfs_destroy_subsys);
+
+static void metricfs_presence_fn(struct metric_emitter *e)
+{
+	METRIC_EMIT_INT(e, 1, NULL, NULL);
+}
+METRIC_EXPORT_INT(metricfs_presence, "A basic presence metric.",
+		  NULL, NULL, metricfs_presence_fn);
+
+static int __init metricfs_init(void)
+{
+	if (!metricfs_init_dentry())
+		return -ENOMEM;
+	metricfs_initialized = 1;
+
+	/* Create a basic "presence" metric. */
+	metric_init_metricfs_presence(NULL);
+
+	mutex_init(&big_mutex);
+	return 0;
+}
+
+/*
+ * Debugfs should be fine by the time we're at fs_initcall.
+ */
+fs_initcall(metricfs_init);
diff --git a/kernel/metricfs_examples.c b/kernel/metricfs_examples.c
new file mode 100644
index 000000000000..50d891176728
--- /dev/null
+++ b/kernel/metricfs_examples.c
@@ -0,0 +1,151 @@
+// SPDX-License-Identifier: GPL-2.0
+#include
+#include
+#include
+
+/* A metric to force truncation of the values file. "values" files in
+ * metricfs can be at most 64K in size. It truncates to the last record
+ * that fits entirely in the output file.
+ *
+ * Creates a metric with a values file that looks like:
+ *   val"0" 0
+ *   val"1" 1
+ *   val"2" 2
+ *   ...
+ * "val"3565" 3565 + */ +static void more_than_64k_fn(struct metric_emitter *e) +{ + char buf[80]; + int i; + + for (i = 0; i < 10000; i++) { + sprintf(buf, "val\"%d\"", i); + /* Argument order is (emitter, value, field0, field1...) */ + METRIC_EMIT_INT(e, i, buf, NULL); + } +} +METRIC_EXPORT_INT(more_than_64k, "Stress test metric.", + "v", NULL, more_than_64k_fn); + + +/* A metric with two string fields and int64 values. + * + * # cat /sys/kernel/debug/metricfs/two_string_fields/annotations + * DESCRIPTION "Two fields example." + * # cat /sys/kernel/debug/metricfs/two_string_fields/fields + * disk cgroup value + * str str int + * # cat /sys/kernel/debug/metricfs/two_string_fields/values + * sda /map_reduce1 0 + * sda /sys 50 + * sdb /map_reduce2 12 + */ +static void two_string_fields_fn(struct metric_emitter *e) +{ +#define NR_ENTRIES 3 + const char *disk[NR_ENTRIES] = {"sda", "sda", "sdb"}; + const char *cgroups[NR_ENTRIES] = { + "/map_reduce1", "/sys", "/map_reduce2"}; + const int64_t counters[NR_ENTRIES] = {0, 50, 12}; + int i; + + for (i = 0; i < NR_ENTRIES; i++) { + METRIC_EMIT_INT(e, + counters[i], disk[i], cgroups[i]); + } +} +#undef NR_ENTRIES +METRIC_EXPORT_INT(two_string_fields, "Two fields example.", + "disk", "cgroup", two_string_fields_fn); + + +/* A metric with zero fields and a string value. + * + * # cat /sys/kernel/debug/metricfs/string_valued_metric/annotations + * DESCRIPTION "String metric." + * # cat /sys/kernel/debug/metricfs/string_valued_metric/fields + * value + * str + * # cat /sys/kernel/debug/metricfs/string_valued_metric/values + * Test\ninfo. + */ +static void string_valued_metric_fn(struct metric_emitter *e) +{ + METRIC_EMIT_STR(e, "Test\ninfo.", NULL, NULL); +} +METRIC_EXPORT_STR(string_valued_metric, "String metric.", + NULL, NULL, string_valued_metric_fn); + +/* Test metric to ensure we behave properly with a large annotation string. 
+ */
+static void huge_annotation_fn(struct metric_emitter *e)
+{
+	METRIC_EMIT_STR(e, "test\n", NULL, NULL);
+}
+static const char *huge_annotation_s =
+	"1231231231231231231231231231231241241212895781930750981347503485"
+	"7029348750923847502384750923847590234857902348759023475028934751"
+	"1111111111111112312312312312312312312312312312412412128957819307"
+	"5098134750348570293487509238475023847509238475902348579023487590"
+	"2347502893475 23123123123123123123123123123124124121289578193075"
+	"0981347503485702934875092384750238475092384759023485790234875902"
+	"347502893475 231231231231231231231231231231241241212895781930750"
+	"9813475034857029348750923847502384750923847590234857902348759023"
+	"47502893475 2312312312312312312312312312312412412128957819307509"
+	"8134750348570293487509238475023847509238475902348579023487590234"
+	"7502893475 23123123123123123123123123123124124121289578193075098"
+	"1347503485702934875092384750238475092384759023485790234875902347"
+	"502893475 231231231231231231231231231231241241212895781930750981"
+	"3475034857029348750923847502384750923847590234857902348759023475"
+	"02893475 2312312312312312312312312312312412412128957819307509813"
+	"4750348570293487509238475023847509238475902348579023487590234750"
+	"2893475 23123123123123123123123123123124124121289578193075098134"
+	"7503485702934875092384750238475092384759023485790234875902347502"
+	"893475 231231231231231231231231231231241241212895781930750981347"
+	"5034857029348750923847502384750923847590234857902348759023475028"
+	"93475 2312312312312312312312312312312412412128957819307509813475"
+	"0348570293487509238475023847509238475902348579023487590234750289"
+	"3475 23123123123123123123123123123124124121289578193075098134750"
+	"3485702934875092384750238475092384759023485790234875902347502893"
+	"475 231231231231231231231231231231241241212895781930750981347503"
+	"4857029348750923847502384750923847590234857902348759023475028934"
+	"75 2312312312312312312312312312312412412128957819307509813475034"
+	"8570293487509238475023847509238475902348579023487590234750289347"
+	"5 23123123123123123123123123123124124121289578193075098134750348"
+	"5702934875092384750238475092384759023485790234875902347502893475"
+	" 231231231231231231231231231231241241212895781930750981347503485"
+	"702934875092384750238475092384759023485790234875902347502893475 "
+	"2312312312312312312312312312312412412128957819307509813475034857"
+	"02934875092384750238475092384759023485790234875902347502893475";
+
+METRIC_EXPORT_STR(huge_annotation, huge_annotation_s, NULL, NULL,
+		  huge_annotation_fn);
+
+
+struct metricfs_subsys *examples_subsys;
+
+static int __init metricfs_examples_init(void)
+{
+	examples_subsys = metricfs_create_subsys("examples", NULL);
+	metric_init_more_than_64k(examples_subsys);
+	metric_init_two_string_fields(examples_subsys);
+	metric_init_string_valued_metric(examples_subsys);
+	metric_init_huge_annotation(examples_subsys);
+
+	return 0;
+}
+
+static void __exit metricfs_examples_exit(void)
+{
+	metric_exit_more_than_64k();
+	metric_exit_two_string_fields();
+	metric_exit_string_valued_metric();
+	metric_exit_huge_annotation();
+
+	metricfs_destroy_subsys(examples_subsys);
+}
+
+module_init(metricfs_examples_init);
+module_exit(metricfs_examples_exit);
+
+MODULE_LICENSE("GPL");
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 9ad9210d70a1..8de0244e7804 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -325,6 +325,24 @@ config READABLE_ASM
 	  to keep kernel developers who have to stare a lot at assembler listings
 	  sane.
 
+config METRICFS
+	bool "Metricfs for sysmon"
+	depends on DEBUG_FS
+	help
+	  metricfs is a library for creating rigidly-formatted files in debugfs
+	  which can be automatically monitored by user-space telemetry. The
+	  hierarchy is rooted at /sys/kernel/debug/metricfs, and each metric
+	  contains metadata about the metric and types involved, as well as a
+	  tabular values file with the metrics themselves.
+
+config METRICFS_EXAMPLES
+	tristate "Metricfs examples"
+	depends on METRICFS
+	help
+	  example tests and metrics for metricfs. With this, a set of metrics
+	  appear under "examples", covering various corner cases of the metricfs
+	  interface. These can be used to test the metricfs functionality.
+
 config HEADERS_INSTALL
 	bool "Install uapi headers to usr/include"
 	depends on !UML

From patchwork Fri Aug 7 21:29:12 2020
X-Patchwork-Submitter: Jonathan Adams
X-Patchwork-Id: 262639
Date: Fri, 7 Aug 2020 14:29:12 -0700
In-Reply-To: <20200807212916.2883031-1-jwadams@google.com>
Message-Id: <20200807212916.2883031-4-jwadams@google.com>
References: <20200807212916.2883031-1-jwadams@google.com>
Subject: [RFC PATCH 3/7] core/metricfs: metric for kernel warnings
From: Jonathan Adams
To: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org
Cc: netdev@vger.kernel.org, kvm@vger.kernel.org, Paolo Bonzini, Greg KH,
 Jim Mattson, David Rientjes, Jonathan Adams

Count kernel warnings by function name of the caller.

Each time WARN() is called, which includes WARN_ON(), increment a counter
in a 256-entry hash table. The table key is the entry point of the calling
function, which is found using kallsyms.

We store the name of the function in the table (because it may be a module
address); reporting the metric just walks the table and prints the values.

The "warnings" metric is cumulative.

Signed-off-by: Jonathan Adams
---

jwadams@google.com: rebased to 5.8-rc6, removed google-isms, added
 lockdep_assert_held(), NMI handling, ..._unknown*_counts and locking in
 warn_tbl_fn(); renamed warn_metric... to warn_tbl...

 The original work was done in 2012 by an engineer no longer at Google.
---
 kernel/panic.c | 131 +++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 131 insertions(+)

diff --git a/kernel/panic.c b/kernel/panic.c
index e2157ca387c8..c019b41ab387 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -31,6 +31,9 @@
 #include
 #include
 #include
+#include
+#include
+#include
 #include
 
 #define PANIC_TIMER_STEP 100
@@ -568,6 +571,133 @@ void oops_exit(void)
 	kmsg_dump(KMSG_DUMP_OOPS);
 }
 
+#ifdef CONFIG_METRICFS
+
+/*
+ * Hash table from function address to count of WARNs called within that
+ * function.
+ * So far this is an add-only hash table (ie, entries never removed), so some
+ * simplifying assumptions are made.
+ */
+#define WARN_TBL_BITS (8)
+#define WARN_TBL_SIZE (1<<WARN_TBL_BITS)
[...]
+	if (entry >= 0)
+		warn_tbl[entry].count++;
+	else
+		warn_tbl_unknown_count++;
+
+	spin_unlock_irqrestore(&warn_tbl_lock, flags);
+}
+
+/*
+ * Export the hash table to metricfs.
+ */
+static void warn_tbl_fn(struct metric_emitter *e)
+{
+	int i;
+	unsigned long flags;
+	int unknown_count = READ_ONCE(warn_tbl_unknown_count) +
+		atomic_read(&warn_tbl_unknown_nmi_count) +
+		atomic_read(&warn_tbl_unknown_lookup_count);
+
+	if (unknown_count != 0)
+		METRIC_EMIT_INT(e, unknown_count, "(unknown)", NULL);
+
+	spin_lock_irqsave(&warn_tbl_lock, flags);
+	for (i = 0; i < WARN_TBL_SIZE; i++) {
+		unsigned long fn = (unsigned long)warn_tbl[i].function;
+		const char *function_name = warn_tbl[i].function_name;
+		int count = warn_tbl[i].count;
+
+		if (!fn)
+			continue;
+
+		// function_name[] is constant once function is non-NULL
+		spin_unlock_irqrestore(&warn_tbl_lock, flags);
+		METRIC_EMIT_INT(e, count, function_name, NULL);
+		spin_lock_irqsave(&warn_tbl_lock, flags);
+	}
+	spin_unlock_irqrestore(&warn_tbl_lock, flags);
+}
+METRIC_EXPORT_COUNTER(warnings, "Count of calls to WARN().",
+		      "function", NULL, warn_tbl_fn);
+
+static int __init metricfs_panic_init(void)
+{
+	metric_init_warnings(NULL);
+	return 0;
+}
+late_initcall(metricfs_panic_init);
+
+#else /* CONFIG_METRICFS */
+inline void tbl_increment(void *caller) {}
+#endif
+
 struct warn_args {
 	const char *fmt;
 	va_list args;
@@ -576,6 +706,7 @@ struct warn_args {
 void __warn(const char *file, int line, void *caller, unsigned taint,
 	    struct pt_regs *regs, struct warn_args *args)
 {
+	tbl_increment(caller);
 	disable_trace_on_warning();
 
 	if (file)

From patchwork Fri Aug 7 21:29:13 2020
X-Patchwork-Submitter: Jonathan Adams
X-Patchwork-Id: 262641
Date: Fri, 7 Aug 2020 14:29:13 -0700
In-Reply-To: <20200807212916.2883031-1-jwadams@google.com>
Message-Id: <20200807212916.2883031-5-jwadams@google.com>
References: <20200807212916.2883031-1-jwadams@google.com>
Subject: [RFC PATCH 4/7] core/metricfs: expose softirq information through
 metricfs
From: Jonathan Adams
To: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org
Cc: netdev@vger.kernel.org, kvm@vger.kernel.org, Paolo Bonzini, Greg KH,
 Jim Mattson, David Rientjes, Jonathan Adams

Add metricfs support for displaying percpu softirq counters. The
top directory is /sys/kernel/debug/metricfs/softirq. Then there
is a subdirectory for each softirq type.
For example:

    cat /sys/kernel/debug/metricfs/softirq/NET_RX/values

Signed-off-by: Jonathan Adams
---

jwadams@google.com: rebased to 5.8-pre6
 This is work originally done by another engineer at
 google, who would rather not have their name associated with this
 patchset. They're okay with me sending it under my name.
---
 kernel/softirq.c | 45 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 45 insertions(+)

diff --git a/kernel/softirq.c b/kernel/softirq.c
index c4201b7f42b1..1ae3a540b789 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -25,6 +25,8 @@
 #include
 #include
 #include
+#include
+#include
 
 #define CREATE_TRACE_POINTS
 #include
@@ -738,3 +740,46 @@ unsigned int __weak arch_dynirq_lower_bound(unsigned int from)
 {
 	return from;
 }
+
+#ifdef CONFIG_METRICFS
+
+#define METRICFS_ITEM(name) \
+static void \
+metricfs_##name(struct metric_emitter *e, int cpu) \
+{ \
+	int64_t v = kstat_softirqs_cpu(name##_SOFTIRQ, cpu); \
+	METRIC_EMIT_PERCPU_INT(e, cpu, v); \
+} \
+METRIC_EXPORT_PERCPU_COUNTER(name, #name " softirq", metricfs_##name)
+
+METRICFS_ITEM(HI);
+METRICFS_ITEM(TIMER);
+METRICFS_ITEM(NET_TX);
+METRICFS_ITEM(NET_RX);
+METRICFS_ITEM(BLOCK);
+METRICFS_ITEM(IRQ_POLL);
+METRICFS_ITEM(TASKLET);
+METRICFS_ITEM(SCHED);
+METRICFS_ITEM(HRTIMER);
+METRICFS_ITEM(RCU);
+
+static int __init init_softirq_metricfs(void)
+{
+	struct metricfs_subsys *subsys;
+
+	subsys = metricfs_create_subsys("softirq", NULL);
+	metric_init_HI(subsys);
+	metric_init_TIMER(subsys);
+	metric_init_NET_TX(subsys);
+	metric_init_NET_RX(subsys);
+	metric_init_BLOCK(subsys);
+	metric_init_IRQ_POLL(subsys);
+	metric_init_TASKLET(subsys);
+	metric_init_SCHED(subsys);
+	metric_init_RCU(subsys);
+
+	return 0;
+}
+module_init(init_softirq_metricfs);
+
+#endif

From patchwork Fri Aug 7 21:29:15 2020
X-Patchwork-Submitter: Jonathan Adams
X-Patchwork-Id: 262640
Date: Fri, 7 Aug 2020 14:29:15 -0700
In-Reply-To: <20200807212916.2883031-1-jwadams@google.com>
Message-Id: <20200807212916.2883031-7-jwadams@google.com>
References: <20200807212916.2883031-1-jwadams@google.com>
Subject: [RFC PATCH 6/7] core/metricfs: expose x86-specific irq information
 through metricfs
From: Jonathan Adams
To: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org
Cc: netdev@vger.kernel.org, kvm@vger.kernel.org, Paolo Bonzini, Greg KH,
 Jim Mattson, David Rientjes, Jonathan Adams

Add metricfs support for displaying percpu irq counters for x86.
The top directory is /sys/kernel/debug/metricfs/irq_x86.
Then there is a subdirectory for each x86-specific irq counter.
For example:

    cat /sys/kernel/debug/metricfs/irq_x86/TLB/values

Signed-off-by: Jonathan Adams
---

jwadams@google.com: rebased to 5.8-pre6
 This is work originally done by another engineer at
 google, who would rather not have their name associated with this
 patchset. They're okay with me sending it under my name.
---
 arch/x86/kernel/irq.c | 80 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 80 insertions(+)

diff --git a/arch/x86/kernel/irq.c b/arch/x86/kernel/irq.c
index 181060247e3c..ffacbbc4066c 100644
--- a/arch/x86/kernel/irq.c
+++ b/arch/x86/kernel/irq.c
@@ -12,6 +12,7 @@
 #include
 #include
 #include
+#include
 
 #include
 #include
@@ -374,3 +375,82 @@ void fixup_irqs(void)
 	}
 }
 #endif
+
+#ifdef CONFIG_METRICFS
+#define METRICFS_ITEM(name, field, desc) \
+static void \
+metricfs_##name(struct metric_emitter *e, int cpu) \
+{ \
+	int64_t v = irq_stats(cpu)->field; \
+	METRIC_EMIT_PERCPU_INT(e, cpu, v); \
+} \
+METRIC_EXPORT_PERCPU_COUNTER(name, desc, metricfs_##name)
+
+METRICFS_ITEM(NMI, __nmi_count, "Non-maskable interrupts");
+#ifdef CONFIG_X86_LOCAL_APIC
+METRICFS_ITEM(LOC, apic_timer_irqs, "Local timer interrupts");
+METRICFS_ITEM(SPU, irq_spurious_count, "Spurious interrupts");
+METRICFS_ITEM(PMI, apic_perf_irqs, "Performance monitoring interrupts");
+METRICFS_ITEM(IWI, apic_irq_work_irqs, "IRQ work interrupts");
+METRICFS_ITEM(RTR, icr_read_retry_count, "APIC ICR read retries");
+#endif
+METRICFS_ITEM(PLT, x86_platform_ipis, "Platform interrupts");
+#ifdef CONFIG_SMP
+METRICFS_ITEM(RES, irq_resched_count, "Rescheduling interrupts");
+METRICFS_ITEM(CAL, irq_call_count, "Function call interrupts");
+METRICFS_ITEM(TLB, irq_tlb_count, "TLB shootdowns");
+#endif
+#ifdef CONFIG_X86_THERMAL_VECTOR
+METRICFS_ITEM(TRM, irq_thermal_count, "Thermal event interrupts");
+#endif
+#ifdef CONFIG_X86_MCE_THRESHOLD
+METRICFS_ITEM(THR, irq_threshold_count, "Threshold APIC interrupts");
+#endif
+#ifdef CONFIG_X86_MCE_AMD
+METRICFS_ITEM(DFR, irq_deferred_error_count, "Deferred Error APIC interrupts");
+#endif
+#ifdef CONFIG_HAVE_KVM
+METRICFS_ITEM(PIN, kvm_posted_intr_ipis, "Posted-interrupt notification event");
+METRICFS_ITEM(PIW, kvm_posted_intr_wakeup_ipis,
+	      "Posted-interrupt wakeup event");
+#endif
+
+static int __init init_irq_metricfs(void)
+{
+	struct metricfs_subsys *subsys;
+
+	subsys = metricfs_create_subsys("irq_x86", NULL);
+
+	metric_init_NMI(subsys);
+#ifdef CONFIG_X86_LOCAL_APIC
+	metric_init_LOC(subsys);
+	metric_init_SPU(subsys);
+	metric_init_PMI(subsys);
+	metric_init_IWI(subsys);
+	metric_init_RTR(subsys);
+#endif
+	metric_init_PLT(subsys);
+#ifdef CONFIG_SMP
+	metric_init_RES(subsys);
+	metric_init_CAL(subsys);
+	metric_init_TLB(subsys);
+#endif
+#ifdef CONFIG_X86_THERMAL_VECTOR
+	metric_init_TRM(subsys);
+#endif
+#ifdef CONFIG_X86_MCE_THRESHOLD
+	metric_init_THR(subsys);
+#endif
+#ifdef CONFIG_X86_MCE_AMD
+	metric_init_DFR(subsys);
+#endif
+#ifdef CONFIG_HAVE_KVM
+	metric_init_PIN(subsys);
+	metric_init_PIW(subsys);
+#endif
+
+	return 0;
+}
+module_init(init_irq_metricfs);
+
+#endif