Message ID | fb6e25a4833e6a0e055633092b05bae3c6e1c0d3.1611934253.git.marcelo.leitner@gmail.com |
---|---|
State | New |
Headers | show |
Series | [RESEND,net-next] netlink: add tracepoint at NL_SET_ERR_MSG | expand |
On Mon, 1 Feb 2021 15:12:19 -0300 Marcelo Ricardo Leitner wrote: > Often userspace won't request the extack information, or they don't log it > because of log level or so, and even when they do, sometimes it's not > enough to know exactly what caused the error. > > Netlink extack is the standard way of reporting erros with descriptive > error messages. With a trace point on it, we then can know exactly where > the error happened, regardless of userspace app. Also, we can even see if > the err msg was overwritten. > > The wrapper do_trace_netlink_extack() is because trace points shouldn't be > called from .h files, as trace points are not that small, and the function > call to do_trace_netlink_extack() on the macros is not protected by > tracepoint_enabled() because the macros are called from modules, and this > would require exporting some trace structs. As this is error path, it's > better to export just the wrapper instead. > > Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Did you measure the allyesconfig bloat from this? How valuable is it to have the tracepoint in at the time it's set? IIRC extack is passed in to the callbacks from the netlink core, it's just not reported to user space if not requested. So we could capture the message in af_netlink.c, no?
On Mon, Feb 01, 2021 at 05:34:00PM -0800, Jakub Kicinski wrote: > On Mon, 1 Feb 2021 15:12:19 -0300 Marcelo Ricardo Leitner wrote: > > Often userspace won't request the extack information, or they don't log it > > because of log level or so, and even when they do, sometimes it's not > > enough to know exactly what caused the error. > > > > Netlink extack is the standard way of reporting erros with descriptive > > error messages. With a trace point on it, we then can know exactly where > > the error happened, regardless of userspace app. Also, we can even see if > > the err msg was overwritten. > > > > The wrapper do_trace_netlink_extack() is because trace points shouldn't be > > called from .h files, as trace points are not that small, and the function > > call to do_trace_netlink_extack() on the macros is not protected by > > tracepoint_enabled() because the macros are called from modules, and this > > would require exporting some trace structs. As this is error path, it's > > better to export just the wrapper instead. > > > > Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> > > Did you measure the allyesconfig bloat from this? Now I did: $ ./scripts/bloat-o-meter -t out/{orig,new}/vmlinux ... Total: Before=212077464, After=212108056, chg +0.01% $ size out/{orig,new}/vmlinux text data bss dec hex filename 267409181 333328965 83018348 683756494 28c14bce out/orig/vmlinux 267413171 333337273 83010156 683760600 28c15bd8 out/new/vmlinux with the commit on top of 46eb3c108fe1744d0a6abfda69ef8c1d4f0e92d4. It's not much because it's adding just a function call to the macro, rather than the tracepoint itself. > How valuable is it to have the tracepoint in at the time it's set? To know exactly its source. It's very helpful to track down some EINVALs reported to userspace. If not, we have to rely on the errmsg and grep the code afterwards. Also, if the message is a common one, one may not be able to easily distinguish them. Ideally this shouldn't happen, but when debugging applications such as OVS, where lots of netlink requests are flying, it saves us time. I can, for example, look at a perf capture and search for cls_flower or so. Otherwise, it will all show up as "af_netlink: <err_msg>" Also, it allows tracking when a previous errmsg (which would have been a warning) is overwritten with a new one. > IIRC extack is passed in to the callbacks from the netlink core, it's > just not reported to user space if not requested. So we could capture > the message in af_netlink.c, no? If 4k is too much, yes. It looses practicality, per above, but should be doable. Marcelo
On 2/2/21 5:30 AM, Marcelo Ricardo Leitner wrote: > > Also, if the message is a common one, one may not be able to easily > distinguish them. Ideally this shouldn't happen, but when debugging > applications such as OVS, where lots of netlink requests are flying, > it saves us time. I can, for example, look at a perf capture and > search for cls_flower or so. Otherwise, it will all show up as > "af_netlink: <err_msg>" Modules should be using the NL_SET_ERR_MSG_MOD variant, so the message would be ""af_netlink: cls_flower: <err_msg>" I get the value in knowing the call site, so not arguing against that. Just hoping that your experience matches theory.
On Tue, Feb 02, 2021 at 09:16:28AM -0700, David Ahern wrote: > On 2/2/21 5:30 AM, Marcelo Ricardo Leitner wrote: > > > > Also, if the message is a common one, one may not be able to easily > > distinguish them. Ideally this shouldn't happen, but when debugging > > applications such as OVS, where lots of netlink requests are flying, > > it saves us time. I can, for example, look at a perf capture and > > search for cls_flower or so. Otherwise, it will all show up as > > "af_netlink: <err_msg>" > > Modules should be using the NL_SET_ERR_MSG_MOD variant, so the message > would be ""af_netlink: cls_flower: <err_msg>" Ah, right. They don't always do, though (and that probably should be fixed). Also, currently there is no _MOD variant for NL_SET_ERR_MSG_ATTR. For example: $ git grep NL_SET_ERR_MSG -- cls_flower.c cls_flower.c: NL_SET_ERR_MSG_MOD(extack, "Failed to setup flow action"); cls_flower.c: NL_SET_ERR_MSG_ATTR(extack, cls_flower.c: NL_SET_ERR_MSG_ATTR(extack, cls_flower.c: NL_SET_ERR_MSG(extack, "Missing MPLS option \"depth\""); ... > > I get the value in knowing the call site, so not arguing against that. > Just hoping that your experience matches theory. Okay. Thanks, Marcelo
On Mon, 1 Feb 2021 15:12:19 -0300 Marcelo Ricardo Leitner wrote: > Often userspace won't request the extack information, or they don't log it > because of log level or so, and even when they do, sometimes it's not > enough to know exactly what caused the error. > > Netlink extack is the standard way of reporting erros with descriptive > error messages. With a trace point on it, we then can know exactly where > the error happened, regardless of userspace app. Also, we can even see if > the err msg was overwritten. > > The wrapper do_trace_netlink_extack() is because trace points shouldn't be > called from .h files, as trace points are not that small, and the function > call to do_trace_netlink_extack() on the macros is not protected by > tracepoint_enabled() because the macros are called from modules, and this > would require exporting some trace structs. As this is error path, it's > better to export just the wrapper instead. > > Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> > --- a/include/linux/netlink.h > +++ b/include/linux/netlink.h > @@ -6,11 +6,15 @@ > #include <linux/capability.h> > #include <linux/skbuff.h> > #include <linux/export.h> > +#include <linux/tracepoint.h> Do we need the include... > #include <net/scm.h> > #include <uapi/linux/netlink.h> > > struct net; > > +DECLARE_TRACEPOINT(netlink_extack); ... and the declaration? Seems to be a leftover. > +void do_trace_netlink_extack(const char *msg); > + > static inline struct nlmsghdr *nlmsg_hdr(const struct sk_buff *skb) > { > return (struct nlmsghdr *)skb->data; > @@ -90,6 +94,8 @@ struct netlink_ext_ack { > static const char __msg[] = msg; \ > struct netlink_ext_ack *__extack = (extack); \ > \ > + do_trace_netlink_extack(__msg); \ > + \ > if (__extack) \ > __extack->_msg = __msg; \ > } while (0) > @@ -110,6 +116,8 @@ struct netlink_ext_ack { > static const char __msg[] = msg; \ > struct netlink_ext_ack *__extack = (extack); \ > \ > + do_trace_netlink_extack(__msg); \ > + \ > if (__extack) { \ > __extack->_msg = __msg; \ > __extack->bad_attr = (attr); \
diff --git a/include/linux/netlink.h b/include/linux/netlink.h index 9f118771e24808623287d46157046749ec96a2b5..fe77f30aabfdde8882f6de99ecf633b79e903b77 100644 --- a/include/linux/netlink.h +++ b/include/linux/netlink.h @@ -6,11 +6,15 @@ #include <linux/capability.h> #include <linux/skbuff.h> #include <linux/export.h> +#include <linux/tracepoint.h> #include <net/scm.h> #include <uapi/linux/netlink.h> struct net; +DECLARE_TRACEPOINT(netlink_extack); +void do_trace_netlink_extack(const char *msg); + static inline struct nlmsghdr *nlmsg_hdr(const struct sk_buff *skb) { return (struct nlmsghdr *)skb->data; @@ -90,6 +94,8 @@ struct netlink_ext_ack { static const char __msg[] = msg; \ struct netlink_ext_ack *__extack = (extack); \ \ + do_trace_netlink_extack(__msg); \ + \ if (__extack) \ __extack->_msg = __msg; \ } while (0) @@ -110,6 +116,8 @@ struct netlink_ext_ack { static const char __msg[] = msg; \ struct netlink_ext_ack *__extack = (extack); \ \ + do_trace_netlink_extack(__msg); \ + \ if (__extack) { \ __extack->_msg = __msg; \ __extack->bad_attr = (attr); \ diff --git a/include/trace/events/netlink.h b/include/trace/events/netlink.h new file mode 100644 index 0000000000000000000000000000000000000000..3b7be3b386a4f3976738a107fe4b7e0915ae58bb --- /dev/null +++ b/include/trace/events/netlink.h @@ -0,0 +1,29 @@ +#undef TRACE_SYSTEM +#define TRACE_SYSTEM netlink + +#if !defined(_TRACE_NETLINK_H) || defined(TRACE_HEADER_MULTI_READ) +#define _TRACE_NETLINK_H + +#include <linux/tracepoint.h> + +TRACE_EVENT(netlink_extack, + + TP_PROTO(const char *msg), + + TP_ARGS(msg), + + TP_STRUCT__entry( + __string( msg, msg ) + ), + + TP_fast_assign( + __assign_str(msg, msg); + ), + + TP_printk("msg=%s", __get_str(msg)) +); + +#endif /* _TRACE_NETLINK_H */ + +/* This part must be outside protection */ +#include <trace/define_trace.h> diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c index daca50d6bb1283f3b04b585217f2aea6ba279b8b..dd488938447f9735daf1fb727c339a9874bab38b 100644 --- a/net/netlink/af_netlink.c +++ b/net/netlink/af_netlink.c @@ -67,6 +67,8 @@ #include <net/sock.h> #include <net/scm.h> #include <net/netlink.h> +#define CREATE_TRACE_POINTS +#include <trace/events/netlink.h> #include "af_netlink.h" @@ -147,6 +149,12 @@ static BLOCKING_NOTIFIER_HEAD(netlink_chain); static const struct rhashtable_params netlink_rhashtable_params; +void do_trace_netlink_extack(const char *msg) +{ + trace_netlink_extack(msg); +} +EXPORT_SYMBOL(do_trace_netlink_extack); + static inline u32 netlink_group_mask(u32 group) { return group ? 1 << (group - 1) : 0;
Often userspace won't request the extack information, or they don't log it because of log level or so, and even when they do, sometimes it's not enough to know exactly what caused the error. Netlink extack is the standard way of reporting erros with descriptive error messages. With a trace point on it, we then can know exactly where the error happened, regardless of userspace app. Also, we can even see if the err msg was overwritten. The wrapper do_trace_netlink_extack() is because trace points shouldn't be called from .h files, as trace points are not that small, and the function call to do_trace_netlink_extack() on the macros is not protected by tracepoint_enabled() because the macros are called from modules, and this would require exporting some trace structs. As this is error path, it's better to export just the wrapper instead. Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> --- include/linux/netlink.h | 8 ++++++++ include/trace/events/netlink.h | 29 +++++++++++++++++++++++++++++ net/netlink/af_netlink.c | 8 ++++++++ 3 files changed, 45 insertions(+) create mode 100644 include/trace/events/netlink.h