[RFCv3,00/19] x86/ftrace/bpf: Add batch support for direct/tracing attach

Message ID 20210605111034.1810858-1-jolsa@kernel.org

Message

Jiri Olsa June 5, 2021, 11:10 a.m. UTC
hi,
the saga continues.. ;-) the previous post is here [1]

After another discussion with Steven, he mentioned that if we fix
the ftrace graph problem with direct functions, he'd be open to
adding a batch interface for direct ftrace functions.

He already had a proof-of-concept fix for that, which I took and broke
up into several changes. I added the ftrace direct batch interface
and the new bpf interface on top of that.

It's not that many patches after all, so I thought having them all
together would help the review, because they are all connected.
However, I can break this up into separate patchsets if necessary.

This patchset contains:

  1) patches (1-4) that fix the ftrace graph tracing over the function
     with direct trampolines attached
  2) patches (5-8) that add batch interface for ftrace direct function
     register/unregister/modify
  3) patches (9-19) that add support to attach BPF program to multiple
     functions

In a nutshell:

Ad 1) moves the graph tracing setup before the direct trampoline
prepares the stack, so they don't clash.

Ad 2) uses the ftrace_ops interface to register a direct function
for all functions in the ftrace_ops filter.

Ad 3) creates a special program and trampoline type to allow
attachment of multiple functions to a single program.

There are more detailed descriptions in the related changelogs.

I have working bpftrace multi-attachment code on top of this. I briefly
checked retsnoop and I think it could use the new API as well.


Also available at:
  https://git.kernel.org/pub/scm/linux/kernel/git/jolsa/perf.git
  bpf/batch

thanks,
jirka


[1] https://lore.kernel.org/bpf/20210413121516.1467989-1-jolsa@kernel.org/

---
Jiri Olsa (17):
      x86/ftrace: Remove extra orig rax move
      tracing: Add trampoline/graph selftest
      ftrace: Add ftrace_add_rec_direct function
      ftrace: Add multi direct register/unregister interface
      ftrace: Add multi direct modify interface
      ftrace/samples: Add multi direct interface test module
      bpf, x64: Allow to use caller address from stack
      bpf: Allow to store caller's ip as argument
      bpf: Add support to load multi func tracing program
      bpf: Add bpf_trampoline_alloc function
      bpf: Add support to link multi func tracing program
      libbpf: Add btf__find_by_pattern_kind function
      libbpf: Add support to link multi func tracing program
      selftests/bpf: Add fentry multi func test
      selftests/bpf: Add fexit multi func test
      selftests/bpf: Add fentry/fexit multi func test
      selftests/bpf: Temporary fix for fentry_fexit_multi_test

Steven Rostedt (VMware) (2):
      x86/ftrace: Remove fault protection code in prepare_ftrace_return
      x86/ftrace: Make function graph use ftrace directly

 arch/x86/include/asm/ftrace.h                                    |   9 ++++--
 arch/x86/kernel/ftrace.c                                         |  71 ++++++++++++++++++++++-----------------------
 arch/x86/kernel/ftrace_64.S                                      |  30 +------------------
 arch/x86/net/bpf_jit_comp.c                                      |  31 ++++++++++++++------
 include/linux/bpf.h                                              |  14 +++++++++
 include/linux/ftrace.h                                           |  22 ++++++++++++++
 include/uapi/linux/bpf.h                                         |  12 ++++++++
 kernel/bpf/btf.c                                                 |   5 ++++
 kernel/bpf/syscall.c                                             | 220 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++----
 kernel/bpf/trampoline.c                                          |  83 ++++++++++++++++++++++++++++++++++++++---------------
 kernel/bpf/verifier.c                                            |   3 +-
 kernel/trace/fgraph.c                                            |   8 ++++--
 kernel/trace/ftrace.c                                            | 211 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++----------------
 kernel/trace/trace_selftest.c                                    |  49 ++++++++++++++++++++++++++++++-
 samples/ftrace/Makefile                                          |   1 +
 samples/ftrace/ftrace-direct-multi.c                             |  52 +++++++++++++++++++++++++++++++++
 tools/include/uapi/linux/bpf.h                                   |  12 ++++++++
 tools/lib/bpf/bpf.c                                              |  11 ++++++-
 tools/lib/bpf/bpf.h                                              |   4 ++-
 tools/lib/bpf/btf.c                                              |  68 +++++++++++++++++++++++++++++++++++++++++++
 tools/lib/bpf/btf.h                                              |   3 ++
 tools/lib/bpf/libbpf.c                                           |  72 ++++++++++++++++++++++++++++++++++++++++++++++
 tools/testing/selftests/bpf/multi_check.h                        |  53 ++++++++++++++++++++++++++++++++++
 tools/testing/selftests/bpf/prog_tests/fentry_fexit_multi_test.c |  52 +++++++++++++++++++++++++++++++++
 tools/testing/selftests/bpf/prog_tests/fentry_multi_test.c       |  43 +++++++++++++++++++++++++++
 tools/testing/selftests/bpf/prog_tests/fexit_multi_test.c        |  44 ++++++++++++++++++++++++++++
 tools/testing/selftests/bpf/progs/fentry_fexit_multi_test.c      |  31 ++++++++++++++++++++
 tools/testing/selftests/bpf/progs/fentry_multi_test.c            |  20 +++++++++++++
 tools/testing/selftests/bpf/progs/fexit_multi_test.c             |  22 ++++++++++++++
 29 files changed, 1121 insertions(+), 135 deletions(-)
 create mode 100644 samples/ftrace/ftrace-direct-multi.c
 create mode 100644 tools/testing/selftests/bpf/multi_check.h
 create mode 100644 tools/testing/selftests/bpf/prog_tests/fentry_fexit_multi_test.c
 create mode 100644 tools/testing/selftests/bpf/prog_tests/fentry_multi_test.c
 create mode 100644 tools/testing/selftests/bpf/prog_tests/fexit_multi_test.c
 create mode 100644 tools/testing/selftests/bpf/progs/fentry_fexit_multi_test.c
 create mode 100644 tools/testing/selftests/bpf/progs/fentry_multi_test.c
 create mode 100644 tools/testing/selftests/bpf/progs/fexit_multi_test.c

Comments

Yonghong Song June 7, 2021, 3:07 a.m. UTC | #1
On 6/5/21 4:10 AM, Jiri Olsa wrote:
> Currently we call the original function by using the absolute address
> given at the JIT generation. That's not usable when having trampoline
> attached to multiple functions. In this case we need to take the
> return address from the stack.

Here, it is mentioned to take the return address from the stack.

> 
> Adding support to retrieve the original function address from the stack

Here, it is said to take the original function address from the stack.

> by adding new BPF_TRAMP_F_ORIG_STACK flag for arch_prepare_bpf_trampoline
> function.
> 
> Signed-off-by: Jiri Olsa <jolsa@kernel.org>
> ---
>   arch/x86/net/bpf_jit_comp.c | 13 +++++++++----
>   include/linux/bpf.h         |  5 +++++
>   2 files changed, 14 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
> index 2a2e290fa5d8..b77e6bd78354 100644
> --- a/arch/x86/net/bpf_jit_comp.c
> +++ b/arch/x86/net/bpf_jit_comp.c
> @@ -2013,10 +2013,15 @@ int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image, void *i
>   	if (flags & BPF_TRAMP_F_CALL_ORIG) {
>   		restore_regs(m, &prog, nr_args, stack_size);
>   
> -		/* call original function */
> -		if (emit_call(&prog, orig_call, prog)) {
> -			ret = -EINVAL;
> -			goto cleanup;
> +		if (flags & BPF_TRAMP_F_ORIG_STACK) {
> +			emit_ldx(&prog, BPF_DW, BPF_REG_0, BPF_REG_FP, 8);

This is a load double from base_pointer + 8, which should be the function
return address for x86, yet we try to call it. I guess I must have missed
something here. Could you give some explanation?

> +			EMIT2(0xff, 0xd0); /* call *rax */
> +		} else {
> +			/* call original function */
> +			if (emit_call(&prog, orig_call, prog)) {
> +				ret = -EINVAL;
> +				goto cleanup;
> +			}
>   		}
>   		/* remember return value in a stack for bpf prog to access */
>   		emit_stx(&prog, BPF_DW, BPF_REG_FP, BPF_REG_0, -8);
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index 86dec5001ae2..16fc600503fb 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -554,6 +554,11 @@ struct btf_func_model {
>    */
>   #define BPF_TRAMP_F_SKIP_FRAME		BIT(2)
>   
> +/* Get original function from stack instead of from provided direct address.
> + * Makes sense for fexit programs only.
> + */
> +#define BPF_TRAMP_F_ORIG_STACK		BIT(3)
> +
>   /* Each call __bpf_prog_enter + call bpf_func + call __bpf_prog_exit is ~50
>    * bytes on x86.  Pick a number to fit into BPF_IMAGE_SIZE / 2
>    */
> 
Yonghong Song June 7, 2021, 5:36 a.m. UTC | #2
On 6/5/21 4:10 AM, Jiri Olsa wrote:
> Adding support to attach multiple functions to tracing program
> by using the link_create/link_update interface.
> 
> Adding multi_btf_ids/multi_btf_ids_cnt pair to link_create struct
> API, that define array of functions btf ids that will be attached
> to prog_fd.
> 
> The prog_fd needs to be multi prog tracing program (BPF_F_MULTI_FUNC).
> 
> The new link_create interface creates new BPF_LINK_TYPE_TRACING_MULTI
> link type, which creates separate bpf_trampoline and registers it
> as direct function for all specified btf ids.
> 
> The new bpf_trampoline is out of scope (bpf_trampoline_lookup) of
> standard trampolines, so all registered functions need to be free
> of direct functions, otherwise the link fails.

I am not sure how severe such a limitation could be in practice.
It is possible in production some non-multi fentry/fexit program
may run continuously. Does a kprobe program impact this as well?

> 
> The new bpf_trampoline will store and pass to bpf program the highest
> number of arguments from all given functions.
> 
> New programs (fentry or fexit) can be added to the existing trampoline
> through the link_update interface via new_prog_fd descriptor.

It looks like we do not support replacing old programs. Do we support
removing old programs?

> 
> Signed-off-by: Jiri Olsa <jolsa@kernel.org>
> ---
>   include/linux/bpf.h            |   3 +
>   include/uapi/linux/bpf.h       |   5 +
>   kernel/bpf/syscall.c           | 185 ++++++++++++++++++++++++++++++++-
>   kernel/bpf/trampoline.c        |  53 +++++++---
>   tools/include/uapi/linux/bpf.h |   5 +
>   5 files changed, 237 insertions(+), 14 deletions(-)
> 
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index 23221e0e8d3c..99a81c6c22e6 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -661,6 +661,7 @@ struct bpf_trampoline {
>   	struct bpf_tramp_image *cur_image;
>   	u64 selector;
>   	struct module *mod;
> +	bool multi;
>   };
>   
>   struct bpf_attach_target_info {
> @@ -746,6 +747,8 @@ void bpf_ksym_add(struct bpf_ksym *ksym);
>   void bpf_ksym_del(struct bpf_ksym *ksym);
>   int bpf_jit_charge_modmem(u32 pages);
>   void bpf_jit_uncharge_modmem(u32 pages);
> +struct bpf_trampoline *bpf_trampoline_multi_alloc(void);
> +void bpf_trampoline_multi_free(struct bpf_trampoline *tr);
>   #else
>   static inline int bpf_trampoline_link_prog(struct bpf_prog *prog,
>   					   struct bpf_trampoline *tr)
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index ad9340fb14d4..5fd6ff64e8dc 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -1007,6 +1007,7 @@ enum bpf_link_type {
>   	BPF_LINK_TYPE_ITER = 4,
>   	BPF_LINK_TYPE_NETNS = 5,
>   	BPF_LINK_TYPE_XDP = 6,
> +	BPF_LINK_TYPE_TRACING_MULTI = 7,
>   
>   	MAX_BPF_LINK_TYPE,
>   };
> @@ -1454,6 +1455,10 @@ union bpf_attr {
>   				__aligned_u64	iter_info;	/* extra bpf_iter_link_info */
>   				__u32		iter_info_len;	/* iter_info length */
>   			};
> +			struct {
> +				__aligned_u64	multi_btf_ids;		/* addresses to attach */
> +				__u32		multi_btf_ids_cnt;	/* addresses count */
> +			};
>   		};
>   	} link_create;
>   
[...]
> +static int bpf_tracing_multi_link_fill_link_info(const struct bpf_link *link,
> +						 struct bpf_link_info *info)
> +{
> +	struct bpf_tracing_multi_link *tr_link =
> +		container_of(link, struct bpf_tracing_multi_link, link);
> +
> +	info->tracing.attach_type = tr_link->attach_type;
> +	return 0;
> +}
> +
> +static int check_multi_prog_type(struct bpf_prog *prog)
> +{
> +	if (!prog->aux->multi_func &&
> +	    prog->type != BPF_PROG_TYPE_TRACING)

I think prog->type != BPF_PROG_TYPE_TRACING is not needed, it should
have been checked during program load time?

> +		return -EINVAL;
> +	if (prog->expected_attach_type != BPF_TRACE_FENTRY &&
> +	    prog->expected_attach_type != BPF_TRACE_FEXIT)
> +		return -EINVAL;
> +	return 0;
> +}
> +
> +static int bpf_tracing_multi_link_update(struct bpf_link *link,
> +					 struct bpf_prog *new_prog,
> +					 struct bpf_prog *old_prog __maybe_unused)
> +{
> +	struct bpf_tracing_multi_link *tr_link =
> +		container_of(link, struct bpf_tracing_multi_link, link);
> +	int err;
> +
> +	if (check_multi_prog_type(new_prog))
> +		return -EINVAL;
> +
> +	err = bpf_trampoline_link_prog(new_prog, tr_link->tr);
> +	if (err)
> +		return err;
> +
> +	err = modify_ftrace_direct_multi(&tr_link->ops,
> +					 (unsigned long) tr_link->tr->cur_image->image);
> +	return WARN_ON(err);

Why WARN_ON here? Some comments would be good.

> +}
> +
> +static const struct bpf_link_ops bpf_tracing_multi_link_lops = {
> +	.release = bpf_tracing_multi_link_release,
> +	.dealloc = bpf_tracing_multi_link_dealloc,
> +	.show_fdinfo = bpf_tracing_multi_link_show_fdinfo,
> +	.fill_link_info = bpf_tracing_multi_link_fill_link_info,
> +	.update_prog = bpf_tracing_multi_link_update,
> +};
> +
[...]
> +
>   struct bpf_raw_tp_link {
>   	struct bpf_link link;
>   	struct bpf_raw_event_map *btp;
> @@ -3043,6 +3222,8 @@ attach_type_to_prog_type(enum bpf_attach_type attach_type)
>   	case BPF_CGROUP_SETSOCKOPT:
>   		return BPF_PROG_TYPE_CGROUP_SOCKOPT;
>   	case BPF_TRACE_ITER:
> +	case BPF_TRACE_FENTRY:
> +	case BPF_TRACE_FEXIT:
>   		return BPF_PROG_TYPE_TRACING;
>   	case BPF_SK_LOOKUP:
>   		return BPF_PROG_TYPE_SK_LOOKUP;
> @@ -4099,6 +4280,8 @@ static int tracing_bpf_link_attach(const union bpf_attr *attr, bpfptr_t uattr,
>   
>   	if (prog->expected_attach_type == BPF_TRACE_ITER)
>   		return bpf_iter_link_attach(attr, uattr, prog);
> +	else if (prog->aux->multi_func)
> +		return bpf_tracing_multi_attach(prog, attr);
>   	else if (prog->type == BPF_PROG_TYPE_EXT)
>   		return bpf_tracing_prog_attach(prog,
>   					       attr->link_create.target_fd,
> @@ -4106,7 +4289,7 @@ static int tracing_bpf_link_attach(const union bpf_attr *attr, bpfptr_t uattr,
>   	return -EINVAL;
>   }
>   
> -#define BPF_LINK_CREATE_LAST_FIELD link_create.iter_info_len
> +#define BPF_LINK_CREATE_LAST_FIELD link_create.multi_btf_ids_cnt

It is okay that we don't change this. link_create.iter_info_len
has the same effect since it is a union.

>   static int link_create(union bpf_attr *attr, bpfptr_t uattr)
>   {
>   	enum bpf_prog_type ptype;
> diff --git a/kernel/bpf/trampoline.c b/kernel/bpf/trampoline.c
> index 2755fdcf9fbf..660b8197c27f 100644
> --- a/kernel/bpf/trampoline.c
> +++ b/kernel/bpf/trampoline.c
> @@ -58,7 +58,7 @@ void bpf_image_ksym_del(struct bpf_ksym *ksym)
>   			   PAGE_SIZE, true, ksym->name);
>   }
>   
> -static struct bpf_trampoline *bpf_trampoline_alloc(void)
> +static struct bpf_trampoline *bpf_trampoline_alloc(bool multi)
>   {
>   	struct bpf_trampoline *tr;
>   	int i;
> @@ -72,6 +72,7 @@ static struct bpf_trampoline *bpf_trampoline_alloc(void)
>   	mutex_init(&tr->mutex);
>   	for (i = 0; i < BPF_TRAMP_MAX; i++)
>   		INIT_HLIST_HEAD(&tr->progs_hlist[i]);
> +	tr->multi = multi;
>   	return tr;
>   }
>   
> @@ -88,7 +89,7 @@ static struct bpf_trampoline *bpf_trampoline_lookup(u64 key)
>   			goto out;
>   		}
>   	}
> -	tr = bpf_trampoline_alloc();
> +	tr = bpf_trampoline_alloc(false);
>   	if (tr) {
>   		tr->key = key;
>   		hlist_add_head(&tr->hlist, head);
> @@ -343,14 +344,16 @@ static int bpf_trampoline_update(struct bpf_trampoline *tr)
>   	struct bpf_tramp_image *im;
>   	struct bpf_tramp_progs *tprogs;
>   	u32 flags = BPF_TRAMP_F_RESTORE_REGS;
> -	int err, total;
> +	bool update = !tr->multi;
> +	int err = 0, total;
>   
>   	tprogs = bpf_trampoline_get_progs(tr, &total);
>   	if (IS_ERR(tprogs))
>   		return PTR_ERR(tprogs);
>   
>   	if (total == 0) {
> -		err = unregister_fentry(tr, tr->cur_image->image);
> +		if (update)
> +			err = unregister_fentry(tr, tr->cur_image->image);
>   		bpf_tramp_image_put(tr->cur_image);
>   		tr->cur_image = NULL;
>   		tr->selector = 0;
> @@ -363,9 +366,15 @@ static int bpf_trampoline_update(struct bpf_trampoline *tr)
>   		goto out;
>   	}
>   
> +	if (tr->multi)
> +		flags |= BPF_TRAMP_F_IP_ARG;
> +
>   	if (tprogs[BPF_TRAMP_FEXIT].nr_progs ||
> -	    tprogs[BPF_TRAMP_MODIFY_RETURN].nr_progs)
> +	    tprogs[BPF_TRAMP_MODIFY_RETURN].nr_progs) {
>   		flags = BPF_TRAMP_F_CALL_ORIG | BPF_TRAMP_F_SKIP_FRAME;
> +		if (tr->multi)
> +			flags |= BPF_TRAMP_F_ORIG_STACK | BPF_TRAMP_F_IP_ARG;

BPF_TRAMP_F_IP_ARG is not needed. It has been added before.

> +	}
>   
>   	err = arch_prepare_bpf_trampoline(im, im->image, im->image + PAGE_SIZE,
>   					  &tr->func.model, flags, tprogs,
> @@ -373,16 +382,19 @@ static int bpf_trampoline_update(struct bpf_trampoline *tr)
>   	if (err < 0)
>   		goto out;
>   
> +	err = 0;
>   	WARN_ON(tr->cur_image && tr->selector == 0);
>   	WARN_ON(!tr->cur_image && tr->selector);
> -	if (tr->cur_image)
> -		/* progs already running at this address */
> -		err = modify_fentry(tr, tr->cur_image->image, im->image);
> -	else
> -		/* first time registering */
> -		err = register_fentry(tr, im->image);
> -	if (err)
> -		goto out;
> +	if (update) {
> +		if (tr->cur_image)
> +			/* progs already running at this address */
> +			err = modify_fentry(tr, tr->cur_image->image, im->image);
> +		else
> +			/* first time registering */
> +			err = register_fentry(tr, im->image);
> +		if (err)
> +			goto out;
> +	}
>   	if (tr->cur_image)
>   		bpf_tramp_image_put(tr->cur_image);
>   	tr->cur_image = im;
> @@ -436,6 +448,10 @@ int bpf_trampoline_link_prog(struct bpf_prog *prog, struct bpf_trampoline *tr)
>   			err = -EBUSY;
>   			goto out;
>   		}
> +		if (tr->multi) {
> +			err = -EINVAL;
> +			goto out;
> +		}
>   		tr->extension_prog = prog;
>   		err = bpf_arch_text_poke(tr->func.addr, BPF_MOD_JUMP, NULL,
>   					 prog->bpf_func);
[...]
Yonghong Song June 7, 2021, 5:49 a.m. UTC | #3
On 6/5/21 4:10 AM, Jiri Olsa wrote:
> Adding support to link multi func tracing program
> through link_create interface.
> 
> Adding special types for multi func programs:
> 
>    fentry.multi
>    fexit.multi
> 
> so you can define multi func programs like:
> 
>    SEC("fentry.multi/bpf_fentry_test*")
>    int BPF_PROG(test1, unsigned long ip, __u64 a, __u64 b, __u64 c, __u64 d, __u64 e, __u64 f)
> 
> that defines test1 to be attached to bpf_fentry_test* functions,
> and able to attach ip and 6 arguments.
> 
> If functions are not specified the program needs to be attached
> manually.
> 
> Adding new btf id related fields to bpf_link_create_opts and
> bpf_link_create to use them.
> 
> Signed-off-by: Jiri Olsa <jolsa@kernel.org>
> ---
>   tools/lib/bpf/bpf.c    | 11 ++++++-
>   tools/lib/bpf/bpf.h    |  4 ++-
>   tools/lib/bpf/libbpf.c | 72 ++++++++++++++++++++++++++++++++++++++++++
>   3 files changed, 85 insertions(+), 2 deletions(-)
> 
> diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
> index 86dcac44f32f..da892737b522 100644
> --- a/tools/lib/bpf/bpf.c
> +++ b/tools/lib/bpf/bpf.c
> @@ -674,7 +674,8 @@ int bpf_link_create(int prog_fd, int target_fd,
>   		    enum bpf_attach_type attach_type,
>   		    const struct bpf_link_create_opts *opts)
>   {
> -	__u32 target_btf_id, iter_info_len;
> +	__u32 target_btf_id, iter_info_len, multi_btf_ids_cnt;
> +	__s32 *multi_btf_ids;
>   	union bpf_attr attr;
>   	int fd;
>   
[...]
> @@ -9584,6 +9597,9 @@ static int libbpf_find_attach_btf_id(struct bpf_program *prog, int *btf_obj_fd,
>   	if (!name)
>   		return -EINVAL;
>   
> +	if (prog->prog_flags & BPF_F_MULTI_FUNC)
> +		return 0;
> +
>   	for (i = 0; i < ARRAY_SIZE(section_defs); i++) {
>   		if (!section_defs[i].is_attach_btf)
>   			continue;
> @@ -10537,6 +10553,62 @@ static struct bpf_link *bpf_program__attach_btf_id(struct bpf_program *prog)
>   	return (struct bpf_link *)link;
>   }
>   
> +static struct bpf_link *bpf_program__attach_multi(struct bpf_program *prog)
> +{
> +	char *pattern = prog->sec_name + prog->sec_def->len;
> +	DECLARE_LIBBPF_OPTS(bpf_link_create_opts, opts);
> +	enum bpf_attach_type attach_type;
> +	int prog_fd, link_fd, cnt, err;
> +	struct bpf_link *link = NULL;
> +	__s32 *ids = NULL;
> +
> +	prog_fd = bpf_program__fd(prog);
> +	if (prog_fd < 0) {
> +		pr_warn("prog '%s': can't attach before loaded\n", prog->name);
> +		return ERR_PTR(-EINVAL);
> +	}
> +
> +	err = bpf_object__load_vmlinux_btf(prog->obj, true);
> +	if (err)
> +		return ERR_PTR(err);
> +
> +	cnt = btf__find_by_pattern_kind(prog->obj->btf_vmlinux, pattern,
> +					BTF_KIND_FUNC, &ids);
> +	if (cnt <= 0)
> +		return ERR_PTR(-EINVAL);

In the kernel, it looks like we support cnt == 0, but here we error out.
Should we also error out in the kernel if cnt == 0?

> +
> +	link = calloc(1, sizeof(*link));
> +	if (!link) {
> +		err = -ENOMEM;
> +		goto out_err;
> +	}
> +	link->detach = &bpf_link__detach_fd;
> +
> +	opts.multi_btf_ids = ids;
> +	opts.multi_btf_ids_cnt = cnt;
> +
> +	attach_type = bpf_program__get_expected_attach_type(prog);
> +	link_fd = bpf_link_create(prog_fd, 0, attach_type, &opts);
> +	if (link_fd < 0) {
> +		err = -errno;
> +		goto out_err;
> +	}
> +	link->fd = link_fd;
> +	free(ids);
> +	return link;
> +
> +out_err:
> +	free(link);
> +	free(ids);
> +	return ERR_PTR(err);
> +}
> +
[...]
Jiri Olsa June 7, 2021, 6:13 p.m. UTC | #4
On Sun, Jun 06, 2021 at 08:07:44PM -0700, Yonghong Song wrote:
> 
> 
> On 6/5/21 4:10 AM, Jiri Olsa wrote:
> > Currently we call the original function by using the absolute address
> > given at the JIT generation. That's not usable when having trampoline
> > attached to multiple functions. In this case we need to take the
> > return address from the stack.
> 
> Here, it is mentioned to take the return address from the stack.
> 
> > 
> > Adding support to retrieve the original function address from the stack
> 
> Here, it is said to take the original function address from the stack.

sorry if the description is confusing as always, the idea
is to take the function's return address from the fentry call:

   function
     call fentry
     xxxx             <---- this address

and use it to call the original function body before the fexit handler

jirka

> 
> > by adding new BPF_TRAMP_F_ORIG_STACK flag for arch_prepare_bpf_trampoline
> > function.
> > 
> > Signed-off-by: Jiri Olsa <jolsa@kernel.org>
> > ---
> >   arch/x86/net/bpf_jit_comp.c | 13 +++++++++----
> >   include/linux/bpf.h         |  5 +++++
> >   2 files changed, 14 insertions(+), 4 deletions(-)
> > 
> > diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
> > index 2a2e290fa5d8..b77e6bd78354 100644
> > --- a/arch/x86/net/bpf_jit_comp.c
> > +++ b/arch/x86/net/bpf_jit_comp.c
> > @@ -2013,10 +2013,15 @@ int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image, void *i
> >   	if (flags & BPF_TRAMP_F_CALL_ORIG) {
> >   		restore_regs(m, &prog, nr_args, stack_size);
> > -		/* call original function */
> > -		if (emit_call(&prog, orig_call, prog)) {
> > -			ret = -EINVAL;
> > -			goto cleanup;
> > +		if (flags & BPF_TRAMP_F_ORIG_STACK) {
> > +			emit_ldx(&prog, BPF_DW, BPF_REG_0, BPF_REG_FP, 8);
> 
> This is load double from base_pointer + 8 which should be func return
> address for x86, yet we try to call it.
> I guess I must have missed something
> here. Could you give some explanation?
> 
> > +			EMIT2(0xff, 0xd0); /* call *rax */
> > +		} else {
> > +			/* call original function */
> > +			if (emit_call(&prog, orig_call, prog)) {
> > +				ret = -EINVAL;
> > +				goto cleanup;
> > +			}
> >   		}
> >   		/* remember return value in a stack for bpf prog to access */
> >   		emit_stx(&prog, BPF_DW, BPF_REG_FP, BPF_REG_0, -8);
> > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > index 86dec5001ae2..16fc600503fb 100644
> > --- a/include/linux/bpf.h
> > +++ b/include/linux/bpf.h
> > @@ -554,6 +554,11 @@ struct btf_func_model {
> >    */
> >   #define BPF_TRAMP_F_SKIP_FRAME		BIT(2)
> > +/* Get original function from stack instead of from provided direct address.
> > + * Makes sense for fexit programs only.
> > + */
> > +#define BPF_TRAMP_F_ORIG_STACK		BIT(3)
> > +
> >   /* Each call __bpf_prog_enter + call bpf_func + call __bpf_prog_exit is ~50
> >    * bytes on x86.  Pick a number to fit into BPF_IMAGE_SIZE / 2
> >    */
> > 
>
Jiri Olsa June 7, 2021, 6:25 p.m. UTC | #5
On Sun, Jun 06, 2021 at 10:36:57PM -0700, Yonghong Song wrote:
> 
> 
> On 6/5/21 4:10 AM, Jiri Olsa wrote:
> > Adding support to attach multiple functions to tracing program
> > by using the link_create/link_update interface.
> > 
> > Adding multi_btf_ids/multi_btf_ids_cnt pair to link_create struct
> > API, that define array of functions btf ids that will be attached
> > to prog_fd.
> > 
> > The prog_fd needs to be multi prog tracing program (BPF_F_MULTI_FUNC).
> > 
> > The new link_create interface creates new BPF_LINK_TYPE_TRACING_MULTI
> > link type, which creates separate bpf_trampoline and registers it
> > as direct function for all specified btf ids.
> > 
> > The new bpf_trampoline is out of scope (bpf_trampoline_lookup) of
> > standard trampolines, so all registered functions need to be free
> > of direct functions, otherwise the link fails.
> 
> I am not sure how severe such a limitation could be in practice.
> It is possible in production some non-multi fentry/fexit program
> may run continuously. Does kprobe program impact this as well?

I did not find a way how to combine current trampolines with the
new ones for multiple programs.. what you described is a limitation
of the current approach

I'm not sure about kprobes and trampolines, but the limitation
should be the same as we have for current trampolines.. I'll check

> 
> > 
> > The new bpf_trampoline will store and pass to bpf program the highest
> > number of arguments from all given functions.
> > 
> > New programs (fentry or fexit) can be added to the existing trampoline
> > through the link_update interface via new_prog_fd descriptor.
> 
> Looks we do not support replacing old programs. Do we support
> removing old programs?

we don't.. it's not what bpftrace would do, it just adds programs
to trace and closes them all when it's done.. I think an interface for
removal could be added if you think it's needed

> 
> > 
> > Signed-off-by: Jiri Olsa <jolsa@kernel.org>
> > ---
> >   include/linux/bpf.h            |   3 +
> >   include/uapi/linux/bpf.h       |   5 +
> >   kernel/bpf/syscall.c           | 185 ++++++++++++++++++++++++++++++++-
> >   kernel/bpf/trampoline.c        |  53 +++++++---
> >   tools/include/uapi/linux/bpf.h |   5 +
> >   5 files changed, 237 insertions(+), 14 deletions(-)
> > 
> > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > index 23221e0e8d3c..99a81c6c22e6 100644
> > --- a/include/linux/bpf.h
> > +++ b/include/linux/bpf.h
> > @@ -661,6 +661,7 @@ struct bpf_trampoline {
> >   	struct bpf_tramp_image *cur_image;
> >   	u64 selector;
> >   	struct module *mod;
> > +	bool multi;
> >   };
> >   struct bpf_attach_target_info {
> > @@ -746,6 +747,8 @@ void bpf_ksym_add(struct bpf_ksym *ksym);
> >   void bpf_ksym_del(struct bpf_ksym *ksym);
> >   int bpf_jit_charge_modmem(u32 pages);
> >   void bpf_jit_uncharge_modmem(u32 pages);
> > +struct bpf_trampoline *bpf_trampoline_multi_alloc(void);
> > +void bpf_trampoline_multi_free(struct bpf_trampoline *tr);
> >   #else
> >   static inline int bpf_trampoline_link_prog(struct bpf_prog *prog,
> >   					   struct bpf_trampoline *tr)
> > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> > index ad9340fb14d4..5fd6ff64e8dc 100644
> > --- a/include/uapi/linux/bpf.h
> > +++ b/include/uapi/linux/bpf.h
> > @@ -1007,6 +1007,7 @@ enum bpf_link_type {
> >   	BPF_LINK_TYPE_ITER = 4,
> >   	BPF_LINK_TYPE_NETNS = 5,
> >   	BPF_LINK_TYPE_XDP = 6,
> > +	BPF_LINK_TYPE_TRACING_MULTI = 7,
> >   	MAX_BPF_LINK_TYPE,
> >   };
> > @@ -1454,6 +1455,10 @@ union bpf_attr {
> >   				__aligned_u64	iter_info;	/* extra bpf_iter_link_info */
> >   				__u32		iter_info_len;	/* iter_info length */
> >   			};
> > +			struct {
> > +				__aligned_u64	multi_btf_ids;		/* addresses to attach */
> > +				__u32		multi_btf_ids_cnt;	/* addresses count */
> > +			};
> >   		};
> >   	} link_create;
> [...]
> > +static int bpf_tracing_multi_link_fill_link_info(const struct bpf_link *link,
> > +						 struct bpf_link_info *info)
> > +{
> > +	struct bpf_tracing_multi_link *tr_link =
> > +		container_of(link, struct bpf_tracing_multi_link, link);
> > +
> > +	info->tracing.attach_type = tr_link->attach_type;
> > +	return 0;
> > +}
> > +
> > +static int check_multi_prog_type(struct bpf_prog *prog)
> > +{
> > +	if (!prog->aux->multi_func &&
> > +	    prog->type != BPF_PROG_TYPE_TRACING)
> 
> I think prog->type != BPF_PROG_TYPE_TRACING is not needed, it should have
> been checked during program load time?
> 
> > +		return -EINVAL;
> > +	if (prog->expected_attach_type != BPF_TRACE_FENTRY &&
> > +	    prog->expected_attach_type != BPF_TRACE_FEXIT)
> > +		return -EINVAL;
> > +	return 0;
> > +}
> > +
> > +static int bpf_tracing_multi_link_update(struct bpf_link *link,
> > +					 struct bpf_prog *new_prog,
> > +					 struct bpf_prog *old_prog __maybe_unused)
> > +{
> > +	struct bpf_tracing_multi_link *tr_link =
> > +		container_of(link, struct bpf_tracing_multi_link, link);
> > +	int err;
> > +
> > +	if (check_multi_prog_type(new_prog))
> > +		return -EINVAL;
> > +
> > +	err = bpf_trampoline_link_prog(new_prog, tr_link->tr);
> > +	if (err)
> > +		return err;
> > +
> > +	err = modify_ftrace_direct_multi(&tr_link->ops,
> > +					 (unsigned long) tr_link->tr->cur_image->image);
> > +	return WARN_ON(err);
> 
> Why WARN_ON here? Some comments will be good.
> 
> > +}
> > +
> > +static const struct bpf_link_ops bpf_tracing_multi_link_lops = {
> > +	.release = bpf_tracing_multi_link_release,
> > +	.dealloc = bpf_tracing_multi_link_dealloc,
> > +	.show_fdinfo = bpf_tracing_multi_link_show_fdinfo,
> > +	.fill_link_info = bpf_tracing_multi_link_fill_link_info,
> > +	.update_prog = bpf_tracing_multi_link_update,
> > +};
> > +
> [...]
> > +
> >   struct bpf_raw_tp_link {
> >   	struct bpf_link link;
> >   	struct bpf_raw_event_map *btp;
> > @@ -3043,6 +3222,8 @@ attach_type_to_prog_type(enum bpf_attach_type attach_type)
> >   	case BPF_CGROUP_SETSOCKOPT:
> >   		return BPF_PROG_TYPE_CGROUP_SOCKOPT;
> >   	case BPF_TRACE_ITER:
> > +	case BPF_TRACE_FENTRY:
> > +	case BPF_TRACE_FEXIT:
> >   		return BPF_PROG_TYPE_TRACING;
> >   	case BPF_SK_LOOKUP:
> >   		return BPF_PROG_TYPE_SK_LOOKUP;
> > @@ -4099,6 +4280,8 @@ static int tracing_bpf_link_attach(const union bpf_attr *attr, bpfptr_t uattr,
> >   	if (prog->expected_attach_type == BPF_TRACE_ITER)
> >   		return bpf_iter_link_attach(attr, uattr, prog);
> > +	else if (prog->aux->multi_func)
> > +		return bpf_tracing_multi_attach(prog, attr);
> >   	else if (prog->type == BPF_PROG_TYPE_EXT)
> >   		return bpf_tracing_prog_attach(prog,
> >   					       attr->link_create.target_fd,
> > @@ -4106,7 +4289,7 @@ static int tracing_bpf_link_attach(const union bpf_attr *attr, bpfptr_t uattr,
> >   	return -EINVAL;
> >   }
> > -#define BPF_LINK_CREATE_LAST_FIELD link_create.iter_info_len
> > +#define BPF_LINK_CREATE_LAST_FIELD link_create.multi_btf_ids_cnt
> 
> It is okay that we don't change this. link_create.iter_info_len
> has the same effect since it is a union.
> 
> >   static int link_create(union bpf_attr *attr, bpfptr_t uattr)
> >   {
> >   	enum bpf_prog_type ptype;
> > diff --git a/kernel/bpf/trampoline.c b/kernel/bpf/trampoline.c
> > index 2755fdcf9fbf..660b8197c27f 100644
> > --- a/kernel/bpf/trampoline.c
> > +++ b/kernel/bpf/trampoline.c
> > @@ -58,7 +58,7 @@ void bpf_image_ksym_del(struct bpf_ksym *ksym)
> >   			   PAGE_SIZE, true, ksym->name);
> >   }
> > -static struct bpf_trampoline *bpf_trampoline_alloc(void)
> > +static struct bpf_trampoline *bpf_trampoline_alloc(bool multi)
> >   {
> >   	struct bpf_trampoline *tr;
> >   	int i;
> > @@ -72,6 +72,7 @@ static struct bpf_trampoline *bpf_trampoline_alloc(void)
> >   	mutex_init(&tr->mutex);
> >   	for (i = 0; i < BPF_TRAMP_MAX; i++)
> >   		INIT_HLIST_HEAD(&tr->progs_hlist[i]);
> > +	tr->multi = multi;
> >   	return tr;
> >   }
> > @@ -88,7 +89,7 @@ static struct bpf_trampoline *bpf_trampoline_lookup(u64 key)
> >   			goto out;
> >   		}
> >   	}
> > -	tr = bpf_trampoline_alloc();

> > +	tr = bpf_trampoline_alloc(false);

> >   	if (tr) {

> >   		tr->key = key;

> >   		hlist_add_head(&tr->hlist, head);

> > @@ -343,14 +344,16 @@ static int bpf_trampoline_update(struct bpf_trampoline *tr)

> >   	struct bpf_tramp_image *im;

> >   	struct bpf_tramp_progs *tprogs;

> >   	u32 flags = BPF_TRAMP_F_RESTORE_REGS;

> > -	int err, total;

> > +	bool update = !tr->multi;

> > +	int err = 0, total;

> >   	tprogs = bpf_trampoline_get_progs(tr, &total);

> >   	if (IS_ERR(tprogs))

> >   		return PTR_ERR(tprogs);

> >   	if (total == 0) {

> > -		err = unregister_fentry(tr, tr->cur_image->image);

> > +		if (update)

> > +			err = unregister_fentry(tr, tr->cur_image->image);

> >   		bpf_tramp_image_put(tr->cur_image);

> >   		tr->cur_image = NULL;

> >   		tr->selector = 0;

> > @@ -363,9 +366,15 @@ static int bpf_trampoline_update(struct bpf_trampoline *tr)

> >   		goto out;

> >   	}

> > +	if (tr->multi)

> > +		flags |= BPF_TRAMP_F_IP_ARG;

> > +

> >   	if (tprogs[BPF_TRAMP_FEXIT].nr_progs ||

> > -	    tprogs[BPF_TRAMP_MODIFY_RETURN].nr_progs)

> > +	    tprogs[BPF_TRAMP_MODIFY_RETURN].nr_progs) {

> >   		flags = BPF_TRAMP_F_CALL_ORIG | BPF_TRAMP_F_SKIP_FRAME;

> > +		if (tr->multi)

> > +			flags |= BPF_TRAMP_F_ORIG_STACK | BPF_TRAMP_F_IP_ARG;

> 

> BPF_TRAMP_F_IP_ARG is not needed. It has been added before.


it's erased 2 lines above.. which reminds me that I forgot to check
whether that's a bug or intended ;-)

jirka
Jiri Olsa June 7, 2021, 6:28 p.m. UTC | #6
On Sun, Jun 06, 2021 at 10:49:16PM -0700, Yonghong Song wrote:
> 

> 

> On 6/5/21 4:10 AM, Jiri Olsa wrote:

> > Adding support to link multi func tracing program

> > through link_create interface.

> > 

> > Adding special types for multi func programs:

> > 

> >    fentry.multi

> >    fexit.multi

> > 

> > so you can define multi func programs like:

> > 

> >    SEC("fentry.multi/bpf_fentry_test*")

> >    int BPF_PROG(test1, unsigned long ip, __u64 a, __u64 b, __u64 c, __u64 d, __u64 e, __u64 f)

> > 

> > that defines test1 to be attached to bpf_fentry_test* functions,

> > and able to attach ip and 6 arguments.

> > 

> > If functions are not specified the program needs to be attached

> > manually.

> > 

> > Adding new btf id related fields to bpf_link_create_opts and

> > bpf_link_create to use them.

> > 

> > Signed-off-by: Jiri Olsa <jolsa@kernel.org>

> > ---

> >   tools/lib/bpf/bpf.c    | 11 ++++++-

> >   tools/lib/bpf/bpf.h    |  4 ++-

> >   tools/lib/bpf/libbpf.c | 72 ++++++++++++++++++++++++++++++++++++++++++

> >   3 files changed, 85 insertions(+), 2 deletions(-)

> > 

> > diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c

> > index 86dcac44f32f..da892737b522 100644

> > --- a/tools/lib/bpf/bpf.c

> > +++ b/tools/lib/bpf/bpf.c

> > @@ -674,7 +674,8 @@ int bpf_link_create(int prog_fd, int target_fd,

> >   		    enum bpf_attach_type attach_type,

> >   		    const struct bpf_link_create_opts *opts)

> >   {

> > -	__u32 target_btf_id, iter_info_len;

> > +	__u32 target_btf_id, iter_info_len, multi_btf_ids_cnt;

> > +	__s32 *multi_btf_ids;

> >   	union bpf_attr attr;

> >   	int fd;

> [...]

> > @@ -9584,6 +9597,9 @@ static int libbpf_find_attach_btf_id(struct bpf_program *prog, int *btf_obj_fd,

> >   	if (!name)

> >   		return -EINVAL;

> > +	if (prog->prog_flags & BPF_F_MULTI_FUNC)

> > +		return 0;

> > +

> >   	for (i = 0; i < ARRAY_SIZE(section_defs); i++) {

> >   		if (!section_defs[i].is_attach_btf)

> >   			continue;

> > @@ -10537,6 +10553,62 @@ static struct bpf_link *bpf_program__attach_btf_id(struct bpf_program *prog)

> >   	return (struct bpf_link *)link;

> >   }

> > +static struct bpf_link *bpf_program__attach_multi(struct bpf_program *prog)

> > +{

> > +	char *pattern = prog->sec_name + prog->sec_def->len;

> > +	DECLARE_LIBBPF_OPTS(bpf_link_create_opts, opts);

> > +	enum bpf_attach_type attach_type;

> > +	int prog_fd, link_fd, cnt, err;

> > +	struct bpf_link *link = NULL;

> > +	__s32 *ids = NULL;

> > +

> > +	prog_fd = bpf_program__fd(prog);

> > +	if (prog_fd < 0) {

> > +		pr_warn("prog '%s': can't attach before loaded\n", prog->name);

> > +		return ERR_PTR(-EINVAL);

> > +	}

> > +

> > +	err = bpf_object__load_vmlinux_btf(prog->obj, true);

> > +	if (err)

> > +		return ERR_PTR(err);

> > +

> > +	cnt = btf__find_by_pattern_kind(prog->obj->btf_vmlinux, pattern,

> > +					BTF_KIND_FUNC, &ids);

> > +	if (cnt <= 0)

> > +		return ERR_PTR(-EINVAL);

> 

> In kernel, looks like we support cnt = 0, here we error out.

> Should we also error out in the kernel if cnt == 0?


hum, I'm not sure what you mean.. what kernel code are you referring to?

thanks,
jirka
Yonghong Song June 7, 2021, 7:39 p.m. UTC | #7
On 6/7/21 11:25 AM, Jiri Olsa wrote:
> On Sun, Jun 06, 2021 at 10:36:57PM -0700, Yonghong Song wrote:

>>

>>

>> On 6/5/21 4:10 AM, Jiri Olsa wrote:

>>> Adding support to attach multiple functions to tracing program

>>> by using the link_create/link_update interface.

>>>

>>> Adding multi_btf_ids/multi_btf_ids_cnt pair to link_create struct

>>> API, that define array of functions btf ids that will be attached

>>> to prog_fd.

>>>

>>> The prog_fd needs to be multi prog tracing program (BPF_F_MULTI_FUNC).

>>>

>>> The new link_create interface creates new BPF_LINK_TYPE_TRACING_MULTI

>>> link type, which creates separate bpf_trampoline and registers it

>>> as direct function for all specified btf ids.

>>>

>>> The new bpf_trampoline is out of scope (bpf_trampoline_lookup) of

>>> standard trampolines, so all registered functions need to be free

>>> of direct functions, otherwise the link fails.

>>

>> I am not sure how severe such a limitation could be in practice.

>> It is possible in production some non-multi fentry/fexit program

>> may run continuously. Does kprobe program impact this as well?

> 

> I did not find a way how to combine current trampolines with the

> new ones for multiple programs.. what you described is a limitation

> of the current approach

> 

> I'm not sure about kprobes and trampolines, but the limitation

> should be same as we do have for current trampolines.. I'll check

> 

>>

>>>

>>> The new bpf_trampoline will store and pass to bpf program the highest

>>> number of arguments from all given functions.

>>>

>>> New programs (fentry or fexit) can be added to the existing trampoline

>>> through the link_update interface via new_prog_fd descriptor.

>>

>> Looks we do not support replacing old programs. Do we support

>> removing old programs?

> 

> we don't.. it's not what bpftrace would do, it just adds programs

> to trace and close all when it's done.. I think interface for removal

> could be added if you think it's needed


This can be a followup patch. Indeed, selectively removing old programs 
is probably not a common use case.

> 

>>

>>>

>>> Signed-off-by: Jiri Olsa <jolsa@kernel.org>

>>> ---

>>>    include/linux/bpf.h            |   3 +

>>>    include/uapi/linux/bpf.h       |   5 +

>>>    kernel/bpf/syscall.c           | 185 ++++++++++++++++++++++++++++++++-

>>>    kernel/bpf/trampoline.c        |  53 +++++++---

>>>    tools/include/uapi/linux/bpf.h |   5 +

>>>    5 files changed, 237 insertions(+), 14 deletions(-)

>>>

>>> diff --git a/include/linux/bpf.h b/include/linux/bpf.h

>>> index 23221e0e8d3c..99a81c6c22e6 100644

>>> --- a/include/linux/bpf.h

>>> +++ b/include/linux/bpf.h

>>> @@ -661,6 +661,7 @@ struct bpf_trampoline {

>>>    	struct bpf_tramp_image *cur_image;

>>>    	u64 selector;

>>>    	struct module *mod;

>>> +	bool multi;

>>>    };

>>>    struct bpf_attach_target_info {

>>> @@ -746,6 +747,8 @@ void bpf_ksym_add(struct bpf_ksym *ksym);

>>>    void bpf_ksym_del(struct bpf_ksym *ksym);

>>>    int bpf_jit_charge_modmem(u32 pages);

>>>    void bpf_jit_uncharge_modmem(u32 pages);

>>> +struct bpf_trampoline *bpf_trampoline_multi_alloc(void);

>>> +void bpf_trampoline_multi_free(struct bpf_trampoline *tr);

>>>    #else

>>>    static inline int bpf_trampoline_link_prog(struct bpf_prog *prog,

>>>    					   struct bpf_trampoline *tr)

[...]
>>> @@ -363,9 +366,15 @@ static int bpf_trampoline_update(struct bpf_trampoline *tr)

>>>    		goto out;

>>>    	}

>>> +	if (tr->multi)

>>> +		flags |= BPF_TRAMP_F_IP_ARG;

>>> +

>>>    	if (tprogs[BPF_TRAMP_FEXIT].nr_progs ||

>>> -	    tprogs[BPF_TRAMP_MODIFY_RETURN].nr_progs)

>>> +	    tprogs[BPF_TRAMP_MODIFY_RETURN].nr_progs) {

>>>    		flags = BPF_TRAMP_F_CALL_ORIG | BPF_TRAMP_F_SKIP_FRAME;

>>> +		if (tr->multi)

>>> +			flags |= BPF_TRAMP_F_ORIG_STACK | BPF_TRAMP_F_IP_ARG;

>>

>> BPF_TRAMP_F_IP_ARG is not needed. It has been added before.

> 

> it's erased in 2 lines above.. which reminds me that I forgot to check

> if that's a bug or intended ;-)


Oh, yes, I missed that too :-) I guess it would be good if you can
re-organize the code to avoid resetting the flags.

> 

> jirka

>
Yonghong Song June 7, 2021, 7:42 p.m. UTC | #8
On 6/7/21 11:28 AM, Jiri Olsa wrote:
> On Sun, Jun 06, 2021 at 10:49:16PM -0700, Yonghong Song wrote:

>>

>>

>> On 6/5/21 4:10 AM, Jiri Olsa wrote:

>>> Adding support to link multi func tracing program

>>> through link_create interface.

>>>

>>> Adding special types for multi func programs:

>>>

>>>     fentry.multi

>>>     fexit.multi

>>>

>>> so you can define multi func programs like:

>>>

>>>     SEC("fentry.multi/bpf_fentry_test*")

>>>     int BPF_PROG(test1, unsigned long ip, __u64 a, __u64 b, __u64 c, __u64 d, __u64 e, __u64 f)

>>>

>>> that defines test1 to be attached to bpf_fentry_test* functions,

>>> and able to attach ip and 6 arguments.

>>>

>>> If functions are not specified the program needs to be attached

>>> manually.

>>>

>>> Adding new btf id related fields to bpf_link_create_opts and

>>> bpf_link_create to use them.

>>>

>>> Signed-off-by: Jiri Olsa <jolsa@kernel.org>

>>> ---

>>>    tools/lib/bpf/bpf.c    | 11 ++++++-

>>>    tools/lib/bpf/bpf.h    |  4 ++-

>>>    tools/lib/bpf/libbpf.c | 72 ++++++++++++++++++++++++++++++++++++++++++

>>>    3 files changed, 85 insertions(+), 2 deletions(-)

>>>

>>> diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c

>>> index 86dcac44f32f..da892737b522 100644

>>> --- a/tools/lib/bpf/bpf.c

>>> +++ b/tools/lib/bpf/bpf.c

>>> @@ -674,7 +674,8 @@ int bpf_link_create(int prog_fd, int target_fd,

>>>    		    enum bpf_attach_type attach_type,

>>>    		    const struct bpf_link_create_opts *opts)

>>>    {

>>> -	__u32 target_btf_id, iter_info_len;

>>> +	__u32 target_btf_id, iter_info_len, multi_btf_ids_cnt;

>>> +	__s32 *multi_btf_ids;

>>>    	union bpf_attr attr;

>>>    	int fd;

>> [...]

>>> @@ -9584,6 +9597,9 @@ static int libbpf_find_attach_btf_id(struct bpf_program *prog, int *btf_obj_fd,

>>>    	if (!name)

>>>    		return -EINVAL;

>>> +	if (prog->prog_flags & BPF_F_MULTI_FUNC)

>>> +		return 0;

>>> +

>>>    	for (i = 0; i < ARRAY_SIZE(section_defs); i++) {

>>>    		if (!section_defs[i].is_attach_btf)

>>>    			continue;

>>> @@ -10537,6 +10553,62 @@ static struct bpf_link *bpf_program__attach_btf_id(struct bpf_program *prog)

>>>    	return (struct bpf_link *)link;

>>>    }

>>> +static struct bpf_link *bpf_program__attach_multi(struct bpf_program *prog)

>>> +{

>>> +	char *pattern = prog->sec_name + prog->sec_def->len;

>>> +	DECLARE_LIBBPF_OPTS(bpf_link_create_opts, opts);

>>> +	enum bpf_attach_type attach_type;

>>> +	int prog_fd, link_fd, cnt, err;

>>> +	struct bpf_link *link = NULL;

>>> +	__s32 *ids = NULL;

>>> +

>>> +	prog_fd = bpf_program__fd(prog);

>>> +	if (prog_fd < 0) {

>>> +		pr_warn("prog '%s': can't attach before loaded\n", prog->name);

>>> +		return ERR_PTR(-EINVAL);

>>> +	}

>>> +

>>> +	err = bpf_object__load_vmlinux_btf(prog->obj, true);

>>> +	if (err)

>>> +		return ERR_PTR(err);

>>> +

>>> +	cnt = btf__find_by_pattern_kind(prog->obj->btf_vmlinux, pattern,

>>> +					BTF_KIND_FUNC, &ids);

>>> +	if (cnt <= 0)

>>> +		return ERR_PTR(-EINVAL);

>>

>> In kernel, looks like we support cnt = 0, here we error out.

>> Should we also error out in the kernel if cnt == 0?

> 

> hum, I'm not what you mean.. what kernel code are you referring to?


I am referring to the following kernel code:

+static int bpf_tracing_multi_attach(struct bpf_prog *prog,
+				    const union bpf_attr *attr)
+{
+	void __user *ubtf_ids = u64_to_user_ptr(attr->link_create.multi_btf_ids);
+	u32 size, i, cnt = attr->link_create.multi_btf_ids_cnt;
+	struct bpf_tracing_multi_link *link = NULL;
+	struct bpf_link_primer link_primer;
+	struct bpf_trampoline *tr = NULL;
+	int err = -EINVAL;
+	u8 nr_args = 0;
+	u32 *btf_ids;
+
+	if (check_multi_prog_type(prog))
+		return -EINVAL;
+
+	size = cnt * sizeof(*btf_ids);
+	btf_ids = kmalloc(size, GFP_USER | __GFP_NOWARN);
+	if (!btf_ids)
+		return -ENOMEM;
+
+	err = -EFAULT;
+	if (ubtf_ids && copy_from_user(btf_ids, ubtf_ids, size))
+		goto out_free;
+
+	link = kzalloc(sizeof(*link), GFP_USER);
+	if (!link)
+		goto out_free;
+
+	for (i = 0; i < cnt; i++) {
+		struct bpf_attach_target_info tgt_info = {};
+
+		err = bpf_check_attach_target(NULL, prog, NULL, btf_ids[i],
+					      &tgt_info);
+		if (err)
+			goto out_free;
+
+		if (ftrace_set_filter_ip(&link->ops, tgt_info.tgt_addr, 0, 0))
+			goto out_free;
+
+		if (nr_args < tgt_info.fmodel.nr_args)
+			nr_args = tgt_info.fmodel.nr_args;
+	}
+
+	tr = bpf_trampoline_multi_alloc();
+	if (!tr)
+		goto out_free;
+
+	bpf_func_model_nargs(&tr->func.model, nr_args);
+
+	err = bpf_trampoline_link_prog(prog, tr);
+	if (err)
+		goto out_free;
+
+	err = register_ftrace_direct_multi(&link->ops, (unsigned long) tr->cur_image->image);
+	if (err)
+		goto out_free;
+
+	bpf_link_init(&link->link, BPF_LINK_TYPE_TRACING_MULTI,
+		      &bpf_tracing_multi_link_lops, prog);
+	link->attach_type = prog->expected_attach_type;
+
+	err = bpf_link_prime(&link->link, &link_primer);
+	if (err)
+		goto out_unlink;
+
+	link->tr = tr;
+	/* Take extra ref so we are even with progs added by link_update. */
+	bpf_prog_inc(prog);
+	return bpf_link_settle(&link_primer);
+
+out_unlink:
+	unregister_ftrace_direct_multi(&link->ops);
+out_free:
+	kfree(tr);
+	kfree(btf_ids);
+	kfree(link);
+	return err;
+}
+

Looks like cnt = 0 is okay in bpf_tracing_multi_attach().

> 

> thanks,

> jirka

>
Jiri Olsa June 7, 2021, 8:11 p.m. UTC | #9
On Mon, Jun 07, 2021 at 12:42:51PM -0700, Yonghong Song wrote:

SNIP

> 

> +static int bpf_tracing_multi_attach(struct bpf_prog *prog,

> +				    const union bpf_attr *attr)

> +{

> +	void __user *ubtf_ids = u64_to_user_ptr(attr->link_create.multi_btf_ids);

> +	u32 size, i, cnt = attr->link_create.multi_btf_ids_cnt;

> +	struct bpf_tracing_multi_link *link = NULL;

> +	struct bpf_link_primer link_primer;

> +	struct bpf_trampoline *tr = NULL;

> +	int err = -EINVAL;

> +	u8 nr_args = 0;

> +	u32 *btf_ids;

> +

> +	if (check_multi_prog_type(prog))

> +		return -EINVAL;

> +

> +	size = cnt * sizeof(*btf_ids);

> +	btf_ids = kmalloc(size, GFP_USER | __GFP_NOWARN);

> +	if (!btf_ids)

> +		return -ENOMEM;

> +

> +	err = -EFAULT;

> +	if (ubtf_ids && copy_from_user(btf_ids, ubtf_ids, size))

> +		goto out_free;

> +

> +	link = kzalloc(sizeof(*link), GFP_USER);

> +	if (!link)

> +		goto out_free;

> +

> +	for (i = 0; i < cnt; i++) {

> +		struct bpf_attach_target_info tgt_info = {};

> +

> +		err = bpf_check_attach_target(NULL, prog, NULL, btf_ids[i],

> +					      &tgt_info);

> +		if (err)

> +			goto out_free;

> +

> +		if (ftrace_set_filter_ip(&link->ops, tgt_info.tgt_addr, 0, 0))

> +			goto out_free;

> +

> +		if (nr_args < tgt_info.fmodel.nr_args)

> +			nr_args = tgt_info.fmodel.nr_args;

> +	}

> +

> +	tr = bpf_trampoline_multi_alloc();

> +	if (!tr)

> +		goto out_free;

> +

> +	bpf_func_model_nargs(&tr->func.model, nr_args);

> +

> +	err = bpf_trampoline_link_prog(prog, tr);

> +	if (err)

> +		goto out_free;

> +

> +	err = register_ftrace_direct_multi(&link->ops, (unsigned long) tr->cur_image->image);

> +	if (err)

> +		goto out_free;

> +

> +	bpf_link_init(&link->link, BPF_LINK_TYPE_TRACING_MULTI,

> +		      &bpf_tracing_multi_link_lops, prog);

> +	link->attach_type = prog->expected_attach_type;

> +

> +	err = bpf_link_prime(&link->link, &link_primer);

> +	if (err)

> +		goto out_unlink;

> +

> +	link->tr = tr;

> +	/* Take extra ref so we are even with progs added by link_update. */

> +	bpf_prog_inc(prog);

> +	return bpf_link_settle(&link_primer);

> +

> +out_unlink:

> +	unregister_ftrace_direct_multi(&link->ops);

> +out_free:

> +	kfree(tr);

> +	kfree(btf_ids);

> +	kfree(link);

> +	return err;

> +}

> +

> 

> Looks like cnt = 0 is okay in bpf_tracing_multi_attach().


right, we should fail with EINVAL for that, and also bound the
maximum by prog->aux->attach_btf->nr_types at least

thanks,
jirka
Alexei Starovoitov June 8, 2021, 3:42 p.m. UTC | #10
On Sat, Jun 5, 2021 at 4:11 AM Jiri Olsa <jolsa@kernel.org> wrote:
>

> Adding support to attach multiple functions to tracing program

> by using the link_create/link_update interface.

>

> Adding multi_btf_ids/multi_btf_ids_cnt pair to link_create struct

> API, that define array of functions btf ids that will be attached

> to prog_fd.

>

> The prog_fd needs to be multi prog tracing program (BPF_F_MULTI_FUNC).

>

> The new link_create interface creates new BPF_LINK_TYPE_TRACING_MULTI

> link type, which creates separate bpf_trampoline and registers it

> as direct function for all specified btf ids.

>

> The new bpf_trampoline is out of scope (bpf_trampoline_lookup) of

> standard trampolines, so all registered functions need to be free

> of direct functions, otherwise the link fails.


Overall the api makes sense to me.
The restriction of multi vs non-multi is too severe though.
The multi trampoline can serve normal fentry/fexit too.
If ip is moved to the end (instead of start) the trampoline
will be able to call into multi and normal fentry/fexit progs. Right?
Jiri Olsa June 8, 2021, 6:17 p.m. UTC | #11
On Tue, Jun 08, 2021 at 08:42:32AM -0700, Alexei Starovoitov wrote:
> On Sat, Jun 5, 2021 at 4:11 AM Jiri Olsa <jolsa@kernel.org> wrote:

> >

> > Adding support to attach multiple functions to tracing program

> > by using the link_create/link_update interface.

> >

> > Adding multi_btf_ids/multi_btf_ids_cnt pair to link_create struct

> > API, that define array of functions btf ids that will be attached

> > to prog_fd.

> >

> > The prog_fd needs to be multi prog tracing program (BPF_F_MULTI_FUNC).

> >

> > The new link_create interface creates new BPF_LINK_TYPE_TRACING_MULTI

> > link type, which creates separate bpf_trampoline and registers it

> > as direct function for all specified btf ids.

> >

> > The new bpf_trampoline is out of scope (bpf_trampoline_lookup) of

> > standard trampolines, so all registered functions need to be free

> > of direct functions, otherwise the link fails.

> 

> Overall the api makes sense to me.

> The restriction of multi vs non-multi is too severe though.

> The multi trampoline can serve normal fentry/fexit too.


the multi trampoline gets called from all the registered functions,
so there would need to be a filter for the specific ip before calling the
standard program.. a single cmp/jnz might not be that bad, I'll check

> If ip is moved to the end (instead of start) the trampoline

> will be able to call into multi and normal fentry/fexit progs. Right?

> 


we could just skip the ip arg when generating the entry for normal programs
and start %rdi from the first argument's address

and it'd need to be transparent to the current trampoline user API,
so I wonder if there will be some hiccup ;-) let's see

thanks,
jirka
Alexei Starovoitov June 8, 2021, 6:49 p.m. UTC | #12
On Tue, Jun 08, 2021 at 08:17:00PM +0200, Jiri Olsa wrote:
> On Tue, Jun 08, 2021 at 08:42:32AM -0700, Alexei Starovoitov wrote:

> > On Sat, Jun 5, 2021 at 4:11 AM Jiri Olsa <jolsa@kernel.org> wrote:

> > >

> > > Adding support to attach multiple functions to tracing program

> > > by using the link_create/link_update interface.

> > >

> > > Adding multi_btf_ids/multi_btf_ids_cnt pair to link_create struct

> > > API, that define array of functions btf ids that will be attached

> > > to prog_fd.

> > >

> > > The prog_fd needs to be multi prog tracing program (BPF_F_MULTI_FUNC).

> > >

> > > The new link_create interface creates new BPF_LINK_TYPE_TRACING_MULTI

> > > link type, which creates separate bpf_trampoline and registers it

> > > as direct function for all specified btf ids.

> > >

> > > The new bpf_trampoline is out of scope (bpf_trampoline_lookup) of

> > > standard trampolines, so all registered functions need to be free

> > > of direct functions, otherwise the link fails.

> > 

> > Overall the api makes sense to me.

> > The restriction of multi vs non-multi is too severe though.

> > The multi trampoline can serve normal fentry/fexit too.

> 

> so multi trampoline gets called from all the registered functions,

> so there would need to be filter for specific ip before calling the

> standard program.. single cmp/jnz might not be that bad, I'll check


You mean reusing the same multi trampoline for all IPs and regenerating
it with a bunch of cmp/jnz checks? There should be a better way to scale.
Maybe clone multi trampoline instead?
IPs[1-10] will point to multi.
IP[11] will point to a clone of multi that serves multi prog and
fentry/fexit progs specific for that IP.
Jiri Olsa June 8, 2021, 9:07 p.m. UTC | #13
On Tue, Jun 08, 2021 at 11:49:03AM -0700, Alexei Starovoitov wrote:
> On Tue, Jun 08, 2021 at 08:17:00PM +0200, Jiri Olsa wrote:

> > On Tue, Jun 08, 2021 at 08:42:32AM -0700, Alexei Starovoitov wrote:

> > > On Sat, Jun 5, 2021 at 4:11 AM Jiri Olsa <jolsa@kernel.org> wrote:

> > > >

> > > > Adding support to attach multiple functions to tracing program

> > > > by using the link_create/link_update interface.

> > > >

> > > > Adding multi_btf_ids/multi_btf_ids_cnt pair to link_create struct

> > > > API, that define array of functions btf ids that will be attached

> > > > to prog_fd.

> > > >

> > > > The prog_fd needs to be multi prog tracing program (BPF_F_MULTI_FUNC).

> > > >

> > > > The new link_create interface creates new BPF_LINK_TYPE_TRACING_MULTI

> > > > link type, which creates separate bpf_trampoline and registers it

> > > > as direct function for all specified btf ids.

> > > >

> > > > The new bpf_trampoline is out of scope (bpf_trampoline_lookup) of

> > > > standard trampolines, so all registered functions need to be free

> > > > of direct functions, otherwise the link fails.

> > > 

> > > Overall the api makes sense to me.

> > > The restriction of multi vs non-multi is too severe though.

> > > The multi trampoline can serve normal fentry/fexit too.

> > 

> > so multi trampoline gets called from all the registered functions,

> > so there would need to be filter for specific ip before calling the

> > standard program.. single cmp/jnz might not be that bad, I'll check

> 

> You mean reusing the same multi trampoline for all IPs and regenerating

> it with a bunch of cmp/jnz checks? There should be a better way to scale.

> Maybe clone multi trampoline instead?

> IPs[1-10] will point to multi.

> IP[11] will point to a clone of multi that serves multi prog and

> fentry/fexit progs specific for that IP.


ok, so we'd clone the multi trampoline if there's a request to attach
a standard trampoline to some IP covered by the multi trampoline

.. and transform the currently attached standard trampoline for an IP
into a clone of the multi trampoline, if there's a request to create
a multi trampoline that covers that IP

jirka
Alexei Starovoitov June 8, 2021, 11:05 p.m. UTC | #14
On Tue, Jun 8, 2021 at 2:07 PM Jiri Olsa <jolsa@redhat.com> wrote:
>

> On Tue, Jun 08, 2021 at 11:49:03AM -0700, Alexei Starovoitov wrote:

> > On Tue, Jun 08, 2021 at 08:17:00PM +0200, Jiri Olsa wrote:

> > > On Tue, Jun 08, 2021 at 08:42:32AM -0700, Alexei Starovoitov wrote:

> > > > On Sat, Jun 5, 2021 at 4:11 AM Jiri Olsa <jolsa@kernel.org> wrote:

> > > > >

> > > > > Adding support to attach multiple functions to tracing program

> > > > > by using the link_create/link_update interface.

> > > > >

> > > > > Adding multi_btf_ids/multi_btf_ids_cnt pair to link_create struct

> > > > > API, that define array of functions btf ids that will be attached

> > > > > to prog_fd.

> > > > >

> > > > > The prog_fd needs to be multi prog tracing program (BPF_F_MULTI_FUNC).

> > > > >

> > > > > The new link_create interface creates new BPF_LINK_TYPE_TRACING_MULTI

> > > > > link type, which creates separate bpf_trampoline and registers it

> > > > > as direct function for all specified btf ids.

> > > > >

> > > > > The new bpf_trampoline is out of scope (bpf_trampoline_lookup) of

> > > > > standard trampolines, so all registered functions need to be free

> > > > > of direct functions, otherwise the link fails.

> > > >

> > > > Overall the api makes sense to me.

> > > > The restriction of multi vs non-multi is too severe though.

> > > > The multi trampoline can serve normal fentry/fexit too.

> > >

> > > so multi trampoline gets called from all the registered functions,

> > > so there would need to be filter for specific ip before calling the

> > > standard program.. single cmp/jnz might not be that bad, I'll check

> >

> > You mean reusing the same multi trampoline for all IPs and regenerating

> > it with a bunch of cmp/jnz checks? There should be a better way to scale.

> > Maybe clone multi trampoline instead?

> > IPs[1-10] will point to multi.

> > IP[11] will point to a clone of multi that serves multi prog and

> > fentry/fexit progs specific for that IP.

>

> ok, so we'd clone multi trampoline if there's request to attach

> standard trampoline to some IP from multi trampoline

>

> .. and transform currently attached standard trampoline for IP

> into clone of multi trampoline, if there's request to create

> multi trampoline that covers that IP


yep. For every IP==btf_id there will be only two possible trampolines.
Should be easy enough to track and transition between them.
The standard fentry/fexit will only get negligible slowdown from
going through multi.
multi+fexit and fmod_ret needs to be thought through as well.
That's why I thought that 'ip' at the end should simplify things.
Only multi will have access to it.
But we can store it first too. fentry/fexit will see ctx=r1 with +8 offset
and will have normal args in ctx. Like ip isn't even there.
While multi trampoline is always doing ip, arg1,arg2, .., arg6
and passes ctx = &ip into multi prog and ctx = &arg1 into fentry/fexit.
'ret' for fexit is problematic though. hmm.
Maybe such a cloned multi trampoline for a specific ip with 2 args will do:
ip, arg1, arg2, ret, 0, 0, 0, ret.
Then multi will have 6 args, though 3rd is actually ret.
Then fexit will have ret in the right place and multi prog will have
it as 7th arg.
Andrii Nakryiko June 9, 2021, 5:08 a.m. UTC | #15
On Tue, Jun 8, 2021 at 4:07 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>

> On Tue, Jun 8, 2021 at 2:07 PM Jiri Olsa <jolsa@redhat.com> wrote:

> >

> > On Tue, Jun 08, 2021 at 11:49:03AM -0700, Alexei Starovoitov wrote:

> > > On Tue, Jun 08, 2021 at 08:17:00PM +0200, Jiri Olsa wrote:

> > > > On Tue, Jun 08, 2021 at 08:42:32AM -0700, Alexei Starovoitov wrote:

> > > > > On Sat, Jun 5, 2021 at 4:11 AM Jiri Olsa <jolsa@kernel.org> wrote:

> > > > > >

> > > > > > Adding support to attach multiple functions to tracing program

> > > > > > by using the link_create/link_update interface.

> > > > > >

> > > > > > Adding multi_btf_ids/multi_btf_ids_cnt pair to link_create struct

> > > > > > API, that define array of functions btf ids that will be attached

> > > > > > to prog_fd.

> > > > > >

> > > > > > The prog_fd needs to be multi prog tracing program (BPF_F_MULTI_FUNC).

> > > > > >

> > > > > > The new link_create interface creates new BPF_LINK_TYPE_TRACING_MULTI

> > > > > > link type, which creates separate bpf_trampoline and registers it

> > > > > > as direct function for all specified btf ids.

> > > > > >

> > > > > > The new bpf_trampoline is out of scope (bpf_trampoline_lookup) of

> > > > > > standard trampolines, so all registered functions need to be free

> > > > > > of direct functions, otherwise the link fails.

> > > > >

> > > > > Overall the api makes sense to me.

> > > > > The restriction of multi vs non-multi is too severe though.

> > > > > The multi trampoline can serve normal fentry/fexit too.

> > > >

> > > > so multi trampoline gets called from all the registered functions,

> > > > so there would need to be filter for specific ip before calling the

> > > > standard program.. single cmp/jnz might not be that bad, I'll check

> > >

> > > You mean reusing the same multi trampoline for all IPs and regenerating

> > > it with a bunch of cmp/jnz checks? There should be a better way to scale.

> > > Maybe clone multi trampoline instead?

> > > IPs[1-10] will point to multi.

> > > IP[11] will point to a clone of multi that serves multi prog and

> > > fentry/fexit progs specific for that IP.

> >

> > ok, so we'd clone multi trampoline if there's request to attach

> > standard trampoline to some IP from multi trampoline

> >

> > .. and transform currently attached standard trampoline for IP

> > into clone of multi trampoline, if there's request to create

> > multi trampoline that covers that IP

>

> yep. For every IP==btf_id there will be only two possible trampolines.

> Should be easy enough to track and transition between them.

> The standard fentry/fexit will only get negligible slowdown from

> going through multi.

> multi+fexit and fmod_ret needs to be thought through as well.

> That's why I thought that 'ip' at the end should simplify things.


Putting ip at the end has downsides. We might support >6 arguments
eventually, at which point it will be super weird to have 6 args, ip,
then the rest of arguments?..

Would it be too bad to put IP at -8 offset relative to ctx? That will
also work for normal fentry/fexit, for which it's useful to have ip
passed in as well, IMO. So no special casing for multi/non-multi, and
it's backwards compatible.

Ideally, I'd love it to be actually retrievable through a new BPF
helper, something like bpf_caller_ip(ctx), but I'm not sure if we can
implement this sanely, so I don't hold high hopes.

> Only multi will have access to it.

> But we can store it first too. fentry/fexit will see ctx=r1 with +8 offset

> and will have normal args in ctx. Like ip isn't even there.

> While multi trampoline is always doing ip, arg1,arg2, .., arg6

> and passes ctx = &ip into multi prog and ctx = &arg1 into fentry/fexit.

> 'ret' for fexit is problematic though. hmm.

> Maybe such clone multi trampoline for specific ip with 2 args will do:

> ip, arg1, arg2, ret, 0, 0, 0, ret.

> Then multi will have 6 args, though 3rd is actually ret.

> Then fexit will have ret in the right place and multi prog will have

> it as 7th arg.
Andrii Nakryiko June 9, 2021, 5:18 a.m. UTC | #16
On Sat, Jun 5, 2021 at 4:12 AM Jiri Olsa <jolsa@kernel.org> wrote:
>

> Adding support to attach multiple functions to tracing program

> by using the link_create/link_update interface.

>

> Adding multi_btf_ids/multi_btf_ids_cnt pair to link_create struct

> API, that define array of functions btf ids that will be attached

> to prog_fd.

>

> The prog_fd needs to be multi prog tracing program (BPF_F_MULTI_FUNC).


So I'm not sure why we added a new load flag instead of just using a
new BPF program type or expected attach type?  We have different
trampolines and different kinds of links for them, so why not be
consistent and use the new type of BPF program?.. It does change BPF
verifier's treatment of input arguments, so it's not just a slight
variation, it's quite different type of program.

>

> The new link_create interface creates new BPF_LINK_TYPE_TRACING_MULTI

> link type, which creates separate bpf_trampoline and registers it

> as direct function for all specified btf ids.

>

> The new bpf_trampoline is out of scope (bpf_trampoline_lookup) of

> standard trampolines, so all registered functions need to be free

> of direct functions, otherwise the link fails.

>

> The new bpf_trampoline will store and pass to bpf program the highest

> number of arguments from all given functions.

>

> New programs (fentry or fexit) can be added to the existing trampoline

> through the link_update interface via new_prog_fd descriptor.

>

> Signed-off-by: Jiri Olsa <jolsa@kernel.org>

> ---

>  include/linux/bpf.h            |   3 +

>  include/uapi/linux/bpf.h       |   5 +

>  kernel/bpf/syscall.c           | 185 ++++++++++++++++++++++++++++++++-

>  kernel/bpf/trampoline.c        |  53 +++++++---

>  tools/include/uapi/linux/bpf.h |   5 +

>  5 files changed, 237 insertions(+), 14 deletions(-)

>

> diff --git a/include/linux/bpf.h b/include/linux/bpf.h

> index 23221e0e8d3c..99a81c6c22e6 100644

> --- a/include/linux/bpf.h

> +++ b/include/linux/bpf.h

> @@ -661,6 +661,7 @@ struct bpf_trampoline {

>         struct bpf_tramp_image *cur_image;

>         u64 selector;

>         struct module *mod;

> +       bool multi;

>  };

>

>  struct bpf_attach_target_info {

> @@ -746,6 +747,8 @@ void bpf_ksym_add(struct bpf_ksym *ksym);

>  void bpf_ksym_del(struct bpf_ksym *ksym);

>  int bpf_jit_charge_modmem(u32 pages);

>  void bpf_jit_uncharge_modmem(u32 pages);

> +struct bpf_trampoline *bpf_trampoline_multi_alloc(void);

> +void bpf_trampoline_multi_free(struct bpf_trampoline *tr);

>  #else

>  static inline int bpf_trampoline_link_prog(struct bpf_prog *prog,

>                                            struct bpf_trampoline *tr)

> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h

> index ad9340fb14d4..5fd6ff64e8dc 100644

> --- a/include/uapi/linux/bpf.h

> +++ b/include/uapi/linux/bpf.h

> @@ -1007,6 +1007,7 @@ enum bpf_link_type {

>         BPF_LINK_TYPE_ITER = 4,

>         BPF_LINK_TYPE_NETNS = 5,

>         BPF_LINK_TYPE_XDP = 6,

> +       BPF_LINK_TYPE_TRACING_MULTI = 7,

>

>         MAX_BPF_LINK_TYPE,

>  };

> @@ -1454,6 +1455,10 @@ union bpf_attr {

>                                 __aligned_u64   iter_info;      /* extra bpf_iter_link_info */

>                                 __u32           iter_info_len;  /* iter_info length */

>                         };

> +                       struct {

> +                               __aligned_u64   multi_btf_ids;          /* addresses to attach */

> +                               __u32           multi_btf_ids_cnt;      /* addresses count */

> +                       };


let's do what bpf_link-based TC-BPF API is doing, put it into a named
field (I'd do the same for iter_info/iter_info_len above as well, I'm
not sure why we did this flat naming scheme, we know it's inconvenient
when extending stuff).

struct {
    __aligned_u64 btf_ids;
    __u32 btf_ids_cnt;
} multi;

>                 };

>         } link_create;

>


[...]

> +static int bpf_tracing_multi_link_update(struct bpf_link *link,

> +                                        struct bpf_prog *new_prog,

> +                                        struct bpf_prog *old_prog __maybe_unused)

> +{


BPF_LINK_UPDATE command supports passing old_fd and extra flags. We
can use that to implement both updating existing BPF program in-place
(by passing BPF_F_REPLACE and old_fd) or adding the program to the
list of programs, if old_fd == 0. WDYT?

> +       struct bpf_tracing_multi_link *tr_link =

> +               container_of(link, struct bpf_tracing_multi_link, link);

> +       int err;

> +

> +       if (check_multi_prog_type(new_prog))

> +               return -EINVAL;

> +

> +       err = bpf_trampoline_link_prog(new_prog, tr_link->tr);

> +       if (err)

> +               return err;

> +

> +       err = modify_ftrace_direct_multi(&tr_link->ops,

> +                                        (unsigned long) tr_link->tr->cur_image->image);

> +       return WARN_ON(err);

> +}

> +


[...]
Andrii Nakryiko June 9, 2021, 5:34 a.m. UTC | #17
On Sat, Jun 5, 2021 at 4:12 AM Jiri Olsa <jolsa@kernel.org> wrote:
>

> Adding support to link multi func tracing program

> through link_create interface.

>

> Adding special types for multi func programs:

>

>   fentry.multi

>   fexit.multi

>

> so you can define multi func programs like:

>

>   SEC("fentry.multi/bpf_fentry_test*")

>   int BPF_PROG(test1, unsigned long ip, __u64 a, __u64 b, __u64 c, __u64 d, __u64 e, __u64 f)

>

> that defines test1 to be attached to bpf_fentry_test* functions,

> and able to attach ip and 6 arguments.

>

> If functions are not specified the program needs to be attached

> manually.

>

> Adding new btf id related fields to bpf_link_create_opts and

> bpf_link_create to use them.

>

> Signed-off-by: Jiri Olsa <jolsa@kernel.org>

> ---

>  tools/lib/bpf/bpf.c    | 11 ++++++-

>  tools/lib/bpf/bpf.h    |  4 ++-

>  tools/lib/bpf/libbpf.c | 72 ++++++++++++++++++++++++++++++++++++++++++

>  3 files changed, 85 insertions(+), 2 deletions(-)

>

> diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c

> index 86dcac44f32f..da892737b522 100644

> --- a/tools/lib/bpf/bpf.c

> +++ b/tools/lib/bpf/bpf.c

> @@ -674,7 +674,8 @@ int bpf_link_create(int prog_fd, int target_fd,

>                     enum bpf_attach_type attach_type,

>                     const struct bpf_link_create_opts *opts)

>  {

> -       __u32 target_btf_id, iter_info_len;

> +       __u32 target_btf_id, iter_info_len, multi_btf_ids_cnt;

> +       __s32 *multi_btf_ids;

>         union bpf_attr attr;

>         int fd;

>

> @@ -687,6 +688,9 @@ int bpf_link_create(int prog_fd, int target_fd,

>         if (iter_info_len && target_btf_id)


here we check that mutually exclusive options are not specified, we
should do the same for multi stuff

>                 return libbpf_err(-EINVAL);

>

> +       multi_btf_ids = OPTS_GET(opts, multi_btf_ids, 0);

> +       multi_btf_ids_cnt = OPTS_GET(opts, multi_btf_ids_cnt, 0);

> +

>         memset(&attr, 0, sizeof(attr));

>         attr.link_create.prog_fd = prog_fd;

>         attr.link_create.target_fd = target_fd;

> @@ -701,6 +705,11 @@ int bpf_link_create(int prog_fd, int target_fd,

>                 attr.link_create.target_btf_id = target_btf_id;

>         }

>

> +       if (multi_btf_ids && multi_btf_ids_cnt) {

> +               attr.link_create.multi_btf_ids = (__u64) multi_btf_ids;

> +               attr.link_create.multi_btf_ids_cnt = multi_btf_ids_cnt;

> +       }

> +

>         fd = sys_bpf(BPF_LINK_CREATE, &attr, sizeof(attr));

>         return libbpf_err_errno(fd);

>  }

> diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h

> index 4f758f8f50cd..2f78b6c34765 100644

> --- a/tools/lib/bpf/bpf.h

> +++ b/tools/lib/bpf/bpf.h

> @@ -177,8 +177,10 @@ struct bpf_link_create_opts {

>         union bpf_iter_link_info *iter_info;

>         __u32 iter_info_len;

>         __u32 target_btf_id;

> +       __s32 *multi_btf_ids;


why ids are __s32?..

> +       __u32 multi_btf_ids_cnt;

>  };

> -#define bpf_link_create_opts__last_field target_btf_id

> +#define bpf_link_create_opts__last_field multi_btf_ids_cnt

>

>  LIBBPF_API int bpf_link_create(int prog_fd, int target_fd,

>                                enum bpf_attach_type attach_type,

> diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c

> index 65f87cc1220c..bd31de3b6a85 100644

> --- a/tools/lib/bpf/libbpf.c

> +++ b/tools/lib/bpf/libbpf.c

> @@ -228,6 +228,7 @@ struct bpf_sec_def {

>         bool is_attachable;

>         bool is_attach_btf;

>         bool is_sleepable;

> +       bool is_multi_func;

>         attach_fn_t attach_fn;

>  };

>

> @@ -7609,6 +7610,8 @@ __bpf_object__open(const char *path, const void *obj_buf, size_t obj_buf_sz,

>

>                 if (prog->sec_def->is_sleepable)

>                         prog->prog_flags |= BPF_F_SLEEPABLE;

> +               if (prog->sec_def->is_multi_func)

> +                       prog->prog_flags |= BPF_F_MULTI_FUNC;

>                 bpf_program__set_type(prog, prog->sec_def->prog_type);

>                 bpf_program__set_expected_attach_type(prog,

>                                 prog->sec_def->expected_attach_type);

> @@ -9070,6 +9073,8 @@ static struct bpf_link *attach_raw_tp(const struct bpf_sec_def *sec,

>                                       struct bpf_program *prog);

>  static struct bpf_link *attach_trace(const struct bpf_sec_def *sec,

>                                      struct bpf_program *prog);

> +static struct bpf_link *attach_trace_multi(const struct bpf_sec_def *sec,

> +                                          struct bpf_program *prog);

>  static struct bpf_link *attach_lsm(const struct bpf_sec_def *sec,

>                                    struct bpf_program *prog);

>  static struct bpf_link *attach_iter(const struct bpf_sec_def *sec,

> @@ -9143,6 +9148,14 @@ static const struct bpf_sec_def section_defs[] = {

>                 .attach_fn = attach_iter),

>         SEC_DEF("syscall", SYSCALL,

>                 .is_sleepable = true),

> +       SEC_DEF("fentry.multi/", TRACING,

> +               .expected_attach_type = BPF_TRACE_FENTRY,


BPF_TRACE_MULTI_FENTRY instead of is_multi stuff everywhere?.. Or a
new type of BPF program altogether?

> +               .is_multi_func = true,

> +               .attach_fn = attach_trace_multi),

> +       SEC_DEF("fexit.multi/", TRACING,

> +               .expected_attach_type = BPF_TRACE_FEXIT,

> +               .is_multi_func = true,

> +               .attach_fn = attach_trace_multi),

>         BPF_EAPROG_SEC("xdp_devmap/",           BPF_PROG_TYPE_XDP,

>                                                 BPF_XDP_DEVMAP),

>         BPF_EAPROG_SEC("xdp_cpumap/",           BPF_PROG_TYPE_XDP,

> @@ -9584,6 +9597,9 @@ static int libbpf_find_attach_btf_id(struct bpf_program *prog, int *btf_obj_fd,

>         if (!name)

>                 return -EINVAL;

>

> +       if (prog->prog_flags & BPF_F_MULTI_FUNC)

> +               return 0;

> +

>         for (i = 0; i < ARRAY_SIZE(section_defs); i++) {

>                 if (!section_defs[i].is_attach_btf)

>                         continue;

> @@ -10537,6 +10553,62 @@ static struct bpf_link *bpf_program__attach_btf_id(struct bpf_program *prog)

>         return (struct bpf_link *)link;

>  }

>

> +static struct bpf_link *bpf_program__attach_multi(struct bpf_program *prog)

> +{

> +       char *pattern = prog->sec_name + prog->sec_def->len;

> +       DECLARE_LIBBPF_OPTS(bpf_link_create_opts, opts);

> +       enum bpf_attach_type attach_type;

> +       int prog_fd, link_fd, cnt, err;

> +       struct bpf_link *link = NULL;

> +       __s32 *ids = NULL;

> +

> +       prog_fd = bpf_program__fd(prog);

> +       if (prog_fd < 0) {

> +               pr_warn("prog '%s': can't attach before loaded\n", prog->name);

> +               return ERR_PTR(-EINVAL);

> +       }

> +

> +       err = bpf_object__load_vmlinux_btf(prog->obj, true);

> +       if (err)

> +               return ERR_PTR(err);

> +

> +       cnt = btf__find_by_pattern_kind(prog->obj->btf_vmlinux, pattern,

> +                                       BTF_KIND_FUNC, &ids);


I wonder if it would be better to just support simplified glob
patterns like "prefix*", "*suffix", "exactmatch", and "*substring*"?
That should be sufficient for the majority of cases. For the cases where
user needs something more nuanced, they can just construct BTF ID list
with custom code and do manual attach.

> +       if (cnt <= 0)

> +               return ERR_PTR(-EINVAL);

> +

> +       link = calloc(1, sizeof(*link));

> +       if (!link) {

> +               err = -ENOMEM;

> +               goto out_err;

> +       }

> +       link->detach = &bpf_link__detach_fd;

> +

> +       opts.multi_btf_ids = ids;

> +       opts.multi_btf_ids_cnt = cnt;

> +

> +       attach_type = bpf_program__get_expected_attach_type(prog);

> +       link_fd = bpf_link_create(prog_fd, 0, attach_type, &opts);

> +       if (link_fd < 0) {

> +               err = -errno;

> +               goto out_err;

> +       }

> +       link->fd = link_fd;

> +       free(ids);

> +       return link;

> +

> +out_err:

> +       free(link);

> +       free(ids);

> +       return ERR_PTR(err);

> +}

> +

> +static struct bpf_link *attach_trace_multi(const struct bpf_sec_def *sec,

> +                                          struct bpf_program *prog)

> +{

> +       return bpf_program__attach_multi(prog);

> +}

> +

>  struct bpf_link *bpf_program__attach_trace(struct bpf_program *prog)

>  {

>         return bpf_program__attach_btf_id(prog);

> --

> 2.31.1

>
Andrii Nakryiko June 9, 2021, 5:41 a.m. UTC | #18
On Sat, Jun 5, 2021 at 4:13 AM Jiri Olsa <jolsa@kernel.org> wrote:
>

> Adding selftest for fentry/fexit multi func test that attaches

> to bpf_fentry_test* functions and checks argument values based

> on the processed function.

>

> When multi_arg_check is used from 2 different places I'm getting

> compilation failures, which I have not deciphered yet:

>

>   $ CLANG=/opt/clang/bin/clang LLC=/opt/clang/bin/llc make

>     CLNG-BPF [test_maps] fentry_fexit_multi_test.o

>   progs/fentry_fexit_multi_test.c:18:2: error: too many args to t24: i64 = \

>   GlobalAddress<void (i64, i64, i64, i64, i64, i64, i64, i64*)* @multi_arg_check> 0, \

>   progs/fentry_fexit_multi_test.c:18:2 @[ progs/fentry_fexit_multi_test.c:16:5 ]

>           multi_arg_check(ip, a, b, c, d, e, f, &test1_arg_result);

>           ^

>   progs/fentry_fexit_multi_test.c:25:2: error: too many args to t32: i64 = \

>   GlobalAddress<void (i64, i64, i64, i64, i64, i64, i64, i64*)* @multi_arg_check> 0, \

>   progs/fentry_fexit_multi_test.c:25:2 @[ progs/fentry_fexit_multi_test.c:23:5 ]

>           multi_arg_check(ip, a, b, c, d, e, f, &test2_arg_result);

>           ^

>   In file included from progs/fentry_fexit_multi_test.c:5:

>   /home/jolsa/linux-qemu/tools/testing/selftests/bpf/multi_check.h:9:6: error: defined with too many args

>   void multi_arg_check(unsigned long ip, __u64 a, __u64 b, __u64 c, __u64 d, __u64 e, __u64 f, __u64 *test_result)

>        ^

>   /home/jolsa/linux-qemu/tools/testing/selftests/bpf/multi_check.h:9:6: error: defined with too many args

>   /home/jolsa/linux-qemu/tools/testing/selftests/bpf/multi_check.h:9:6: error: defined with too many args

>   5 errors generated.

>   make: *** [Makefile:470: /home/jolsa/linux-qemu/tools/testing/selftests/bpf/fentry_fexit_multi_test.o] Error 1

>

> I can fix that by defining 2 separate multi_arg_check functions

> with different names, which I did in a follow-up temporary patch.

> Not sure if I'm hitting some clang/bpf limitation here?


don't know about clang limitations, but we should use proper static
linking anyway

>

> Signed-off-by: Jiri Olsa <jolsa@kernel.org>

> ---

>  .../bpf/prog_tests/fentry_fexit_multi_test.c  | 52 +++++++++++++++++++

>  .../bpf/progs/fentry_fexit_multi_test.c       | 28 ++++++++++

>  2 files changed, 80 insertions(+)

>  create mode 100644 tools/testing/selftests/bpf/prog_tests/fentry_fexit_multi_test.c

>  create mode 100644 tools/testing/selftests/bpf/progs/fentry_fexit_multi_test.c

>

> diff --git a/tools/testing/selftests/bpf/prog_tests/fentry_fexit_multi_test.c b/tools/testing/selftests/bpf/prog_tests/fentry_fexit_multi_test.c

> new file mode 100644

> index 000000000000..76f917ad843d

> --- /dev/null

> +++ b/tools/testing/selftests/bpf/prog_tests/fentry_fexit_multi_test.c

> @@ -0,0 +1,52 @@

> +// SPDX-License-Identifier: GPL-2.0

> +#include <test_progs.h>

> +#include "fentry_fexit_multi_test.skel.h"

> +

> +void test_fentry_fexit_multi_test(void)

> +{

> +       DECLARE_LIBBPF_OPTS(bpf_link_update_opts, link_upd_opts);

> +       struct fentry_fexit_multi_test *skel = NULL;

> +       unsigned long long *bpf_fentry_test;

> +       __u32 duration = 0, retval;

> +       struct bpf_link *link;

> +       int err, prog_fd;

> +

> +       skel = fentry_fexit_multi_test__open_and_load();

> +       if (!ASSERT_OK_PTR(skel, "fentry_multi_skel_load"))

> +               goto cleanup;

> +

> +       bpf_fentry_test = &skel->bss->bpf_fentry_test[0];

> +       ASSERT_OK(kallsyms_find("bpf_fentry_test1", &bpf_fentry_test[0]), "kallsyms_find");

> +       ASSERT_OK(kallsyms_find("bpf_fentry_test2", &bpf_fentry_test[1]), "kallsyms_find");

> +       ASSERT_OK(kallsyms_find("bpf_fentry_test3", &bpf_fentry_test[2]), "kallsyms_find");

> +       ASSERT_OK(kallsyms_find("bpf_fentry_test4", &bpf_fentry_test[3]), "kallsyms_find");

> +       ASSERT_OK(kallsyms_find("bpf_fentry_test5", &bpf_fentry_test[4]), "kallsyms_find");

> +       ASSERT_OK(kallsyms_find("bpf_fentry_test6", &bpf_fentry_test[5]), "kallsyms_find");

> +       ASSERT_OK(kallsyms_find("bpf_fentry_test7", &bpf_fentry_test[6]), "kallsyms_find");

> +       ASSERT_OK(kallsyms_find("bpf_fentry_test8", &bpf_fentry_test[7]), "kallsyms_find");

> +

> +       link = bpf_program__attach(skel->progs.test1);

> +       if (!ASSERT_OK_PTR(link, "attach_fentry_fexit"))

> +               goto cleanup;

> +

> +       err = bpf_link_update(bpf_link__fd(link),

> +                             bpf_program__fd(skel->progs.test2),

> +                             NULL);

> +       if (!ASSERT_OK(err, "bpf_link_update"))

> +               goto cleanup_link;

> +

> +       prog_fd = bpf_program__fd(skel->progs.test1);

> +       err = bpf_prog_test_run(prog_fd, 1, NULL, 0,

> +                               NULL, NULL, &retval, &duration);

> +       ASSERT_OK(err, "test_run");

> +       ASSERT_EQ(retval, 0, "test_run");

> +

> +       ASSERT_EQ(skel->bss->test1_arg_result, 8, "test1_arg_result");

> +       ASSERT_EQ(skel->bss->test2_arg_result, 8, "test2_arg_result");

> +       ASSERT_EQ(skel->bss->test2_ret_result, 8, "test2_ret_result");

> +

> +cleanup_link:

> +       bpf_link__destroy(link);

> +cleanup:

> +       fentry_fexit_multi_test__destroy(skel);

> +}

> diff --git a/tools/testing/selftests/bpf/progs/fentry_fexit_multi_test.c b/tools/testing/selftests/bpf/progs/fentry_fexit_multi_test.c

> new file mode 100644

> index 000000000000..e25ab0085399

> --- /dev/null

> +++ b/tools/testing/selftests/bpf/progs/fentry_fexit_multi_test.c

> @@ -0,0 +1,28 @@

> +// SPDX-License-Identifier: GPL-2.0

> +#include <linux/bpf.h>

> +#include <bpf/bpf_helpers.h>

> +#include <bpf/bpf_tracing.h>

> +#include "multi_check.h"

> +

> +char _license[] SEC("license") = "GPL";

> +

> +unsigned long long bpf_fentry_test[8];

> +

> +__u64 test1_arg_result = 0;

> +__u64 test2_arg_result = 0;

> +__u64 test2_ret_result = 0;

> +

> +SEC("fentry.multi/bpf_fentry_test*")

> +int BPF_PROG(test1, unsigned long ip, __u64 a, __u64 b, __u64 c, __u64 d, __u64 e, __u64 f)

> +{

> +       multi_arg_check(ip, a, b, c, d, e, f, &test1_arg_result);

> +       return 0;

> +}

> +

> +SEC("fexit.multi/")

> +int BPF_PROG(test2, unsigned long ip, __u64 a, __u64 b, __u64 c, __u64 d, __u64 e, __u64 f, int ret)

> +{

> +       multi_arg_check(ip, a, b, c, d, e, f, &test2_arg_result);

> +       multi_ret_check(ip, ret, &test2_ret_result);

> +       return 0;

> +}

> --

> 2.31.1

>
Jiri Olsa June 9, 2021, 1:33 p.m. UTC | #19
On Tue, Jun 08, 2021 at 04:05:29PM -0700, Alexei Starovoitov wrote:
> On Tue, Jun 8, 2021 at 2:07 PM Jiri Olsa <jolsa@redhat.com> wrote:

> >

> > On Tue, Jun 08, 2021 at 11:49:03AM -0700, Alexei Starovoitov wrote:

> > > On Tue, Jun 08, 2021 at 08:17:00PM +0200, Jiri Olsa wrote:

> > > > On Tue, Jun 08, 2021 at 08:42:32AM -0700, Alexei Starovoitov wrote:

> > > > > On Sat, Jun 5, 2021 at 4:11 AM Jiri Olsa <jolsa@kernel.org> wrote:

> > > > > >

> > > > > > Adding support to attach multiple functions to tracing program

> > > > > > by using the link_create/link_update interface.

> > > > > >

> > > > > > Adding multi_btf_ids/multi_btf_ids_cnt pair to link_create struct

> > > > > > API, that define array of functions btf ids that will be attached

> > > > > > to prog_fd.

> > > > > >

> > > > > > The prog_fd needs to be multi prog tracing program (BPF_F_MULTI_FUNC).

> > > > > >

> > > > > > The new link_create interface creates new BPF_LINK_TYPE_TRACING_MULTI

> > > > > > link type, which creates separate bpf_trampoline and registers it

> > > > > > as direct function for all specified btf ids.

> > > > > >

> > > > > > The new bpf_trampoline is out of scope (bpf_trampoline_lookup) of

> > > > > > standard trampolines, so all registered functions need to be free

> > > > > > of direct functions, otherwise the link fails.

> > > > >

> > > > > Overall the api makes sense to me.

> > > > > The restriction of multi vs non-multi is too severe though.

> > > > > The multi trampoline can serve normal fentry/fexit too.

> > > >

> > > > so multi trampoline gets called from all the registered functions,

> > > > so there would need to be filter for specific ip before calling the

> > > > standard program.. single cmp/jnz might not be that bad, I'll check

> > >

> > > You mean reusing the same multi trampoline for all IPs and regenerating

> > > it with a bunch of cmp/jnz checks? There should be a better way to scale.

> > > Maybe clone multi trampoline instead?

> > > IPs[1-10] will point to multi.

> > > IP[11] will point to a clone of multi that serves multi prog and

> > > fentry/fexit progs specific for that IP.

> >

> > ok, so we'd clone multi trampoline if there's request to attach

> > standard trampoline to some IP from multi trampoline

> >

> > .. and transform currently attached standard trampoline for IP

> > into clone of multi trampoline, if there's request to create

> > multi trampoline that covers that IP

> 

> yep. For every IP==btf_id there will be only two possible trampolines.

> Should be easy enough to track and transition between them.

> The standard fentry/fexit will only get negligible slowdown from

> going through multi.

> multi+fexit and fmod_ret needs to be thought through as well.

> That's why I thought that 'ip' at the end should simplify things.

> Only multi will have access to it.

> But we can store it first too. fentry/fexit will see ctx=r1 with +8 offset

> and will have normal args in ctx. Like ip isn't even there.

> While multi trampoline is always doing ip, arg1,arg2, .., arg6

> and passes ctx = &ip into multi prog and ctx = &arg1 into fentry/fexit.

> 'ret' for fexit is problematic though. hmm.

> Maybe such clone multi trampoline for specific ip with 2 args will do:

> ip, arg1, arg2, ret, 0, 0, 0, ret.


we could call multi progs first and set up new args
and call non-multi progs with that

jirka

> Then multi will have 6 args, though 3rd is actually ret.

> Then fexit will have ret in the right place and multi prog will have

> it as 7th arg.

>
Jiri Olsa June 9, 2021, 1:42 p.m. UTC | #20
On Tue, Jun 08, 2021 at 10:08:32PM -0700, Andrii Nakryiko wrote:
> On Tue, Jun 8, 2021 at 4:07 PM Alexei Starovoitov

> <alexei.starovoitov@gmail.com> wrote:

> >

> > On Tue, Jun 8, 2021 at 2:07 PM Jiri Olsa <jolsa@redhat.com> wrote:

> > >

> > > On Tue, Jun 08, 2021 at 11:49:03AM -0700, Alexei Starovoitov wrote:

> > > > On Tue, Jun 08, 2021 at 08:17:00PM +0200, Jiri Olsa wrote:

> > > > > On Tue, Jun 08, 2021 at 08:42:32AM -0700, Alexei Starovoitov wrote:

> > > > > > On Sat, Jun 5, 2021 at 4:11 AM Jiri Olsa <jolsa@kernel.org> wrote:

> > > > > > >

> > > > > > > Adding support to attach multiple functions to tracing program

> > > > > > > by using the link_create/link_update interface.

> > > > > > >

> > > > > > > Adding multi_btf_ids/multi_btf_ids_cnt pair to link_create struct

> > > > > > > API, that define array of functions btf ids that will be attached

> > > > > > > to prog_fd.

> > > > > > >

> > > > > > > The prog_fd needs to be multi prog tracing program (BPF_F_MULTI_FUNC).

> > > > > > >

> > > > > > > The new link_create interface creates new BPF_LINK_TYPE_TRACING_MULTI

> > > > > > > link type, which creates separate bpf_trampoline and registers it

> > > > > > > as direct function for all specified btf ids.

> > > > > > >

> > > > > > > The new bpf_trampoline is out of scope (bpf_trampoline_lookup) of

> > > > > > > standard trampolines, so all registered functions need to be free

> > > > > > > of direct functions, otherwise the link fails.

> > > > > >

> > > > > > Overall the api makes sense to me.

> > > > > > The restriction of multi vs non-multi is too severe though.

> > > > > > The multi trampoline can serve normal fentry/fexit too.

> > > > >

> > > > > so multi trampoline gets called from all the registered functions,

> > > > > so there would need to be filter for specific ip before calling the

> > > > > standard program.. single cmp/jnz might not be that bad, I'll check

> > > >

> > > > You mean reusing the same multi trampoline for all IPs and regenerating

> > > > it with a bunch of cmp/jnz checks? There should be a better way to scale.

> > > > Maybe clone multi trampoline instead?

> > > > IPs[1-10] will point to multi.

> > > > IP[11] will point to a clone of multi that serves multi prog and

> > > > fentry/fexit progs specific for that IP.

> > >

> > > ok, so we'd clone multi trampoline if there's request to attach

> > > standard trampoline to some IP from multi trampoline

> > >

> > > .. and transform currently attached standard trampoline for IP

> > > into clone of multi trampoline, if there's request to create

> > > multi trampoline that covers that IP

> >

> > yep. For every IP==btf_id there will be only two possible trampolines.

> > Should be easy enough to track and transition between them.

> > The standard fentry/fexit will only get negligible slowdown from

> > going through multi.

> > multi+fexit and fmod_ret needs to be thought through as well.

> > That's why I thought that 'ip' at the end should simplify things.

> 

> Putting ip at the end has downsides. We might support >6 arguments

> eventually, at which point it will be super weird to have 6 args, ip,

> then the rest of arguments?..

> 

> Would it be too bad to put IP at -8 offset relative to ctx? That will

> also work for normal fentry/fexit, for which it's useful to have ip

> passed in as well, IMO. So no special casing for multi/non-multi, and

> it's backwards compatible.


I think Alexei is ok with that, as he said below

> 

> Ideally, I'd love it to be actually retrievable through a new BPF

> helper, something like bpf_caller_ip(ctx), but I'm not sure if we can

> implement this sanely, so I don't hold high hopes.


we could always store it at ctx-8 and have the helper get it
from there.. that might also ease handling of that extra first
ip argument for multi-func programs in the verifier

jirka

> 

> > Only multi will have access to it.

> > But we can store it first too. fentry/fexit will see ctx=r1 with +8 offset

> > and will have normal args in ctx. Like ip isn't even there.

> > While multi trampoline is always doing ip, arg1,arg2, .., arg6

> > and passes ctx = &ip into multi prog and ctx = &arg1 into fentry/fexit.

> > 'ret' for fexit is problematic though. hmm.

> > Maybe such clone multi trampoline for specific ip with 2 args will do:

> > ip, arg1, arg2, ret, 0, 0, 0, ret.

> > Then multi will have 6 args, though 3rd is actually ret.

> > Then fexit will have ret in the right place and multi prog will have

> > it as 7th arg.

>
Jiri Olsa June 9, 2021, 1:53 p.m. UTC | #21
On Tue, Jun 08, 2021 at 10:18:21PM -0700, Andrii Nakryiko wrote:
> On Sat, Jun 5, 2021 at 4:12 AM Jiri Olsa <jolsa@kernel.org> wrote:

> >

> > Adding support to attach multiple functions to tracing program

> > by using the link_create/link_update interface.

> >

> > Adding multi_btf_ids/multi_btf_ids_cnt pair to link_create struct

> > API, that define array of functions btf ids that will be attached

> > to prog_fd.

> >

> > The prog_fd needs to be multi prog tracing program (BPF_F_MULTI_FUNC).

> 

> So I'm not sure why we added a new load flag instead of just using a

> new BPF program type or expected attach type?  We have different

> trampolines and different kinds of links for them, so why not be

> consistent and use the new type of BPF program?.. It does change BPF

> verifier's treatment of input arguments, so it's not just a slight

> variation, it's quite different type of program.


ok, makes sense ... BPF_PROG_TYPE_TRACING_MULTI ?

SNIP

> >  struct bpf_attach_target_info {
> > @@ -746,6 +747,8 @@ void bpf_ksym_add(struct bpf_ksym *ksym);
> >  void bpf_ksym_del(struct bpf_ksym *ksym);
> >  int bpf_jit_charge_modmem(u32 pages);
> >  void bpf_jit_uncharge_modmem(u32 pages);
> > +struct bpf_trampoline *bpf_trampoline_multi_alloc(void);
> > +void bpf_trampoline_multi_free(struct bpf_trampoline *tr);
> >  #else
> >  static inline int bpf_trampoline_link_prog(struct bpf_prog *prog,
> >                                            struct bpf_trampoline *tr)
> > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> > index ad9340fb14d4..5fd6ff64e8dc 100644
> > --- a/include/uapi/linux/bpf.h
> > +++ b/include/uapi/linux/bpf.h
> > @@ -1007,6 +1007,7 @@ enum bpf_link_type {
> >         BPF_LINK_TYPE_ITER = 4,
> >         BPF_LINK_TYPE_NETNS = 5,
> >         BPF_LINK_TYPE_XDP = 6,
> > +       BPF_LINK_TYPE_TRACING_MULTI = 7,
> >
> >         MAX_BPF_LINK_TYPE,
> >  };
> > @@ -1454,6 +1455,10 @@ union bpf_attr {
> >                                 __aligned_u64   iter_info;      /* extra bpf_iter_link_info */
> >                                 __u32           iter_info_len;  /* iter_info length */
> >                         };
> > +                       struct {
> > +                               __aligned_u64   multi_btf_ids;          /* addresses to attach */
> > +                               __u32           multi_btf_ids_cnt;      /* addresses count */
> > +                       };

>
> let's do what bpf_link-based TC-BPF API is doing, put it into a named
> field (I'd do the same for iter_info/iter_info_len above as well, I'm
> not sure why we did this flat naming scheme, we know it's inconvenient
> when extending stuff).
>
> struct {
>     __aligned_u64 btf_ids;
>     __u32 btf_ids_cnt;
> } multi;

ok

>
> >                 };
> >         } link_create;
> >
>
> [...]
>
> > +static int bpf_tracing_multi_link_update(struct bpf_link *link,
> > +                                        struct bpf_prog *new_prog,
> > +                                        struct bpf_prog *old_prog __maybe_unused)
> > +{
>
> BPF_LINK_UPDATE command supports passing old_fd and extra flags. We
> can use that to implement both updating existing BPF program in-place
> (by passing BPF_F_REPLACE and old_fd) or adding the program to the
> list of programs, if old_fd == 0. WDYT?

yes, sounds good

thanks,
jirka
Jiri Olsa June 9, 2021, 2:17 p.m. UTC | #22
On Tue, Jun 08, 2021 at 10:34:11PM -0700, Andrii Nakryiko wrote:
> On Sat, Jun 5, 2021 at 4:12 AM Jiri Olsa <jolsa@kernel.org> wrote:

> >

> > Adding support to link multi func tracing program

> > through link_create interface.

> >

> > Adding special types for multi func programs:

> >

> >   fentry.multi

> >   fexit.multi

> >

> > so you can define multi func programs like:

> >

> >   SEC("fentry.multi/bpf_fentry_test*")

> >   int BPF_PROG(test1, unsigned long ip, __u64 a, __u64 b, __u64 c, __u64 d, __u64 e, __u64 f)

> >

> > that defines test1 to be attached to bpf_fentry_test* functions,

> > and able to attach ip and 6 arguments.

> >

> > If functions are not specified the program needs to be attached

> > manually.

> >

> > Adding new btf id related fields to bpf_link_create_opts and

> > bpf_link_create to use them.

> >

> > Signed-off-by: Jiri Olsa <jolsa@kernel.org>

> > ---

> >  tools/lib/bpf/bpf.c    | 11 ++++++-

> >  tools/lib/bpf/bpf.h    |  4 ++-

> >  tools/lib/bpf/libbpf.c | 72 ++++++++++++++++++++++++++++++++++++++++++

> >  3 files changed, 85 insertions(+), 2 deletions(-)

> >

> > diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c

> > index 86dcac44f32f..da892737b522 100644

> > --- a/tools/lib/bpf/bpf.c

> > +++ b/tools/lib/bpf/bpf.c

> > @@ -674,7 +674,8 @@ int bpf_link_create(int prog_fd, int target_fd,

> >                     enum bpf_attach_type attach_type,

> >                     const struct bpf_link_create_opts *opts)

> >  {

> > -       __u32 target_btf_id, iter_info_len;

> > +       __u32 target_btf_id, iter_info_len, multi_btf_ids_cnt;

> > +       __s32 *multi_btf_ids;

> >         union bpf_attr attr;

> >         int fd;

> >

> > @@ -687,6 +688,9 @@ int bpf_link_create(int prog_fd, int target_fd,
> >         if (iter_info_len && target_btf_id)
>
> here we check that mutually exclusive options are not specified, we
> should do the same for multi stuff

right, ok

> 

> >                 return libbpf_err(-EINVAL);

> >

> > +       multi_btf_ids = OPTS_GET(opts, multi_btf_ids, 0);

> > +       multi_btf_ids_cnt = OPTS_GET(opts, multi_btf_ids_cnt, 0);

> > +

> >         memset(&attr, 0, sizeof(attr));

> >         attr.link_create.prog_fd = prog_fd;

> >         attr.link_create.target_fd = target_fd;

> > @@ -701,6 +705,11 @@ int bpf_link_create(int prog_fd, int target_fd,

> >                 attr.link_create.target_btf_id = target_btf_id;

> >         }

> >

> > +       if (multi_btf_ids && multi_btf_ids_cnt) {

> > +               attr.link_create.multi_btf_ids = (__u64) multi_btf_ids;

> > +               attr.link_create.multi_btf_ids_cnt = multi_btf_ids_cnt;

> > +       }

> > +

> >         fd = sys_bpf(BPF_LINK_CREATE, &attr, sizeof(attr));

> >         return libbpf_err_errno(fd);

> >  }

> > diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h

> > index 4f758f8f50cd..2f78b6c34765 100644

> > --- a/tools/lib/bpf/bpf.h

> > +++ b/tools/lib/bpf/bpf.h

> > @@ -177,8 +177,10 @@ struct bpf_link_create_opts {
> >         union bpf_iter_link_info *iter_info;
> >         __u32 iter_info_len;
> >         __u32 target_btf_id;
> > +       __s32 *multi_btf_ids;
>
> why ids are __s32?..

hum not sure why I did that.. __u32 then

> 

> > +       __u32 multi_btf_ids_cnt;

> >  };

> > -#define bpf_link_create_opts__last_field target_btf_id

> > +#define bpf_link_create_opts__last_field multi_btf_ids_cnt

> >

> >  LIBBPF_API int bpf_link_create(int prog_fd, int target_fd,

> >                                enum bpf_attach_type attach_type,

> > diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c

> > index 65f87cc1220c..bd31de3b6a85 100644

> > --- a/tools/lib/bpf/libbpf.c

> > +++ b/tools/lib/bpf/libbpf.c

> > @@ -228,6 +228,7 @@ struct bpf_sec_def {

> >         bool is_attachable;

> >         bool is_attach_btf;

> >         bool is_sleepable;

> > +       bool is_multi_func;

> >         attach_fn_t attach_fn;

> >  };

> >

> > @@ -7609,6 +7610,8 @@ __bpf_object__open(const char *path, const void *obj_buf, size_t obj_buf_sz,

> >

> >                 if (prog->sec_def->is_sleepable)

> >                         prog->prog_flags |= BPF_F_SLEEPABLE;

> > +               if (prog->sec_def->is_multi_func)

> > +                       prog->prog_flags |= BPF_F_MULTI_FUNC;

> >                 bpf_program__set_type(prog, prog->sec_def->prog_type);

> >                 bpf_program__set_expected_attach_type(prog,

> >                                 prog->sec_def->expected_attach_type);

> > @@ -9070,6 +9073,8 @@ static struct bpf_link *attach_raw_tp(const struct bpf_sec_def *sec,

> >                                       struct bpf_program *prog);

> >  static struct bpf_link *attach_trace(const struct bpf_sec_def *sec,

> >                                      struct bpf_program *prog);

> > +static struct bpf_link *attach_trace_multi(const struct bpf_sec_def *sec,

> > +                                          struct bpf_program *prog);

> >  static struct bpf_link *attach_lsm(const struct bpf_sec_def *sec,

> >                                    struct bpf_program *prog);

> >  static struct bpf_link *attach_iter(const struct bpf_sec_def *sec,

> > @@ -9143,6 +9148,14 @@ static const struct bpf_sec_def section_defs[] = {

> >                 .attach_fn = attach_iter),

> >         SEC_DEF("syscall", SYSCALL,

> >                 .is_sleepable = true),

> > +       SEC_DEF("fentry.multi/", TRACING,
> > +               .expected_attach_type = BPF_TRACE_FENTRY,
>
> BPF_TRACE_MULTI_FENTRY instead of is_multi stuff everywhere?.. Or a
> new type of BPF program altogether?
>
> > +               .is_multi_func = true,
> > +               .attach_fn = attach_trace_multi),
> > +       SEC_DEF("fexit.multi/", TRACING,
> > +               .expected_attach_type = BPF_TRACE_FEXIT,
> > +               .is_multi_func = true,
> > +               .attach_fn = attach_trace_multi),

> >         BPF_EAPROG_SEC("xdp_devmap/",           BPF_PROG_TYPE_XDP,

> >                                                 BPF_XDP_DEVMAP),

> >         BPF_EAPROG_SEC("xdp_cpumap/",           BPF_PROG_TYPE_XDP,

> > @@ -9584,6 +9597,9 @@ static int libbpf_find_attach_btf_id(struct bpf_program *prog, int *btf_obj_fd,

> >         if (!name)

> >                 return -EINVAL;

> >

> > +       if (prog->prog_flags & BPF_F_MULTI_FUNC)

> > +               return 0;

> > +

> >         for (i = 0; i < ARRAY_SIZE(section_defs); i++) {

> >                 if (!section_defs[i].is_attach_btf)

> >                         continue;

> > @@ -10537,6 +10553,62 @@ static struct bpf_link *bpf_program__attach_btf_id(struct bpf_program *prog)

> >         return (struct bpf_link *)link;

> >  }

> >

> > +static struct bpf_link *bpf_program__attach_multi(struct bpf_program *prog)

> > +{

> > +       char *pattern = prog->sec_name + prog->sec_def->len;

> > +       DECLARE_LIBBPF_OPTS(bpf_link_create_opts, opts);

> > +       enum bpf_attach_type attach_type;

> > +       int prog_fd, link_fd, cnt, err;

> > +       struct bpf_link *link = NULL;

> > +       __s32 *ids = NULL;

> > +

> > +       prog_fd = bpf_program__fd(prog);

> > +       if (prog_fd < 0) {

> > +               pr_warn("prog '%s': can't attach before loaded\n", prog->name);

> > +               return ERR_PTR(-EINVAL);

> > +       }

> > +

> > +       err = bpf_object__load_vmlinux_btf(prog->obj, true);

> > +       if (err)

> > +               return ERR_PTR(err);

> > +

> > +       cnt = btf__find_by_pattern_kind(prog->obj->btf_vmlinux, pattern,
> > +                                       BTF_KIND_FUNC, &ids);
>
> I wonder if it would be better to just support a simplified glob
> patterns like "prefix*", "*suffix", "exactmatch", and "*substring*"?
> That should be sufficient for majority of cases. For the cases where
> user needs something more nuanced, they can just construct BTF ID list
> with custom code and do manual attach.

as I wrote earlier the function is just for the purpose of the test,
and we can always do the manual attach

I don't mind adding that simplified matching you described

jirka

> 

> > +       if (cnt <= 0)

> > +               return ERR_PTR(-EINVAL);

> > +

> > +       link = calloc(1, sizeof(*link));

> > +       if (!link) {

> > +               err = -ENOMEM;

> > +               goto out_err;

> > +       }

> > +       link->detach = &bpf_link__detach_fd;

> > +

> > +       opts.multi_btf_ids = ids;

> > +       opts.multi_btf_ids_cnt = cnt;

> > +

> > +       attach_type = bpf_program__get_expected_attach_type(prog);

> > +       link_fd = bpf_link_create(prog_fd, 0, attach_type, &opts);

> > +       if (link_fd < 0) {

> > +               err = -errno;

> > +               goto out_err;

> > +       }

> > +       link->fd = link_fd;

> > +       free(ids);

> > +       return link;

> > +

> > +out_err:

> > +       free(link);

> > +       free(ids);

> > +       return ERR_PTR(err);

> > +}

> > +

> > +static struct bpf_link *attach_trace_multi(const struct bpf_sec_def *sec,

> > +                                          struct bpf_program *prog)

> > +{

> > +       return bpf_program__attach_multi(prog);

> > +}

> > +

> >  struct bpf_link *bpf_program__attach_trace(struct bpf_program *prog)

> >  {

> >         return bpf_program__attach_btf_id(prog);

> > --

> > 2.31.1

> >

>
Jiri Olsa June 9, 2021, 2:29 p.m. UTC | #23
On Tue, Jun 08, 2021 at 10:41:37PM -0700, Andrii Nakryiko wrote:
> On Sat, Jun 5, 2021 at 4:13 AM Jiri Olsa <jolsa@kernel.org> wrote:

> >

> > Adding selftest for fentry/fexit multi func test that attaches

> > to bpf_fentry_test* functions and checks argument values based

> > on the processed function.

> >

> > When multi_arg_check is used from 2 different places I'm getting

> > compilation fail, which I did not deciphered yet:

> >

> >   $ CLANG=/opt/clang/bin/clang LLC=/opt/clang/bin/llc make

> >     CLNG-BPF [test_maps] fentry_fexit_multi_test.o

> >   progs/fentry_fexit_multi_test.c:18:2: error: too many args to t24: i64 = \

> >   GlobalAddress<void (i64, i64, i64, i64, i64, i64, i64, i64*)* @multi_arg_check> 0, \

> >   progs/fentry_fexit_multi_test.c:18:2 @[ progs/fentry_fexit_multi_test.c:16:5 ]

> >           multi_arg_check(ip, a, b, c, d, e, f, &test1_arg_result);

> >           ^

> >   progs/fentry_fexit_multi_test.c:25:2: error: too many args to t32: i64 = \

> >   GlobalAddress<void (i64, i64, i64, i64, i64, i64, i64, i64*)* @multi_arg_check> 0, \

> >   progs/fentry_fexit_multi_test.c:25:2 @[ progs/fentry_fexit_multi_test.c:23:5 ]

> >           multi_arg_check(ip, a, b, c, d, e, f, &test2_arg_result);

> >           ^

> >   In file included from progs/fentry_fexit_multi_test.c:5:

> >   /home/jolsa/linux-qemu/tools/testing/selftests/bpf/multi_check.h:9:6: error: defined with too many args

> >   void multi_arg_check(unsigned long ip, __u64 a, __u64 b, __u64 c, __u64 d, __u64 e, __u64 f, __u64 *test_result)

> >        ^

> >   /home/jolsa/linux-qemu/tools/testing/selftests/bpf/multi_check.h:9:6: error: defined with too many args

> >   /home/jolsa/linux-qemu/tools/testing/selftests/bpf/multi_check.h:9:6: error: defined with too many args

> >   5 errors generated.

> >   make: *** [Makefile:470: /home/jolsa/linux-qemu/tools/testing/selftests/bpf/fentry_fexit_multi_test.o] Error 1

> >

> > I can fix that by defining 2 separate multi_arg_check functions
> > with different names, which I did in follow up temporary patch.
> > Not sure I'm hitting some clang/bpf limitation in here?
>
> don't know about clang limitations, but we should use static linking
> proper anyways

ok, will change

thanks,
jirka
Andrii Nakryiko June 10, 2021, 5:05 p.m. UTC | #24
On Wed, Jun 9, 2021 at 7:17 AM Jiri Olsa <jolsa@redhat.com> wrote:
> On Tue, Jun 08, 2021 at 10:34:11PM -0700, Andrii Nakryiko wrote:

SNIP

> > > +       cnt = btf__find_by_pattern_kind(prog->obj->btf_vmlinux, pattern,
> > > +                                       BTF_KIND_FUNC, &ids);
> >
> > I wonder if it would be better to just support a simplified glob
> > patterns like "prefix*", "*suffix", "exactmatch", and "*substring*"?
> > That should be sufficient for majority of cases. For the cases where
> > user needs something more nuanced, they can just construct BTF ID list
> > with custom code and do manual attach.
>
> as I wrote earlier the function is just for the purpose of the test,
> and we can always do the manual attach
>
> I don't mind adding that simplified matching you described


I use that in retsnoop and that seems to be simple but flexible enough
for all the purposes, so far. It matches typical file globbing rules
(with extra limitations, of course), so it's also intuitive.

But I still am not sure about making it a public API, because in a lot
of cases you'll want a list of patterns (both allowing and denying
different patterns), so it should be generalized to something like

btf__find_by_glob_kind(btf, allow_patterns, deny_patterns, ids)

which gets pretty unwieldy. I'd start with telling users to just
iterate BTF on their own and apply whatever custom filtering they
need. For simple cases libbpf will just initially support a simple and
single glob filter declaratively (e.g., SEC("fentry.multi/bpf_*")).


>

> jirka

SNIP
Jiri Olsa June 10, 2021, 8:35 p.m. UTC | #25
On Thu, Jun 10, 2021 at 10:05:39AM -0700, Andrii Nakryiko wrote:

SNIP

> > > > +       cnt = btf__find_by_pattern_kind(prog->obj->btf_vmlinux, pattern,
> > > > +                                       BTF_KIND_FUNC, &ids);
> > >
> > > I wonder if it would be better to just support a simplified glob
> > > patterns like "prefix*", "*suffix", "exactmatch", and "*substring*"?
> > > That should be sufficient for majority of cases. For the cases where
> > > user needs something more nuanced, they can just construct BTF ID list
> > > with custom code and do manual attach.
> >
> > as I wrote earlier the function is just for the purpose of the test,
> > and we can always do the manual attach
> >
> > I don't mind adding that simplified matching you described
>
> I use that in retsnoop and that seems to be simple but flexible enough
> for all the purposes, so far. It matches typical file globbing rules
> (with extra limitations, of course), so it's also intuitive.
>
> But I still am not sure about making it a public API, because in a lot
> of cases you'll want a list of patterns (both allowing and denying
> different patterns), so it should be generalized to something like
>
> btf__find_by_glob_kind(btf, allow_patterns, deny_patterns, ids)
>
> which gets pretty unwieldy. I'd start with telling users to just
> iterate BTF on their own and apply whatever custom filtering they
> need. For simple cases libbpf will just initially support a simple and
> single glob filter declaratively (e.g, SEC("fentry.multi/bpf_*")).


ok, I'll scan retsnoop and see what I can steal ;-)

jirka
Andrii Nakryiko June 17, 2021, 8:29 p.m. UTC | #26
On Sat, Jun 5, 2021 at 4:12 AM Jiri Olsa <jolsa@kernel.org> wrote:
>

> hi,
> saga continues.. ;-) previous post is in here [1]
>
> After another discussion with Steven, he mentioned that if we fix
> the ftrace graph problem with direct functions, he'd be open to
> add batch interface for direct ftrace functions.
>
> He already had proof of concept fix for that, which I took and broke
> up into several changes. I added the ftrace direct batch interface
> and bpf new interface on top of that.
>
> It's not so many patches after all, so I thought having them all
> together will help the review, because they are all connected.
> However I can break this up into separate patchsets if necessary.
>
> This patchset contains:
>
>   1) patches (1-4) that fix the ftrace graph tracing over the function
>      with direct trampolines attached
>   2) patches (5-8) that add batch interface for ftrace direct function
>      register/unregister/modify
>   3) patches (9-19) that add support to attach BPF program to multiple
>      functions
>
> In a nutshell:
>
> Ad 1) moves the graph tracing setup before the direct trampoline
> prepares the stack, so they don't clash
>
> Ad 2) uses ftrace_ops interface to register direct function with
> all functions in ftrace_ops filter.
>
> Ad 3) creates special program and trampoline type to allow attachment
> of multiple functions to single program.
>
> There're more detailed descriptions in related changelogs.
>
> I have working bpftrace multi attachment code on top of this. I briefly
> checked retsnoop and I think it could use the new API as well.


Ok, so I had a bit of time and enthusiasm to try that with retsnoop.
The ugly code is at [0] if you'd like to see what kind of changes I
needed to make to use this (it won't work if you check it out because
it needs your libbpf changes synced into submodule, which I only did
locally). But here are some learnings from that experiment both to
emphasize how important it is to make this work and how restrictive
are some of the current limitations.

First, good news. Using this mass-attach API to attach to almost 1000
kernel functions goes from

Plain fentry/fexit:
===================
real    0m27.321s
user    0m0.352s
sys     0m20.919s

to

Mass-attach fentry/fexit:
=========================
real    0m2.728s
user    0m0.329s
sys     0m2.380s

It's a 10x speed up. And a good chunk of those 2.7 seconds is in some
preparatory steps not related to fentry/fexit stuff.

It's not exactly apples-to-apples, though, because the limitations you
have right now prevent attaching both fentry and fexit programs to
the same set of kernel functions. This makes it pretty useless for a
lot of cases, in particular for retsnoop. So I haven't really tested
retsnoop end-to-end, I only verified that I do see fentries triggered,
but can't have matching fexits. So the speed-up might be smaller due
to additional fexit mass-attach (once that is allowed), but it's still
a massive difference. So we absolutely need to get this optimization
in.

Few more thoughts, if you'd like to plan some more work ahead ;)

1. We need similar mass-attach functionality for kprobe/kretprobe, as
there are use cases where kprobes are more useful than fentry (e.g., >6
args funcs, or funcs with input arguments that are not supported by
the BPF verifier, like struct-by-value). It's not clear how to best
represent this, given that we currently attach kprobes through perf_event,
but we'll need to think about this for sure.

2. To make mass-attach fentry/fexit useful for practical purposes, it
would be really great to have an ability to fetch traced function's
IP. I.e., if we fentry/fexit func kern_func_abc, bpf_get_func_ip()
would return the IP of that function, matching the one in
/proc/kallsyms. Right now I do very brittle hacks to do that.

So all-in-all, super excited about this, but I hope all those issues
are addressed to make retsnoop possible and fast.

  [0] https://github.com/anakryiko/retsnoop/commit/8a07bc4d8c47d025f755c108f92f0583e3fda6d8

>
> Also available at:
>   https://git.kernel.org/pub/scm/linux/kernel/git/jolsa/perf.git
>   bpf/batch
>
> thanks,
> jirka
>
> [1] https://lore.kernel.org/bpf/20210413121516.1467989-1-jolsa@kernel.org/
>
> ---
> Jiri Olsa (17):
>       x86/ftrace: Remove extra orig rax move
>       tracing: Add trampoline/graph selftest
>       ftrace: Add ftrace_add_rec_direct function
>       ftrace: Add multi direct register/unregister interface
>       ftrace: Add multi direct modify interface
>       ftrace/samples: Add multi direct interface test module
>       bpf, x64: Allow to use caller address from stack
>       bpf: Allow to store caller's ip as argument
>       bpf: Add support to load multi func tracing program
>       bpf: Add bpf_trampoline_alloc function
>       bpf: Add support to link multi func tracing program
>       libbpf: Add btf__find_by_pattern_kind function
>       libbpf: Add support to link multi func tracing program
>       selftests/bpf: Add fentry multi func test
>       selftests/bpf: Add fexit multi func test
>       selftests/bpf: Add fentry/fexit multi func test
>       selftests/bpf: Temporary fix for fentry_fexit_multi_test
>
> Steven Rostedt (VMware) (2):
>       x86/ftrace: Remove fault protection code in prepare_ftrace_return
>       x86/ftrace: Make function graph use ftrace directly
>

[...]
Jiri Olsa June 19, 2021, 8:33 a.m. UTC | #27
On Thu, Jun 17, 2021 at 01:29:45PM -0700, Andrii Nakryiko wrote:
> On Sat, Jun 5, 2021 at 4:12 AM Jiri Olsa <jolsa@kernel.org> wrote:

SNIP

>
> Ok, so I had a bit of time and enthusiasm to try that with retsnoop.
> The ugly code is at [0] if you'd like to see what kind of changes I
> needed to make to use this (it won't work if you check it out because
> it needs your libbpf changes synced into submodule, which I only did
> locally). But here are some learnings from that experiment both to
> emphasize how important it is to make this work and how restrictive
> are some of the current limitations.
>
> First, good news. Using this mass-attach API to attach to almost 1000
> kernel functions goes from
>
> Plain fentry/fexit:
> ===================
> real    0m27.321s
> user    0m0.352s
> sys     0m20.919s
>
> to
>
> Mass-attach fentry/fexit:
> =========================
> real    0m2.728s
> user    0m0.329s
> sys     0m2.380s

I didn't measure the bpftrace speedup, because the new code
attached instantly ;-)

>
> It's a 10x speed up. And a good chunk of those 2.7 seconds is in some
> preparatory steps not related to fentry/fexit stuff.
>
> It's not exactly apples-to-apples, though, because the limitations you
> have right now prevents attaching both fentry and fexit programs to
> the same set of kernel functions. This makes it pretty useless for a

hum, you could do link_update with fexit program on the link fd,
like in the selftest, right?

> lot of cases, in particular for retsnoop. So I haven't really tested
> retsnoop end-to-end, I only verified that I do see fentries triggered,
> but can't have matching fexits. So the speed-up might be smaller due
> to additional fexit mass-attach (once that is allowed), but it's still
> a massive difference. So we absolutely need to get this optimization
> in.
>
> Few more thoughts, if you'd like to plan some more work ahead ;)
>
> 1. We need similar mass-attach functionality for kprobe/kretprobe, as
> there are use cases where kprobe are more useful than fentry (e.g., >6
> args funcs, or funcs with input arguments that are not supported by
> BPF verifier, like struct-by-value). It's not clear how to best
> represent this, given currently we attach kprobe through perf_event,
> but we'll need to think about this for sure.

I'm fighting with the '2 trampolines concept' at the moment, but the
mass attach for kprobes seems interesting ;-) will check

>
> 2. To make mass-attach fentry/fexit useful for practical purposes, it
> would be really great to have an ability to fetch traced function's
> IP. I.e., if we fentry/fexit func kern_func_abc, bpf_get_func_ip()
> would return IP of that functions that matches the one in
> /proc/kallsyms. Right now I do very brittle hacks to do that.

so I hoped that we could store ip always in ctx-8 and have
the bpf_get_func_ip helper to access that, but the BPF_PROG
macro does not pass ctx value to the program, just args

we could perhaps somehow store the ctx in BPF_PROG before calling
the bpf program, but I did not get to try that yet

>
> So all-in-all, super excited about this, but I hope all those issues
> are addressed to make retsnoop possible and fast.
>
>   [0] https://github.com/anakryiko/retsnoop/commit/8a07bc4d8c47d025f755c108f92f0583e3fda6d8

thanks for checking on this,
jirka
Yonghong Song June 19, 2021, 4:19 p.m. UTC | #28
On 6/19/21 1:33 AM, Jiri Olsa wrote:
> On Thu, Jun 17, 2021 at 01:29:45PM -0700, Andrii Nakryiko wrote:
>> On Sat, Jun 5, 2021 at 4:12 AM Jiri Olsa <jolsa@kernel.org> wrote:
[...]
>> 2. To make mass-attach fentry/fexit useful for practical purposes, it
>> would be really great to have an ability to fetch traced function's
>> IP. I.e., if we fentry/fexit func kern_func_abc, bpf_get_func_ip()
>> would return IP of that functions that matches the one in
>> /proc/kallsyms. Right now I do very brittle hacks to do that.
>
> so I hoped that we could store ip always in ctx-8 and have
> the bpf_get_func_ip helper to access that, but the BPF_PROG
> macro does not pass ctx value to the program, just args

ctx does pass to the bpf program. You can check BPF_PROG
macro definition.

>
> we could perhaps somehow store the ctx in BPF_PROG before calling
> the bpf program, but I did not get to try that yet

[...]
Jiri Olsa June 19, 2021, 5:09 p.m. UTC | #29
On Sat, Jun 19, 2021 at 09:19:57AM -0700, Yonghong Song wrote:
> On 6/19/21 1:33 AM, Jiri Olsa wrote:
[...]
> > so I hoped that we could store ip always in ctx-8 and have
> > the bpf_get_func_ip helper to access that, but the BPF_PROG
> > macro does not pass ctx value to the program, just args
>
> ctx does pass to the bpf program. You can check BPF_PROG
> macro definition.

ah right, should have checked it.. so how about we change
trampoline code to store ip in ctx-8 and make bpf_get_func_ip(ctx)
to return [ctx-8]

I'll need to check if it's ok for the tracing helper to take
ctx as argument

thanks,
jirka
Yonghong Song June 20, 2021, 4:56 p.m. UTC | #30
On 6/19/21 10:09 AM, Jiri Olsa wrote:
> On Sat, Jun 19, 2021 at 09:19:57AM -0700, Yonghong Song wrote:
[...]
> ah right, should have checked it.. so how about we change
> trampoline code to store ip in ctx-8 and make bpf_get_func_ip(ctx)
> to return [ctx-8]

This should work. Thanks!

>
> I'll need to check if it's ok for the tracing helper to take
> ctx as argument
>
> thanks,
> jirka
Alexei Starovoitov June 20, 2021, 5:47 p.m. UTC | #31
On Sun, Jun 20, 2021 at 9:57 AM Yonghong Song <yhs@fb.com> wrote:
> >
> > ah right, should have checked it.. so how about we change
> > trampoline code to store ip in ctx-8 and make bpf_get_func_ip(ctx)
> > to return [ctx-8]
>
> This should work. Thanks!

+1
and pls make it always inline into single LDX insn in the verifier.
For both mass attach and normal fentry/fexit.
Andrii Nakryiko June 21, 2021, 6:46 a.m. UTC | #32
On Sun, Jun 20, 2021 at 8:47 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Sun, Jun 20, 2021 at 9:57 AM Yonghong Song <yhs@fb.com> wrote:
> > >
> > > ah right, should have checked it.. so how about we change
> > > trampoline code to store ip in ctx-8 and make bpf_get_func_ip(ctx)
> > > to return [ctx-8]
> >
> > This should work. Thanks!
>
> +1
> and pls make it always inline into single LDX insn in the verifier.
> For both mass attach and normal fentry/fexit.

Yep.

And we should do it for kprobes (trivial, PT_REGS_IP(ctx)) and
kretprobe (less trivial but simple from inside the kernel, Masami
showed how to do it in one of the previous emails). I hope BPF infra
allows inlining of helpers for some program types but not the others.
Andrii Nakryiko June 21, 2021, 6:50 a.m. UTC | #33
On Sat, Jun 19, 2021 at 11:33 AM Jiri Olsa <jolsa@redhat.com> wrote:
>
> On Thu, Jun 17, 2021 at 01:29:45PM -0700, Andrii Nakryiko wrote:
[...]
> > It's not exactly apples-to-apples, though, because the limitations you
> > have right now prevents attaching both fentry and fexit programs to
> > the same set of kernel functions. This makes it pretty useless for a
>
> hum, you could do link_update with fexit program on the link fd,
> like in the selftest, right?

Hm... I didn't realize we can attach two different prog FDs to the
same link, honestly (and was too lazy to look through selftests
again). I can try that later. But it's actually quite a
counter-intuitive API (I honestly assumed that link_update can be used
to add more BTF IDs, but not change prog_fd). Previously bpf_link was
always associated with single BPF prog FD. It would be good to keep
that property in the final version, but we can get back to that later.

[...]
Andrii Nakryiko July 6, 2021, 8:26 p.m. UTC | #34
On Sun, Jun 20, 2021 at 11:50 PM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Sat, Jun 19, 2021 at 11:33 AM Jiri Olsa <jolsa@redhat.com> wrote:
[...]
> > hum, you could do link_update with fexit program on the link fd,
> > like in the selftest, right?
>
> Hm... I didn't realize we can attach two different prog FDs to the
> same link, honestly (and was too lazy to look through selftests
> again). I can try that later. But it's actually quite a
> counter-intuitive API (I honestly assumed that link_update can be used
> to add more BTF IDs, but not change prog_fd). Previously bpf_link was
> always associated with single BPF prog FD. It would be good to keep
> that property in the final version, but we can get back to that later.

Ok, I'm back from PTO and as a warm-up did a two-line change to make
retsnoop work end-to-end using this bpf_link_update() approach. See
[0]. I still think it's a completely confusing API to do
bpf_link_update() to have both fexit and fentry, but it worked for
this experiment.

BTW, adding ~900 fexit attachments is barely noticeable, which is
great; it means that attachment is instantaneous.

real    0m2.739s
user    0m0.351s
sys     0m2.370s

  [0] https://github.com/anakryiko/retsnoop/commit/c915d729d6e98f83601e432e61cb1bdf476ceefb

>
> >
> > > lot of cases, in particular for retsnoop. So I haven't really tested
> > > retsnoop end-to-end, I only verified that I do see fentries triggered,
> > > but can't have matching fexits. So the speed-up might be smaller due
> > > to additional fexit mass-attach (once that is allowed), but it's still
> > > a massive difference. So we absolutely need to get this optimization
> > > in.

> > >
> > > Few more thoughts, if you'd like to plan some more work ahead ;)
> > >
> > > 1. We need similar mass-attach functionality for kprobe/kretprobe, as
> > > there are use cases where kprobes are more useful than fentry (e.g., >6
> > > args funcs, or funcs with input arguments that are not supported by
> > > the BPF verifier, like struct-by-value). It's not clear how to best
> > > represent this, given we currently attach kprobes through perf_event,
> > > but we'll need to think about this for sure.
> >
> > I'm fighting with the '2 trampolines concept' at the moment, but the
> > mass attach for kprobes seems interesting ;-) will check
> >
> > >
> > > 2. To make mass-attach fentry/fexit useful for practical purposes, it
> > > would be really great to have the ability to fetch the traced function's
> > > IP. I.e., if we fentry/fexit kern_func_abc, bpf_get_func_ip()
> > > would return the IP of that function, matching the one in
> > > /proc/kallsyms. Right now I do very brittle hacks to do that.

> >
> > so I hoped that we could store the ip always at ctx-8 and have
> > the bpf_get_func_ip helper access that, but the BPF_PROG
> > macro does not pass the ctx value to the program, just the args
> >
> > we could perhaps somehow store the ctx in BPF_PROG before calling
> > the bpf program, but I did not get to try that yet

> >
> > >
> > > So all-in-all, super excited about this, but I hope all those issues
> > > are addressed to make retsnoop possible and fast.
> > >
> > >   [0] https://github.com/anakryiko/retsnoop/commit/8a07bc4d8c47d025f755c108f92f0583e3fda6d8
> >
> > thanks for checking on this,
> > jirka
> >

Jiri Olsa July 7, 2021, 3:19 p.m. UTC | #35
On Tue, Jul 06, 2021 at 01:26:46PM -0700, Andrii Nakryiko wrote:

SNIP

> > > >
> > > > It's a 10x speed-up. And a good chunk of those 2.7 seconds is spent in
> > > > preparatory steps not related to the fentry/fexit stuff.
> > > >
> > > > It's not exactly apples-to-apples, though, because the limitations you
> > > > have right now prevent attaching both fentry and fexit programs to
> > > > the same set of kernel functions. This makes it pretty useless for a
> > >
> > > hum, you could do link_update with the fexit program on the link fd,
> > > like in the selftest, right?
> >
> > Hm... I didn't realize we can attach two different prog FDs to the
> > same link, honestly (and was too lazy to look through the selftests
> > again). I can try that later. But it's actually quite a
> > counter-intuitive API (I honestly assumed that link_update can be used
> > to add more BTF IDs, but not to change prog_fd). Previously a bpf_link was
> > always associated with a single BPF prog FD. It would be good to keep
> > that property in the final version, but we can get back to that later.
>
> Ok, I'm back from PTO and as a warm-up did a two-line change to make
> retsnoop work end-to-end using this bpf_link_update() approach. See
> [0]. I still think it's a completely confusing API to do
> bpf_link_update() to have both fexit and fentry, but it worked for
> this experiment.

we need the same set of functions, and we have the 'fd' representing
that ;-) but that could hopefully go away with the new approach

>
> BTW, adding ~900 fexit attachments is barely noticeable, which is
> great; it means that attachment is instantaneous.

right, I see a similarly unnoticeable time in bpftrace as well,
thanks for testing that,

jirka

>
> real    0m2.739s
> user    0m0.351s
> sys     0m2.370s
>
>   [0] https://github.com/anakryiko/retsnoop/commit/c915d729d6e98f83601e432e61cb1bdf476ceefb
>


SNIP