diff mbox series

[v2,bpf-next,2/6] libbpf: rename static variables during linking

Message ID 20210423185357.1992756-3-andrii@kernel.org
State New
Headers show
Series [v2,bpf-next,1/6] bpftool: strip const/volatile/restrict modifiers from .bss and .data vars | expand

Commit Message

Andrii Nakryiko April 23, 2021, 6:53 p.m. UTC
Prepend <obj_name>.. prefix to each static variable in BTF info during static
linking. This makes them uniquely named for the sake of BPF skeleton use,
allowing to read/write static BPF variables from user-space. This uniqueness
guarantee depends on each linked file name uniqueness, of course. Double dots
separator was chosen both to be different (but similar) to the separator that
Clang is currently using for static variables defined inside functions as well
as to generate a natural (in libbpf parlance, at least) obj__var naming pattern
in BPF skeleton. Static linker also checks for static variable to already
contain ".." separator and skips the rename to allow multi-pass linking and not
keep making variable name ever increasing, if derived object name is changing on
each pass (as is the case for selftests).

This patch also adds opts to bpf_linker__add_file() API, which currently
allows to override object name for a given file and could be extended with other
per-file options in the future. This is not a breaking change because
bpf_linker__add_file() isn't yet released officially.

This patch also includes fixes to few selftests that are already using static
variables. They have to go in in the same patch to not break selftest build.
Keep in mind, this static variable rename only happens during static linking.
For any existing user of BPF skeleton using static variables nothing changes,
because those use cases are using variable names generated by Clang. Only new
users utilizing static linker might need to adjust BPF skeleton use, which
currently will be always new use cases. So ther is no risk of breakage.

static_linked selftests is modified to also validate conflicting static variable
names are handled correctly both during static linking and in BPF skeleton.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 tools/bpf/bpftool/gen.c                       |   2 +-
 tools/lib/bpf/libbpf.h                        |  12 +-
 tools/lib/bpf/linker.c                        | 121 +++++++++++++++++-
 .../selftests/bpf/prog_tests/skeleton.c       |   8 +-
 .../selftests/bpf/prog_tests/static_linked.c  |   8 +-
 .../selftests/bpf/progs/bpf_iter_test_kern4.c |   4 +-
 .../selftests/bpf/progs/test_check_mtu.c      |   4 +-
 .../selftests/bpf/progs/test_cls_redirect.c   |   4 +-
 .../bpf/progs/test_snprintf_single.c          |   2 +-
 .../selftests/bpf/progs/test_sockmap_listen.c |   4 +-
 .../selftests/bpf/progs/test_static_linked1.c |   6 +-
 .../selftests/bpf/progs/test_static_linked2.c |   4 +-
 12 files changed, 151 insertions(+), 28 deletions(-)

Comments

Yonghong Song April 23, 2021, 8:24 p.m. UTC | #1
On 4/23/21 11:53 AM, Andrii Nakryiko wrote:
> Prepend <obj_name>.. prefix to each static variable in BTF info during static
> linking. This makes them uniquely named for the sake of BPF skeleton use,
> allowing to read/write static BPF variables from user-space. This uniqueness
> guarantee depends on each linked file name uniqueness, of course. Double dots
> separator was chosen both to be different (but similar) to the separator that
> Clang is currently using for static variables defined inside functions as well
> as to generate a natural (in libbpf parlance, at least) obj__var naming pattern
> in BPF skeleton. Static linker also checks for static variable to already
> contain ".." separator and skips the rename to allow multi-pass linking and not
> keep making variable name ever increasing, if derived object name is changing on
> each pass (as is the case for selftests).
> 
> This patch also adds opts to bpf_linker__add_file() API, which currently
> allows to override object name for a given file and could be extended with other
> per-file options in the future. This is not a breaking change because
> bpf_linker__add_file() isn't yet released officially.
> 
> This patch also includes fixes to few selftests that are already using static
> variables. They have to go in in the same patch to not break selftest build.

"in in" => "in"

> Keep in mind, this static variable rename only happens during static linking.
> For any existing user of BPF skeleton using static variables nothing changes,
> because those use cases are using variable names generated by Clang. Only new
> users utilizing static linker might need to adjust BPF skeleton use, which
> currently will be always new use cases. So ther is no risk of breakage.
> 
> static_linked selftests is modified to also validate conflicting static variable
> names are handled correctly both during static linking and in BPF skeleton.
> 
> Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> ---
>   tools/bpf/bpftool/gen.c                       |   2 +-
>   tools/lib/bpf/libbpf.h                        |  12 +-
>   tools/lib/bpf/linker.c                        | 121 +++++++++++++++++-
>   .../selftests/bpf/prog_tests/skeleton.c       |   8 +-
>   .../selftests/bpf/prog_tests/static_linked.c  |   8 +-
>   .../selftests/bpf/progs/bpf_iter_test_kern4.c |   4 +-
>   .../selftests/bpf/progs/test_check_mtu.c      |   4 +-
>   .../selftests/bpf/progs/test_cls_redirect.c   |   4 +-
>   .../bpf/progs/test_snprintf_single.c          |   2 +-
>   .../selftests/bpf/progs/test_sockmap_listen.c |   4 +-
>   .../selftests/bpf/progs/test_static_linked1.c |   6 +-
>   .../selftests/bpf/progs/test_static_linked2.c |   4 +-
>   12 files changed, 151 insertions(+), 28 deletions(-)
> 
> diff --git a/tools/bpf/bpftool/gen.c b/tools/bpf/bpftool/gen.c
> index 440a2fcb6441..06fee4a2910a 100644
> --- a/tools/bpf/bpftool/gen.c
> +++ b/tools/bpf/bpftool/gen.c
> @@ -638,7 +638,7 @@ static int do_object(int argc, char **argv)
>   	while (argc) {
>   		file = GET_ARG();
>   
> -		err = bpf_linker__add_file(linker, file);
> +		err = bpf_linker__add_file(linker, file, NULL);
>   		if (err) {
>   			p_err("failed to link '%s': %s (%d)", file, strerror(err), err);
>   			goto out;
> diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
> index bec4e6a6e31d..67505030c8d1 100644
> --- a/tools/lib/bpf/libbpf.h
> +++ b/tools/lib/bpf/libbpf.h
> @@ -768,10 +768,20 @@ struct bpf_linker_opts {
>   };
>   #define bpf_linker_opts__last_field sz
>   
> +struct bpf_linker_file_opts {
> +	/* size of this struct, for forward/backward compatiblity */
> +	size_t sz;
> +	/* object name override, similar to the one in bpf_object_open_opts */
> +	const char *object_name;
> +};
> +#define bpf_linker_file_opts__last_field sz
> +
>   struct bpf_linker;
>   
>   LIBBPF_API struct bpf_linker *bpf_linker__new(const char *filename, struct bpf_linker_opts *opts);
> -LIBBPF_API int bpf_linker__add_file(struct bpf_linker *linker, const char *filename);
> +LIBBPF_API int bpf_linker__add_file(struct bpf_linker *linker,
> +				    const char *filename,
> +				    const struct bpf_linker_file_opts *opts);
>   LIBBPF_API int bpf_linker__finalize(struct bpf_linker *linker);
>   LIBBPF_API void bpf_linker__free(struct bpf_linker *linker);
>   
> diff --git a/tools/lib/bpf/linker.c b/tools/lib/bpf/linker.c
> index 9de084b1c699..adc3aa9ce040 100644
> --- a/tools/lib/bpf/linker.c
> +++ b/tools/lib/bpf/linker.c
> @@ -4,6 +4,8 @@
>    *
>    * Copyright (c) 2021 Facebook
>    */
> +#define _GNU_SOURCE
> +#include <ctype.h>
>   #include <stdbool.h>
>   #include <stddef.h>
>   #include <stdio.h>
> @@ -47,6 +49,16 @@ struct src_sec {
>   	int sec_type_id;
>   };
>   
> +#define MAX_OBJ_NAME_LEN 64
> +
> +/* According to C standard, only first 63 characters of C identifiers are
> + * guaranteed to be significant. So for transformed static variables of the most
> + * verbose form ('<obj_name>..<func_name>.<var_name>') we need to reserve extra
> + * 64 (function name and dot) + 63 (variable name) + 2 (for .. separator)
> + * characters.
> + */
> +#define MAX_VAR_NAME_LEN (MAX_OBJ_NAME_LEN + 2 + 63 + 1 + 63)
> +
>   struct src_obj {
>   	const char *filename;
>   	int fd;
> @@ -67,6 +79,10 @@ struct src_obj {
>   	int *sym_map;
>   	/* mapping from the src BTF type IDs to dst ones */
>   	int *btf_type_map;
> +
> +	/* BPF object name used for static variable prefixing */
> +	char obj_name[MAX_OBJ_NAME_LEN];
> +	size_t obj_name_len;
>   };
>   
>   /* single .BTF.ext data section */
> @@ -158,7 +174,9 @@ struct bpf_linker {
>   
>   static int init_output_elf(struct bpf_linker *linker, const char *file);
>   
> -static int linker_load_obj_file(struct bpf_linker *linker, const char *filename, struct src_obj *obj);
> +static int linker_load_obj_file(struct bpf_linker *linker, const char *filename,
> +				const struct bpf_linker_file_opts *opts,
> +				struct src_obj *obj);
>   static int linker_sanity_check_elf(struct src_obj *obj);
>   static int linker_sanity_check_elf_symtab(struct src_obj *obj, struct src_sec *sec);
>   static int linker_sanity_check_elf_relos(struct src_obj *obj, struct src_sec *sec);
> @@ -435,15 +453,19 @@ static int init_output_elf(struct bpf_linker *linker, const char *file)
>   	return 0;
>   }
>   
> -int bpf_linker__add_file(struct bpf_linker *linker, const char *filename)
> +int bpf_linker__add_file(struct bpf_linker *linker, const char *filename,
> +			 const struct bpf_linker_file_opts *opts)
>   {
>   	struct src_obj obj = {};
>   	int err = 0;
>   
> +	if (!OPTS_VALID(opts, bpf_linker_file_opts))
> +		return -EINVAL;
> +
>   	if (!linker->elf)
>   		return -EINVAL;
>   
> -	err = err ?: linker_load_obj_file(linker, filename, &obj);
> +	err = err ?: linker_load_obj_file(linker, filename, opts, &obj);
>   	err = err ?: linker_append_sec_data(linker, &obj);
>   	err = err ?: linker_append_elf_syms(linker, &obj);
>   	err = err ?: linker_append_elf_relos(linker, &obj);
> @@ -529,7 +551,49 @@ static struct src_sec *add_src_sec(struct src_obj *obj, const char *sec_name)
>   	return sec;
>   }
>   
> -static int linker_load_obj_file(struct bpf_linker *linker, const char *filename, struct src_obj *obj)
> +static void sanitize_obj_name(char *name)
> +{
> +	int i;
> +
> +	for (i = 0; name[i]; i++) {
> +		if (name[i] == '_')
> +			continue;
> +		if (i == 0 && isalpha(name[i]))
> +			continue;
> +		if (i > 0 && isalnum(name[i]))
> +			continue;
> +
> +		name[i] = '_';
> +	}
> +}
> +
> +static bool str_has_suffix(const char *str, const char *suffix)
> +{
> +	size_t n1 = strlen(str), n2 = strlen(suffix);
> +
> +	if (n1 < n2)
> +		return false;
> +
> +	return strcmp(str + n1 - n2, suffix) == 0;
> +}
> +
> +static void get_obj_name(char *name, const char *file)
> +{
> +	/* Using basename() GNU version which doesn't modify arg. */
> +	strncpy(name, basename(file), MAX_OBJ_NAME_LEN - 1);
> +	name[MAX_OBJ_NAME_LEN - 1] = '\0';
> +
> +	if (str_has_suffix(name, ".bpf.o"))
> +		name[strlen(name) - sizeof(".bpf.o") + 1] = '\0';
> +	else if (str_has_suffix(name, ".o"))
> +		name[strlen(name) - sizeof(".o") + 1] = '\0';
> +
> +	sanitize_obj_name(name);
> +}
> +
> +static int linker_load_obj_file(struct bpf_linker *linker, const char *filename,
> +				const struct bpf_linker_file_opts *opts,
> +				struct src_obj *obj)
>   {
>   #if __BYTE_ORDER == __LITTLE_ENDIAN
>   	const int host_endianness = ELFDATA2LSB;
> @@ -549,6 +613,14 @@ static int linker_load_obj_file(struct bpf_linker *linker, const char *filename,
>   
>   	obj->filename = filename;
>   
> +	if (OPTS_GET(opts, object_name, NULL)) {
> +		strncpy(obj->obj_name, opts->object_name, MAX_OBJ_NAME_LEN);
> +		obj->obj_name[MAX_OBJ_NAME_LEN - 1] = '\0';

Looks we don't have examples/selftests which actually use this option.
The only place to use bpf_linker__add_file() is bpftool which did not
have option to overwrite the obj file name.

The code looks fine to me though.

> +	} else {
> +		get_obj_name(obj->obj_name, filename);
> +	}
> +	obj->obj_name_len = strlen(obj->obj_name);
> +
>   	obj->fd = open(filename, O_RDONLY);
>   	if (obj->fd < 0) {
>   		err = -errno;
> @@ -2264,6 +2336,47 @@ static int linker_append_btf(struct bpf_linker *linker, struct src_obj *obj)
>   				obj->btf_type_map[i] = glob_sym->btf_id;
>   				continue;
>   			}
> +		} else if (btf_is_var(t) && btf_var(t)->linkage == BTF_VAR_STATIC) {
> +			/* Static variables are renamed to include
> +			 * "<obj_name>.." prefix (note double dots), similarly
> +			 * to how static variables inside functions are named
> +			 * "<func_name>.<var_name>" by compiler. This allows to
> +			 * have  unique identifiers for static variables across
> +			 * all linked object files (assuming unique filenames,
> +			 * of course), which BPF skeleton relies on.
> +			 *
> +			 * So worst case static variable inside the function
> +			 * will have the form "<obj_name>..<func_name>.<var_name"
<var_name  => <var_name>
> +			 * and will get sanitized by BPF skeleton generation
> +			 * logic to a field with <obj_name>__<func_name>_<var_name>
> +			 * name. Typical static variable will have a
> +			 * <obj_name>__<var_name> name, implying arguably nice
> +			 * per-file scoping.
> +			 *
> +			 * If static var name already contains '..', though,
> +			 * don't rename it, because it was already renamed by
> +			 * previous linker passes.
> +			 */
> +			name = btf__str_by_offset(obj->btf, t->name_off);
> +			if (!strstr(name, "..")) {
> +				char new_name[MAX_VAR_NAME_LEN];
> +
> +				memcpy(new_name, obj->obj_name, obj->obj_name_len);
> +				new_name[obj->obj_name_len] = '.';
> +				new_name[obj->obj_name_len + 1] = '.';
> +				new_name[obj->obj_name_len + 2] = '\0';
> +				/* -3 is for '..' separator and terminating '\0' */
> +				strncat(new_name, name, MAX_VAR_NAME_LEN - obj->obj_name_len - 3);
> +
> +				id = btf__add_str(obj->btf, new_name);
> +				if (id < 0)
> +					return id;
> +
> +				/* btf__add_str() might invalidate t, so re-fetch */
> +				t = btf__type_by_id(obj->btf, i);
> +
> +				((struct btf_type *)t)->name_off = id;
> +			}
>   		}
>   
>   		id = btf__add_type(linker->btf, obj->btf, t);
> diff --git a/tools/testing/selftests/bpf/prog_tests/skeleton.c b/tools/testing/selftests/bpf/prog_tests/skeleton.c
> index fe87b77af459..bbade99fa544 100644
> --- a/tools/testing/selftests/bpf/prog_tests/skeleton.c
> +++ b/tools/testing/selftests/bpf/prog_tests/skeleton.c
> @@ -82,10 +82,10 @@ void test_skeleton(void)
>   	CHECK(data->out2 != 2, "res2", "got %lld != exp %d\n", data->out2, 2);
>   	CHECK(bss->out3 != 3, "res3", "got %d != exp %d\n", (int)bss->out3, 3);
>   	CHECK(bss->out4 != 4, "res4", "got %lld != exp %d\n", bss->out4, 4);
> -	CHECK(bss->handler_out5.a != 5, "res5", "got %d != exp %d\n",
> -	      bss->handler_out5.a, 5);
> -	CHECK(bss->handler_out5.b != 6, "res6", "got %lld != exp %d\n",
> -	      bss->handler_out5.b, 6);
> +	CHECK(bss->test_skeleton__handler_out5.a != 5, "res5", "got %d != exp %d\n",
> +	      bss->test_skeleton__handler_out5.a, 5);
> +	CHECK(bss->test_skeleton__handler_out5.b != 6, "res6", "got %lld != exp %d\n",
> +	      bss->test_skeleton__handler_out5.b, 6);
>   	CHECK(bss->out6 != 14, "res7", "got %d != exp %d\n", bss->out6, 14);
>   
>   	CHECK(bss->bpf_syscall != kcfg->CONFIG_BPF_SYSCALL, "ext1",
> diff --git a/tools/testing/selftests/bpf/prog_tests/static_linked.c b/tools/testing/selftests/bpf/prog_tests/static_linked.c
> index 46556976dccc..f16736eab900 100644
> --- a/tools/testing/selftests/bpf/prog_tests/static_linked.c
> +++ b/tools/testing/selftests/bpf/prog_tests/static_linked.c
> @@ -14,12 +14,12 @@ void test_static_linked(void)
>   		return;
>   
>   	skel->rodata->rovar1 = 1;
> -	skel->bss->static_var1 = 2;
> -	skel->bss->static_var11 = 3;
> +	skel->bss->test_static_linked1__static_var = 2;
> +	skel->bss->test_static_linked1__static_var1 = 3;
>   
>   	skel->rodata->rovar2 = 4;
> -	skel->bss->static_var2 = 5;
> -	skel->bss->static_var22 = 6;
> +	skel->bss->test_static_linked2__static_var = 5;
> +	skel->bss->test_static_linked2__static_var2 = 6;
>   
>   	err = test_static_linked__load(skel);
>   	if (!ASSERT_OK(err, "skel_load"))
> diff --git a/tools/testing/selftests/bpf/progs/bpf_iter_test_kern4.c b/tools/testing/selftests/bpf/progs/bpf_iter_test_kern4.c
> index ee49493dc125..43bf8ec8ae79 100644
> --- a/tools/testing/selftests/bpf/progs/bpf_iter_test_kern4.c
> +++ b/tools/testing/selftests/bpf/progs/bpf_iter_test_kern4.c
> @@ -9,8 +9,8 @@ __u32 map1_id = 0, map2_id = 0;
>   __u32 map1_accessed = 0, map2_accessed = 0;
>   __u64 map1_seqnum = 0, map2_seqnum1 = 0, map2_seqnum2 = 0;
>   
> -static volatile const __u32 print_len;
> -static volatile const __u32 ret1;
> +volatile const __u32 print_len = 0;
> +volatile const __u32 ret1 = 0;

I am little bit puzzled why bpf_iter_test_kern4.c is impacted. I think 
this is not in a static link test, right? The same for a few tests below.

>   
>   SEC("iter/bpf_map")
>   int dump_bpf_map(struct bpf_iter__bpf_map *ctx)
> diff --git a/tools/testing/selftests/bpf/progs/test_check_mtu.c b/tools/testing/selftests/bpf/progs/test_check_mtu.c
> index c4a9bae96e75..71184af57749 100644
> --- a/tools/testing/selftests/bpf/progs/test_check_mtu.c
> +++ b/tools/testing/selftests/bpf/progs/test_check_mtu.c
> @@ -11,8 +11,8 @@
>   char _license[] SEC("license") = "GPL";
>   
>   /* Userspace will update with MTU it can see on device */
> -static volatile const int GLOBAL_USER_MTU;
> -static volatile const __u32 GLOBAL_USER_IFINDEX;
> +volatile const int GLOBAL_USER_MTU;
> +volatile const __u32 GLOBAL_USER_IFINDEX;
>   
>   /* BPF-prog will update these with MTU values it can see */
>   __u32 global_bpf_mtu_xdp = 0;
> diff --git a/tools/testing/selftests/bpf/progs/test_cls_redirect.c b/tools/testing/selftests/bpf/progs/test_cls_redirect.c
> index 3c1e042962e6..e2a5acc4785c 100644
> --- a/tools/testing/selftests/bpf/progs/test_cls_redirect.c
> +++ b/tools/testing/selftests/bpf/progs/test_cls_redirect.c
> @@ -39,8 +39,8 @@ char _license[] SEC("license") = "Dual BSD/GPL";
>   /**
>    * Destination port and IP used for UDP encapsulation.
>    */
> -static volatile const __be16 ENCAPSULATION_PORT;
> -static volatile const __be32 ENCAPSULATION_IP;
> +volatile const __be16 ENCAPSULATION_PORT;
> +volatile const __be32 ENCAPSULATION_IP;
>   
>   typedef struct {
>   	uint64_t processed_packets_total;
> diff --git a/tools/testing/selftests/bpf/progs/test_snprintf_single.c b/tools/testing/selftests/bpf/progs/test_snprintf_single.c
> index 402adaf344f9..6b63ba86b409 100644
> --- a/tools/testing/selftests/bpf/progs/test_snprintf_single.c
> +++ b/tools/testing/selftests/bpf/progs/test_snprintf_single.c
> @@ -5,7 +5,7 @@
>   #include <bpf/bpf_helpers.h>
>   
>   /* The format string is filled from the userspace such that loading fails */
> -static const char fmt[10];
> +const char fmt[10] = {};
>   
>   SEC("raw_tp/sys_enter")
>   int handler(const void *ctx)
> diff --git a/tools/testing/selftests/bpf/progs/test_sockmap_listen.c b/tools/testing/selftests/bpf/progs/test_sockmap_listen.c
> index a39eba9f5201..a1cc58b10c7c 100644
> --- a/tools/testing/selftests/bpf/progs/test_sockmap_listen.c
> +++ b/tools/testing/selftests/bpf/progs/test_sockmap_listen.c
> @@ -28,8 +28,8 @@ struct {
>   	__type(value, unsigned int);
>   } verdict_map SEC(".maps");
>   
> -static volatile bool test_sockmap; /* toggled by user-space */
> -static volatile bool test_ingress; /* toggled by user-space */
> +bool test_sockmap = false; /* toggled by user-space */
> +bool test_ingress = false; /* toggled by user-space */
>   
>   SEC("sk_skb/stream_parser")
>   int prog_stream_parser(struct __sk_buff *skb)
[...]
Yonghong Song April 23, 2021, 9:56 p.m. UTC | #2
On 4/23/21 2:38 PM, Andrii Nakryiko wrote:
> On Fri, Apr 23, 2021 at 1:24 PM Yonghong Song <yhs@fb.com> wrote:
>>
>>
>>
>> On 4/23/21 11:53 AM, Andrii Nakryiko wrote:
>>> Prepend <obj_name>.. prefix to each static variable in BTF info during static
>>> linking. This makes them uniquely named for the sake of BPF skeleton use,
>>> allowing to read/write static BPF variables from user-space. This uniqueness
>>> guarantee depends on each linked file name uniqueness, of course. Double dots
>>> separator was chosen both to be different (but similar) to the separator that
>>> Clang is currently using for static variables defined inside functions as well
>>> as to generate a natural (in libbpf parlance, at least) obj__var naming pattern
>>> in BPF skeleton. Static linker also checks for static variable to already
>>> contain ".." separator and skips the rename to allow multi-pass linking and not
>>> keep making variable name ever increasing, if derived object name is changing on
>>> each pass (as is the case for selftests).
>>>
>>> This patch also adds opts to bpf_linker__add_file() API, which currently
>>> allows to override object name for a given file and could be extended with other
>>> per-file options in the future. This is not a breaking change because
>>> bpf_linker__add_file() isn't yet released officially.
>>>
>>> This patch also includes fixes to few selftests that are already using static
>>> variables. They have to go in in the same patch to not break selftest build.
>>
>> "in in" => "in"
> 
> heh, I knew this would be confusing :) it's "go in" a verb and "in the
> same patch" as where they go into. But I'll re-phrase in the next
> version.
> 
>>
>>> Keep in mind, this static variable rename only happens during static linking.
>>> For any existing user of BPF skeleton using static variables nothing changes,
>>> because those use cases are using variable names generated by Clang. Only new
>>> users utilizing static linker might need to adjust BPF skeleton use, which
>>> currently will be always new use cases. So ther is no risk of breakage.
>>>
>>> static_linked selftests is modified to also validate conflicting static variable
>>> names are handled correctly both during static linking and in BPF skeleton.
>>>
>>> Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
>>> ---
>>>    tools/bpf/bpftool/gen.c                       |   2 +-
>>>    tools/lib/bpf/libbpf.h                        |  12 +-
>>>    tools/lib/bpf/linker.c                        | 121 +++++++++++++++++-
>>>    .../selftests/bpf/prog_tests/skeleton.c       |   8 +-
>>>    .../selftests/bpf/prog_tests/static_linked.c  |   8 +-
>>>    .../selftests/bpf/progs/bpf_iter_test_kern4.c |   4 +-
>>>    .../selftests/bpf/progs/test_check_mtu.c      |   4 +-
>>>    .../selftests/bpf/progs/test_cls_redirect.c   |   4 +-
>>>    .../bpf/progs/test_snprintf_single.c          |   2 +-
>>>    .../selftests/bpf/progs/test_sockmap_listen.c |   4 +-
>>>    .../selftests/bpf/progs/test_static_linked1.c |   6 +-
>>>    .../selftests/bpf/progs/test_static_linked2.c |   4 +-
>>>    12 files changed, 151 insertions(+), 28 deletions(-)
>>>
> 
> [...]
> 
>>> +static int linker_load_obj_file(struct bpf_linker *linker, const char *filename,
>>> +                             const struct bpf_linker_file_opts *opts,
>>> +                             struct src_obj *obj)
>>>    {
>>>    #if __BYTE_ORDER == __LITTLE_ENDIAN
>>>        const int host_endianness = ELFDATA2LSB;
>>> @@ -549,6 +613,14 @@ static int linker_load_obj_file(struct bpf_linker *linker, const char *filename,
>>>
>>>        obj->filename = filename;
>>>
>>> +     if (OPTS_GET(opts, object_name, NULL)) {
>>> +             strncpy(obj->obj_name, opts->object_name, MAX_OBJ_NAME_LEN);
>>> +             obj->obj_name[MAX_OBJ_NAME_LEN - 1] = '\0';
>>
>> Looks we don't have examples/selftests which actually use this option.
>> The only place to use bpf_linker__add_file() is bpftool which did not
>> have option to overwrite the obj file name.
>>
>> The code looks fine to me though.
> 
> Right, I was a bit lazy in adding this to bpftool, but I'm sure we'll
> want to support overriding obj file name, at least to resolve object
> file name conflicts. One other problem is syntax, I haven't thought
> through what's the best way to do this with bpftool. Something like
> this would do it:
> 
> bpftool gen object dest.o input1.o=custom_obj_name
> path/to/file2.o=another_custom_obj_name
> 
> But this is too bike-shedding a topic which I want to avoid for now.

Okay. The additional option thing can be done later.

> 
>>
>>> +     } else {
>>> +             get_obj_name(obj->obj_name, filename);
>>> +     }
>>> +     obj->obj_name_len = strlen(obj->obj_name);
>>> +
>>>        obj->fd = open(filename, O_RDONLY);
>>>        if (obj->fd < 0) {
>>>                err = -errno;
>>> @@ -2264,6 +2336,47 @@ static int linker_append_btf(struct bpf_linker *linker, struct src_obj *obj)
>>>                                obj->btf_type_map[i] = glob_sym->btf_id;
>>>                                continue;
>>>                        }
>>> +             } else if (btf_is_var(t) && btf_var(t)->linkage == BTF_VAR_STATIC) {
>>> +                     /* Static variables are renamed to include
>>> +                      * "<obj_name>.." prefix (note double dots), similarly
>>> +                      * to how static variables inside functions are named
>>> +                      * "<func_name>.<var_name>" by compiler. This allows to
>>> +                      * have  unique identifiers for static variables across
>>> +                      * all linked object files (assuming unique filenames,
>>> +                      * of course), which BPF skeleton relies on.
>>> +                      *
>>> +                      * So worst case static variable inside the function
>>> +                      * will have the form "<obj_name>..<func_name>.<var_name"
>> <var_name  => <var_name>
> 
> good catch, will fix
> 
>>> +                      * and will get sanitized by BPF skeleton generation
>>> +                      * logic to a field with <obj_name>__<func_name>_<var_name>
>>> +                      * name. Typical static variable will have a
>>> +                      * <obj_name>__<var_name> name, implying arguably nice
>>> +                      * per-file scoping.
>>> +                      *
>>> +                      * If static var name already contains '..', though,
>>> +                      * don't rename it, because it was already renamed by
>>> +                      * previous linker passes.
>>> +                      */
>>> +                     name = btf__str_by_offset(obj->btf, t->name_off);
>>> +                     if (!strstr(name, "..")) {
>>> +                             char new_name[MAX_VAR_NAME_LEN];
>>> +
>>> +                             memcpy(new_name, obj->obj_name, obj->obj_name_len);
>>> +                             new_name[obj->obj_name_len] = '.';
>>> +                             new_name[obj->obj_name_len + 1] = '.';
>>> +                             new_name[obj->obj_name_len + 2] = '\0';
>>> +                             /* -3 is for '..' separator and terminating '\0' */
>>> +                             strncat(new_name, name, MAX_VAR_NAME_LEN - obj->obj_name_len - 3);
>>> +
>>> +                             id = btf__add_str(obj->btf, new_name);
>>> +                             if (id < 0)
>>> +                                     return id;
>>> +
>>> +                             /* btf__add_str() might invalidate t, so re-fetch */
>>> +                             t = btf__type_by_id(obj->btf, i);
>>> +
>>> +                             ((struct btf_type *)t)->name_off = id;
>>> +                     }
>>>                }
>>>
> 
> [...]
> 
>>> diff --git a/tools/testing/selftests/bpf/progs/bpf_iter_test_kern4.c b/tools/testing/selftests/bpf/progs/bpf_iter_test_kern4.c
>>> index ee49493dc125..43bf8ec8ae79 100644
>>> --- a/tools/testing/selftests/bpf/progs/bpf_iter_test_kern4.c
>>> +++ b/tools/testing/selftests/bpf/progs/bpf_iter_test_kern4.c
>>> @@ -9,8 +9,8 @@ __u32 map1_id = 0, map2_id = 0;
>>>    __u32 map1_accessed = 0, map2_accessed = 0;
>>>    __u64 map1_seqnum = 0, map2_seqnum1 = 0, map2_seqnum2 = 0;
>>>
>>> -static volatile const __u32 print_len;
>>> -static volatile const __u32 ret1;
>>> +volatile const __u32 print_len = 0;
>>> +volatile const __u32 ret1 = 0;
>>
>> I am little bit puzzled why bpf_iter_test_kern4.c is impacted. I think
>> this is not in a static link test, right? The same for a few tests below.
> 
> All the selftests are passed through a static linker, so it will
> append obj_name to each static variable. So I just minimized use of
> static variables to avoid too much code churn. If this variable was
> static, it would have to be accessed as
> skel->rodata->bpf_iter_test_kern4__print_len, for example.

Okay this should be fine. selftests/bpf specific. I just feel that
some people may get confused if they write/see a single program in 
selftest and they have to use obj_varname format and thinking this
is a new standard, but actually it is due to static linking buried
in Makefile. Maybe add a note in selftests/README.rst so we
can point to people if there is confusion.

> 
>>
>>>
>>>    SEC("iter/bpf_map")
>>>    int dump_bpf_map(struct bpf_iter__bpf_map *ctx)
>>> diff --git a/tools/testing/selftests/bpf/progs/test_check_mtu.c b/tools/testing/selftests/bpf/progs/test_check_mtu.c
>>> index c4a9bae96e75..71184af57749 100644
>>> --- a/tools/testing/selftests/bpf/progs/test_check_mtu.c
>>> +++ b/tools/testing/selftests/bpf/progs/test_check_mtu.c
>>> @@ -11,8 +11,8 @@
>>>    char _license[] SEC("license") = "GPL";
>>>
> 
> [...]
>
Alexei Starovoitov April 23, 2021, 11:35 p.m. UTC | #3
On Fri, Apr 23, 2021 at 4:26 PM Yonghong Song <yhs@fb.com> wrote:
>
>
>
> On 4/23/21 4:05 PM, Alexei Starovoitov wrote:
> > On Fri, Apr 23, 2021 at 2:56 PM Yonghong Song <yhs@fb.com> wrote:
> >>>>>
> >>>>> -static volatile const __u32 print_len;
> >>>>> -static volatile const __u32 ret1;
> >>>>> +volatile const __u32 print_len = 0;
> >>>>> +volatile const __u32 ret1 = 0;
> >>>>
> >>>> I am little bit puzzled why bpf_iter_test_kern4.c is impacted. I think
> >>>> this is not in a static link test, right? The same for a few tests below.
> >>>
> >>> All the selftests are passed through a static linker, so it will
> >>> append obj_name to each static variable. So I just minimized use of
> >>> static variables to avoid too much code churn. If this variable was
> >>> static, it would have to be accessed as
> >>> skel->rodata->bpf_iter_test_kern4__print_len, for example.
> >>
> >> Okay this should be fine. selftests/bpf specific. I just feel that
> >> some people may get confused if they write/see a single program in
> >> selftest and they have to use obj_varname format and thinking this
> >> is a new standard, but actually it is due to static linking buried
> >> in Makefile. Maybe add a note in selftests/README.rst so we
> >> can point to people if there is confusion.
> >
> > I'm not sure I understand.
> > Are you saying that
> > bpftool gen object out_file.o in_file.o
> > is no longer equivalent to llvm-strip ?
>
> This is more about BTF and ELF.
> Give a simple example,
> $ cat t1.c
> volatile static int aa = 10;
> int foo() { return aa; }
> $ clang -O2 -g -c -target bpf t1.c
>
> Using bpftool compiled with this patch:
> $ bpftool gen object output.o t1.o
> $ llvm-readelf -s t1.o | grep aa
>       3: 0000000000000000     4 OBJECT  LOCAL  DEFAULT     4 aa
> $ llvm-readelf -s output.o | grep aa
>       3: 0000000000000000     4 OBJECT  LOCAL  DEFAULT     4 aa
>
> $ bpftool btf dump file t1.o | grep aa
> [5] VAR 'aa' type_id=4, linkage=static
> $ bpftool btf dump file output.o | grep aa
> [5] VAR 't1..aa' type_id=4, linkage=static

Right. That's how I understood it.

> So yes you are right, this will affect skeleton user
> if you use static linker with single file.
>
> > Since during that step static vars will get their names mangled?
> > So a good chunk of code that uses skeleton right now should either
> > 1. don't do the linking step
> > or
> > 2. adjust their code to use global vars
> > or
> > 3. adjust the usage of skel.h in their corresponding user code
> >    to accommodate mangled static names?
> > Did it get it right?

It feels that the strategy of mangling every static name is too harsh.
Maybe sub-skeleton is an answer?
Something that would do a sub-skeleton for each file?
Mainly to keep "bpftool gen object output.o t1.o" from messing with btf names.
Maybe linking of btf-s from multiple .o should somehow scope them?
I don't have a concrete suggestion yet.
Alexei Starovoitov April 23, 2021, 11:48 p.m. UTC | #4
On Fri, Apr 23, 2021 at 4:35 PM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Fri, Apr 23, 2021 at 4:06 PM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> >
> > On Fri, Apr 23, 2021 at 2:56 PM Yonghong Song <yhs@fb.com> wrote:
> > > >>>
> > > >>> -static volatile const __u32 print_len;
> > > >>> -static volatile const __u32 ret1;
> > > >>> +volatile const __u32 print_len = 0;
> > > >>> +volatile const __u32 ret1 = 0;
> > > >>
> > > >> I am little bit puzzled why bpf_iter_test_kern4.c is impacted. I think
> > > >> this is not in a static link test, right? The same for a few tests below.
> > > >
> > > > All the selftests are passed through a static linker, so it will
> > > > append obj_name to each static variable. So I just minimized use of
> > > > static variables to avoid too much code churn. If this variable was
> > > > static, it would have to be accessed as
> > > > skel->rodata->bpf_iter_test_kern4__print_len, for example.
> > >
> > > Okay this should be fine. selftests/bpf specific. I just feel that
> > > some people may get confused if they write/see a single program in
> > > selftest and they have to use obj_varname format and thinking this
> > > is a new standard, but actually it is due to static linking buried
> > > in Makefile. Maybe add a note in selftests/README.rst so we
> > > can point to people if there is confusion.
> >
> > I'm not sure I understand.
> > Are you saying that
> > bpftool gen object out_file.o in_file.o
> > is no longer equivalent to llvm-strip ?
> > Since during that step static vars will get their names mangled?
>
> Yes. Static vars and static maps. We don't allow (yet?) static
> entry-point BPF programs, so those don't change.
>
> > So a good chunk of code that uses skeleton right now should either
> > 1. don't do the linking step
> > or
> > 2. adjust their code to use global vars
> > or
> > 3. adjust the usage of skel.h in their corresponding user code
> >   to accommodate mangled static names?
> > Did it get it right?
>
> Yes, you are right. But so far most cases outside of selftest that
> I've seen don't use static variables (partially because they need
> pesky volatile to be visible from user-space at all), global vars are
> much nicer in that regard.

Right.
but wait...
why linker is mangling them at all and why they appear in the skeleton?
static vars without volatile should not be in a skeleton, since changing
them from user space might have no meaning on the bpf program.
The behavior of the bpf prog is unpredictable.
Only volatile static can theoretically be in the skeleton, but as you said
probably no one is using them yet, so we can omit them from skeleton too.
Andrii Nakryiko April 24, 2021, 12:13 a.m. UTC | #5
On Fri, Apr 23, 2021 at 4:48 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>

> On Fri, Apr 23, 2021 at 4:35 PM Andrii Nakryiko

> <andrii.nakryiko@gmail.com> wrote:

> >

> > On Fri, Apr 23, 2021 at 4:06 PM Alexei Starovoitov

> > <alexei.starovoitov@gmail.com> wrote:

> > >

> > > On Fri, Apr 23, 2021 at 2:56 PM Yonghong Song <yhs@fb.com> wrote:

> > > > >>>

> > > > >>> -static volatile const __u32 print_len;

> > > > >>> -static volatile const __u32 ret1;

> > > > >>> +volatile const __u32 print_len = 0;

> > > > >>> +volatile const __u32 ret1 = 0;

> > > > >>

> > > > >> I am little bit puzzled why bpf_iter_test_kern4.c is impacted. I think

> > > > >> this is not in a static link test, right? The same for a few tests below.

> > > > >

> > > > > All the selftests are passed through a static linker, so it will

> > > > > append obj_name to each static variable. So I just minimized use of

> > > > > static variables to avoid too much code churn. If this variable was

> > > > > static, it would have to be accessed as

> > > > > skel->rodata->bpf_iter_test_kern4__print_len, for example.

> > > >

> > > > Okay this should be fine. selftests/bpf specific. I just feel that

> > > > some people may get confused if they write/see a single program in

> > > > selftest and they have to use obj_varname format and thinking this

> > > > is a new standard, but actually it is due to static linking buried

> > > > in Makefile. Maybe add a note in selftests/README.rst so we

> > > > can point to people if there is confusion.

> > >

> > > I'm not sure I understand.

> > > Are you saying that

> > > bpftool gen object out_file.o in_file.o

> > > is no longer equivalent to llvm-strip ?

> > > Since during that step static vars will get their names mangled?

> >

> > Yes. Static vars and static maps. We don't allow (yet?) static

> > entry-point BPF programs, so those don't change.

> >

> > > So a good chunk of code that uses skeleton right now should either

> > > 1. don't do the linking step

> > > or

> > > 2. adjust their code to use global vars

> > > or

> > > 3. adjust the usage of skel.h in their corresponding user code

> > >   to accommodate mangled static names?

> > > Did it get it right?

> >

> > Yes, you are right. But so far most cases outside of selftest that

> > I've seen don't use static variables (partially because they need

> > pesky volatile to be visible from user-space at all), global vars are

> > much nicer in that regard.

>

> Right.

> but wait...

> why linker is mangling them at all and why they appear in the skeleton?

> static vars without volatile should not be in a skeleton, since changing

> them from user space might have no meaning on the bpf program.

> The behavior of the bpf prog is unpredictable.


It's up to the compiler. If compiler decides that it shouldn't inline
all the uses (or e.g. if static variable is an array accessed with
index known only at runtime, or many other cases where compiler can't
just deduce constant value), then compiler will emit ELF symbols, will
allocate storage, and code will use that storage. static volatile just
forces the compiler to not assume anything at all.

If the compiler does inline all the uses of static, then we won't have
storage allocated for it and it won't be even present in BTF. So for
libbpf, linker and skeleton statics are no different than globals.

Static maps are slightly different, because we use SEC() which marks
them as used, so they should always be present.

Sub-skeleton will present those statics to the BPF library without
name mangling, but for the final linked BPF object file we need to
handle statics. Definitely for maps, because static means that library
or library user shouldn't be able to just extern that definition and
update/lookup/corrupt its state. But I think for static variables it
should be the same. Both are visible to user-space, but invisible
between linked BPF compilation units.

> Only volatile static can theoretically be in the skeleton, but as you said

> probably no one is using them yet, so we can omit them from skeleton too.
Alexei Starovoitov April 24, 2021, 12:22 a.m. UTC | #6
On Fri, Apr 23, 2021 at 5:13 PM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>

> On Fri, Apr 23, 2021 at 4:48 PM Alexei Starovoitov

> <alexei.starovoitov@gmail.com> wrote:

> >

> > On Fri, Apr 23, 2021 at 4:35 PM Andrii Nakryiko

> > <andrii.nakryiko@gmail.com> wrote:

> > >

> > > On Fri, Apr 23, 2021 at 4:06 PM Alexei Starovoitov

> > > <alexei.starovoitov@gmail.com> wrote:

> > > >

> > > > On Fri, Apr 23, 2021 at 2:56 PM Yonghong Song <yhs@fb.com> wrote:

> > > > > >>>

> > > > > >>> -static volatile const __u32 print_len;

> > > > > >>> -static volatile const __u32 ret1;

> > > > > >>> +volatile const __u32 print_len = 0;

> > > > > >>> +volatile const __u32 ret1 = 0;

> > > > > >>

> > > > > >> I am little bit puzzled why bpf_iter_test_kern4.c is impacted. I think

> > > > > >> this is not in a static link test, right? The same for a few tests below.

> > > > > >

> > > > > > All the selftests are passed through a static linker, so it will

> > > > > > append obj_name to each static variable. So I just minimized use of

> > > > > > static variables to avoid too much code churn. If this variable was

> > > > > > static, it would have to be accessed as

> > > > > > skel->rodata->bpf_iter_test_kern4__print_len, for example.

> > > > >

> > > > > Okay this should be fine. selftests/bpf specific. I just feel that

> > > > > some people may get confused if they write/see a single program in

> > > > > selftest and they have to use obj_varname format and thinking this

> > > > > is a new standard, but actually it is due to static linking buried

> > > > > in Makefile. Maybe add a note in selftests/README.rst so we

> > > > > can point to people if there is confusion.

> > > >

> > > > I'm not sure I understand.

> > > > Are you saying that

> > > > bpftool gen object out_file.o in_file.o

> > > > is no longer equivalent to llvm-strip ?

> > > > Since during that step static vars will get their names mangled?

> > >

> > > Yes. Static vars and static maps. We don't allow (yet?) static

> > > entry-point BPF programs, so those don't change.

> > >

> > > > So a good chunk of code that uses skeleton right now should either

> > > > 1. don't do the linking step

> > > > or

> > > > 2. adjust their code to use global vars

> > > > or

> > > > 3. adjust the usage of skel.h in their corresponding user code

> > > >   to accommodate mangled static names?

> > > > Did it get it right?

> > >

> > > Yes, you are right. But so far most cases outside of selftest that

> > > I've seen don't use static variables (partially because they need

> > > pesky volatile to be visible from user-space at all), global vars are

> > > much nicer in that regard.

> >

> > Right.

> > but wait...

> > why linker is mangling them at all and why they appear in the skeleton?

> > static vars without volatile should not be in a skeleton, since changing

> > them from user space might have no meaning on the bpf program.

> > The behavior of the bpf prog is unpredictable.

>

> It's up to the compiler. If compiler decides that it shouldn't inline

> all the uses (or e.g. if static variable is an array accessed with

> index known only at runtime, or many other cases where compiler can't

> just deduce constant value), then compiler will emit ELF symbols, will

> allocate storage, and code will use that storage. static volatile just

> forces the compiler to not assume anything at all.

>

> If the compiler does inline all the uses of static, then we won't have

> storage allocated for it and it won't be even present in BTF. So for

> libbpf, linker and skeleton statics are no different than globals.


Not quite.
The compiler can inline static values in some cases and still
keep the var in global data in others places in the same C file.
If it's a static struct the compiler is free to do any crazy optimization
on this data it can come up with.
So the presence of it in the elf symbols doesn't guarantee anything.

> Static maps are slightly different, because we use SEC() which marks

> them as used, so they should always be present.


yes. The used attribute makes the compiler keep the data,
but it can still inline it and lose the reference in the .text.

> Sub-skeleton will present those statics to the BPF library without

> name mangling, but for the final linked BPF object file we need to

> handle statics. Definitely for maps, because static means that library

> or library user shouldn't be able to just extern that definition and

> update/lookup/corrupt its state. But I think for static variables it

> should be the same. Both are visible to user-space, but invisible

> between linked BPF compilation units.


I agree with motivation and also agree that static map def is a clean
way to hide such a map inside the library.
I'm not sure yet that static + used is enough.
More study needed.
For maps it's only the reference inside .text that is relevant.
If the compiler messes with it then we might have issues.
Like two static maps might look the same to the compiler.
They will have different field names, but their contents are zeros
in both cases. I'm not sure that the compiler will not decide to merge them
into one data location. In such case .text insns will be pointing
to the same data offset. The symbol names are probably going to be different.
So we might be ok with static+used only.
Yonghong Song April 24, 2021, 2:36 a.m. UTC | #7
On 4/23/21 5:13 PM, Andrii Nakryiko wrote:
> On Fri, Apr 23, 2021 at 4:48 PM Alexei Starovoitov

> <alexei.starovoitov@gmail.com> wrote:

>>

>> On Fri, Apr 23, 2021 at 4:35 PM Andrii Nakryiko

>> <andrii.nakryiko@gmail.com> wrote:

>>>

>>> On Fri, Apr 23, 2021 at 4:06 PM Alexei Starovoitov

>>> <alexei.starovoitov@gmail.com> wrote:

>>>>

>>>> On Fri, Apr 23, 2021 at 2:56 PM Yonghong Song <yhs@fb.com> wrote:

>>>>>>>>

>>>>>>>> -static volatile const __u32 print_len;

>>>>>>>> -static volatile const __u32 ret1;

>>>>>>>> +volatile const __u32 print_len = 0;

>>>>>>>> +volatile const __u32 ret1 = 0;

>>>>>>>

>>>>>>> I am little bit puzzled why bpf_iter_test_kern4.c is impacted. I think

>>>>>>> this is not in a static link test, right? The same for a few tests below.

>>>>>>

>>>>>> All the selftests are passed through a static linker, so it will

>>>>>> append obj_name to each static variable. So I just minimized use of

>>>>>> static variables to avoid too much code churn. If this variable was

>>>>>> static, it would have to be accessed as

>>>>>> skel->rodata->bpf_iter_test_kern4__print_len, for example.

>>>>>

>>>>> Okay this should be fine. selftests/bpf specific. I just feel that

>>>>> some people may get confused if they write/see a single program in

>>>>> selftest and they have to use obj_varname format and thinking this

>>>>> is a new standard, but actually it is due to static linking buried

>>>>> in Makefile. Maybe add a note in selftests/README.rst so we

>>>>> can point to people if there is confusion.

>>>>

>>>> I'm not sure I understand.

>>>> Are you saying that

>>>> bpftool gen object out_file.o in_file.o

>>>> is no longer equivalent to llvm-strip ?

>>>> Since during that step static vars will get their names mangled?

>>>

>>> Yes. Static vars and static maps. We don't allow (yet?) static

>>> entry-point BPF programs, so those don't change.

>>>

>>>> So a good chunk of code that uses skeleton right now should either

>>>> 1. don't do the linking step

>>>> or

>>>> 2. adjust their code to use global vars

>>>> or

>>>> 3. adjust the usage of skel.h in their corresponding user code

>>>>    to accommodate mangled static names?

>>>> Did it get it right?

>>>

>>> Yes, you are right. But so far most cases outside of selftest that

>>> I've seen don't use static variables (partially because they need

>>> pesky volatile to be visible from user-space at all), global vars are

>>> much nicer in that regard.

>>

>> Right.

>> but wait...

>> why linker is mangling them at all and why they appear in the skeleton?

>> static vars without volatile should not be in a skeleton, since changing

>> them from user space might have no meaning on the bpf program.

>> The behavior of the bpf prog is unpredictable.

> 

> It's up to the compiler. If compiler decides that it shouldn't inline

> all the uses (or e.g. if static variable is an array accessed with

> index known only at runtime, or many other cases where compiler can't

> just deduce constant value), then compiler will emit ELF symbols, will

> allocate storage, and code will use that storage. static volatile just

> forces the compiler to not assume anything at all.

> 

> If the compiler does inline all the uses of static, then we won't have

> storage allocated for it and it won't be even present in BTF. So for

> libbpf, linker and skeleton statics are no different than globals.

> 

> Static maps are slightly different, because we use SEC() which marks

> them as used, so they should always be present.

> 

> Sub-skeleton will present those statics to the BPF library without

> name mangling, but for the final linked BPF object file we need to

> handle statics. Definitely for maps, because static means that library

> or library user shouldn't be able to just extern that definition and

> update/lookup/corrupt its state. But I think for static variables it

> should be the same. Both are visible to user-space, but invisible

> between linked BPF compilation units.

> 

>> Only volatile static can theoretically be in the skeleton, but as you said

>> probably no one is using them yet, so we can omit them from skeleton too.


I think it is a good idea to keep volatile static use case in
the skeleton if we add support for static map. The volatile
static variable essentially a private map. Without this,
for skeleton users, they may need to use an explicit one-element
static array map which we probably want to avoid.
Andrii Nakryiko April 26, 2021, 3:44 p.m. UTC | #8
On Fri, Apr 23, 2021 at 5:22 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>

> On Fri, Apr 23, 2021 at 5:13 PM Andrii Nakryiko

> <andrii.nakryiko@gmail.com> wrote:

> >

> > On Fri, Apr 23, 2021 at 4:48 PM Alexei Starovoitov

> > <alexei.starovoitov@gmail.com> wrote:

> > >

> > > On Fri, Apr 23, 2021 at 4:35 PM Andrii Nakryiko

> > > <andrii.nakryiko@gmail.com> wrote:

> > > >

> > > > On Fri, Apr 23, 2021 at 4:06 PM Alexei Starovoitov

> > > > <alexei.starovoitov@gmail.com> wrote:

> > > > >

> > > > > On Fri, Apr 23, 2021 at 2:56 PM Yonghong Song <yhs@fb.com> wrote:

> > > > > > >>>

> > > > > > >>> -static volatile const __u32 print_len;

> > > > > > >>> -static volatile const __u32 ret1;

> > > > > > >>> +volatile const __u32 print_len = 0;

> > > > > > >>> +volatile const __u32 ret1 = 0;

> > > > > > >>

> > > > > > >> I am little bit puzzled why bpf_iter_test_kern4.c is impacted. I think

> > > > > > >> this is not in a static link test, right? The same for a few tests below.

> > > > > > >

> > > > > > > All the selftests are passed through a static linker, so it will

> > > > > > > append obj_name to each static variable. So I just minimized use of

> > > > > > > static variables to avoid too much code churn. If this variable was

> > > > > > > static, it would have to be accessed as

> > > > > > > skel->rodata->bpf_iter_test_kern4__print_len, for example.

> > > > > >

> > > > > > Okay this should be fine. selftests/bpf specific. I just feel that

> > > > > > some people may get confused if they write/see a single program in

> > > > > > selftest and they have to use obj_varname format and thinking this

> > > > > > is a new standard, but actually it is due to static linking buried

> > > > > > in Makefile. Maybe add a note in selftests/README.rst so we

> > > > > > can point to people if there is confusion.

> > > > >

> > > > > I'm not sure I understand.

> > > > > Are you saying that

> > > > > bpftool gen object out_file.o in_file.o

> > > > > is no longer equivalent to llvm-strip ?

> > > > > Since during that step static vars will get their names mangled?

> > > >

> > > > Yes. Static vars and static maps. We don't allow (yet?) static

> > > > entry-point BPF programs, so those don't change.

> > > >

> > > > > So a good chunk of code that uses skeleton right now should either

> > > > > 1. don't do the linking step

> > > > > or

> > > > > 2. adjust their code to use global vars

> > > > > or

> > > > > 3. adjust the usage of skel.h in their corresponding user code

> > > > >   to accommodate mangled static names?

> > > > > Did it get it right?

> > > >

> > > > Yes, you are right. But so far most cases outside of selftest that

> > > > I've seen don't use static variables (partially because they need

> > > > pesky volatile to be visible from user-space at all), global vars are

> > > > much nicer in that regard.

> > >

> > > Right.

> > > but wait...

> > > why linker is mangling them at all and why they appear in the skeleton?

> > > static vars without volatile should not be in a skeleton, since changing

> > > them from user space might have no meaning on the bpf program.

> > > The behavior of the bpf prog is unpredictable.

> >

> > It's up to the compiler. If compiler decides that it shouldn't inline

> > all the uses (or e.g. if static variable is an array accessed with

> > index known only at runtime, or many other cases where compiler can't

> > just deduce constant value), then compiler will emit ELF symbols, will

> > allocate storage, and code will use that storage. static volatile just

> > forces the compiler to not assume anything at all.

> >

> > If the compiler does inline all the uses of static, then we won't have

> > storage allocated for it and it won't be even present in BTF. So for

> > libbpf, linker and skeleton statics are no different than globals.

>

> Not quite.

> The compiler can inline static values in some cases and still

> keep the var in global data in others places in the same C file.

> If it's a static struct the compiler is free to do any crazy optimization

> on this data it can come up with.

> So the presence of it in the elf symbols doesn't guarantee anything.


Ok, yes, I can see how that could happen. But this was always the case
when someone used statics without volatile. And I've seen recent
example of using plain static (though for internal state keeping, not
used from user-space, if I remember correctly). So we could drop
non-volatile statics from skeleton, I suppose, but I'm afraid we might
be breaking existing users.

>

> > Static maps are slightly different, because we use SEC() which marks

> > them as used, so they should always be present.

>

> yes. The used attribute makes the compiler keep the data,

> but it can still inline it and lose the reference in the .text.


At least if the map is actually used with helpers (e.g.,
bpf_map_lookup_elem(&map, ...)) it would be invalid for compiler to do
anything crazy with that map reference, because compiler has no
visibility into what opaque helpers do with that memory. So I don't
think it can alias multiple maps, for instance. So I think static maps
should be fine.

>

> > Sub-skeleton will present those statics to the BPF library without

> > name mangling, but for the final linked BPF object file we need to

> > handle statics. Definitely for maps, because static means that library

> > or library user shouldn't be able to just extern that definition and

> > update/lookup/corrupt its state. But I think for static variables it

> > should be the same. Both are visible to user-space, but invisible

> > between linked BPF compilation units.

>

> I agree with motivation and also agree that static map def is a clean

> way to hide such a map inside the library.

> I'm not sure yet that static + used is enough.

> More study needed.

> For maps it's only the reference inside .text that is relevant.

> If the compiler messes with it then we might have issues.

> Like two static maps might look the same to the compiler

> They will have different field names, but their contents are zeros

> in both cases. I'm not sure that the compiler will not decide to merge them

> into one data location. In such case .text insns will be pointing

> to the same data offset. The symbol names are probably going to be different.

> So we might be ok with static+used only.


See above about passing a pointer to map into black box functions. I'd
bet that the compiler can't merge together two different references at
least because of that.

For static maps, btw, just like for static functions and vars, there
is no symbol, it's an offset into .maps section. We use that offset to
identify the map itself.
Andrii Nakryiko April 26, 2021, 3:45 p.m. UTC | #9
On Fri, Apr 23, 2021 at 7:36 PM Yonghong Song <yhs@fb.com> wrote:
>

>

>

> On 4/23/21 5:13 PM, Andrii Nakryiko wrote:

> > On Fri, Apr 23, 2021 at 4:48 PM Alexei Starovoitov

> > <alexei.starovoitov@gmail.com> wrote:

> >>

> >> On Fri, Apr 23, 2021 at 4:35 PM Andrii Nakryiko

> >> <andrii.nakryiko@gmail.com> wrote:

> >>>

> >>> On Fri, Apr 23, 2021 at 4:06 PM Alexei Starovoitov

> >>> <alexei.starovoitov@gmail.com> wrote:

> >>>>

> >>>> On Fri, Apr 23, 2021 at 2:56 PM Yonghong Song <yhs@fb.com> wrote:

> >>>>>>>>

> >>>>>>>> -static volatile const __u32 print_len;

> >>>>>>>> -static volatile const __u32 ret1;

> >>>>>>>> +volatile const __u32 print_len = 0;

> >>>>>>>> +volatile const __u32 ret1 = 0;

> >>>>>>>

> >>>>>>> I am little bit puzzled why bpf_iter_test_kern4.c is impacted. I think

> >>>>>>> this is not in a static link test, right? The same for a few tests below.

> >>>>>>

> >>>>>> All the selftests are passed through a static linker, so it will

> >>>>>> append obj_name to each static variable. So I just minimized use of

> >>>>>> static variables to avoid too much code churn. If this variable was

> >>>>>> static, it would have to be accessed as

> >>>>>> skel->rodata->bpf_iter_test_kern4__print_len, for example.

> >>>>>

> >>>>> Okay this should be fine. selftests/bpf specific. I just feel that

> >>>>> some people may get confused if they write/see a single program in

> >>>>> selftest and they have to use obj_varname format and thinking this

> >>>>> is a new standard, but actually it is due to static linking buried

> >>>>> in Makefile. Maybe add a note in selftests/README.rst so we

> >>>>> can point to people if there is confusion.

> >>>>

> >>>> I'm not sure I understand.

> >>>> Are you saying that

> >>>> bpftool gen object out_file.o in_file.o

> >>>> is no longer equivalent to llvm-strip ?

> >>>> Since during that step static vars will get their names mangled?

> >>>

> >>> Yes. Static vars and static maps. We don't allow (yet?) static

> >>> entry-point BPF programs, so those don't change.

> >>>

> >>>> So a good chunk of code that uses skeleton right now should either

> >>>> 1. don't do the linking step

> >>>> or

> >>>> 2. adjust their code to use global vars

> >>>> or

> >>>> 3. adjust the usage of skel.h in their corresponding user code

> >>>>    to accommodate mangled static names?

> >>>> Did it get it right?

> >>>

> >>> Yes, you are right. But so far most cases outside of selftest that

> >>> I've seen don't use static variables (partially because they need

> >>> pesky volatile to be visible from user-space at all), global vars are

> >>> much nicer in that regard.

> >>

> >> Right.

> >> but wait...

> >> why linker is mangling them at all and why they appear in the skeleton?

> >> static vars without volatile should not be in a skeleton, since changing

> >> them from user space might have no meaning on the bpf program.

> >> The behavior of the bpf prog is unpredictable.

> >

> > It's up to the compiler. If compiler decides that it shouldn't inline

> > all the uses (or e.g. if static variable is an array accessed with

> > index known only at runtime, or many other cases where compiler can't

> > just deduce constant value), then compiler will emit ELF symbols, will

> > allocate storage, and code will use that storage. static volatile just

> > forces the compiler to not assume anything at all.

> >

> > If the compiler does inline all the uses of static, then we won't have

> > storage allocated for it and it won't be even present in BTF. So for

> > libbpf, linker and skeleton statics are no different than globals.

> >

> > Static maps are slightly different, because we use SEC() which marks

> > them as used, so they should always be present.

> >

> > Sub-skeleton will present those statics to the BPF library without

> > name mangling, but for the final linked BPF object file we need to

> > handle statics. Definitely for maps, because static means that library

> > or library user shouldn't be able to just extern that definition and

> > update/lookup/corrupt its state. But I think for static variables it

> > should be the same. Both are visible to user-space, but invisible

> > between linked BPF compilation units.

> >

> >> Only volatile static can theoretically be in the skeleton, but as you said

> >> probably no one is using them yet, so we can omit them from skeleton too.

>

> I think it is a good idea to keep volatile static use case in

> the skeleton if we add support for static map. The volatile

> static variable essentially a private map. Without this,

> for skeleton users, they may need to use an explicit one-element

> static array map which we probably want to avoid.


I agree, `static volatile` definitely has to be supported. I wonder
what we should do about plain statics, though? What's your opinion,
Yonghong?
Yonghong Song April 26, 2021, 4:34 p.m. UTC | #10
On 4/26/21 8:45 AM, Andrii Nakryiko wrote:
> On Fri, Apr 23, 2021 at 7:36 PM Yonghong Song <yhs@fb.com> wrote:

>>

>>

>>

>> On 4/23/21 5:13 PM, Andrii Nakryiko wrote:

>>> On Fri, Apr 23, 2021 at 4:48 PM Alexei Starovoitov

>>> <alexei.starovoitov@gmail.com> wrote:

>>>>

>>>> On Fri, Apr 23, 2021 at 4:35 PM Andrii Nakryiko

>>>> <andrii.nakryiko@gmail.com> wrote:

>>>>>

>>>>> On Fri, Apr 23, 2021 at 4:06 PM Alexei Starovoitov

>>>>> <alexei.starovoitov@gmail.com> wrote:

>>>>>>

>>>>>> On Fri, Apr 23, 2021 at 2:56 PM Yonghong Song <yhs@fb.com> wrote:

>>>>>>>>>>

>>>>>>>>>> -static volatile const __u32 print_len;

>>>>>>>>>> -static volatile const __u32 ret1;

>>>>>>>>>> +volatile const __u32 print_len = 0;

>>>>>>>>>> +volatile const __u32 ret1 = 0;

>>>>>>>>>

>>>>>>>>> I am little bit puzzled why bpf_iter_test_kern4.c is impacted. I think

>>>>>>>>> this is not in a static link test, right? The same for a few tests below.

>>>>>>>>

>>>>>>>> All the selftests are passed through a static linker, so it will

>>>>>>>> append obj_name to each static variable. So I just minimized use of

>>>>>>>> static variables to avoid too much code churn. If this variable was

>>>>>>>> static, it would have to be accessed as

>>>>>>>> skel->rodata->bpf_iter_test_kern4__print_len, for example.

>>>>>>>

>>>>>>> Okay this should be fine. selftests/bpf specific. I just feel that

>>>>>>> some people may get confused if they write/see a single program in

>>>>>>> selftest and they have to use obj_varname format and thinking this

>>>>>>> is a new standard, but actually it is due to static linking buried

>>>>>>> in Makefile. Maybe add a note in selftests/README.rst so we

>>>>>>> can point to people if there is confusion.

>>>>>>

>>>>>> I'm not sure I understand.

>>>>>> Are you saying that

>>>>>> bpftool gen object out_file.o in_file.o

>>>>>> is no longer equivalent to llvm-strip ?

>>>>>> Since during that step static vars will get their names mangled?

>>>>>

>>>>> Yes. Static vars and static maps. We don't allow (yet?) static

>>>>> entry-point BPF programs, so those don't change.

>>>>>

>>>>>> So a good chunk of code that uses skeleton right now should either

>>>>>> 1. don't do the linking step

>>>>>> or

>>>>>> 2. adjust their code to use global vars

>>>>>> or

>>>>>> 3. adjust the usage of skel.h in their corresponding user code

>>>>>>     to accommodate mangled static names?

>>>>>> Did it get it right?

>>>>>

>>>>> Yes, you are right. But so far most cases outside of selftest that

>>>>> I've seen don't use static variables (partially because they need

>>>>> pesky volatile to be visible from user-space at all), global vars are

>>>>> much nicer in that regard.

>>>>

>>>> Right.

>>>> but wait...

>>>> why linker is mangling them at all and why they appear in the skeleton?

>>>> static vars without volatile should not be in a skeleton, since changing

>>>> them from user space might have no meaning on the bpf program.

>>>> The behavior of the bpf prog is unpredictable.

>>>

>>> It's up to the compiler. If compiler decides that it shouldn't inline

>>> all the uses (or e.g. if static variable is an array accessed with

>>> index known only at runtime, or many other cases where compiler can't

>>> just deduce constant value), then compiler will emit ELF symbols, will

>>> allocate storage, and code will use that storage. static volatile just

>>> forces the compiler to not assume anything at all.

>>>

>>> If the compiler does inline all the uses of static, then we won't have

>>> storage allocated for it and it won't be even present in BTF. So for

>>> libbpf, linker and skeleton statics are no different than globals.

>>>

>>> Static maps are slightly different, because we use SEC() which marks

>>> them as used, so they should always be present.

>>>

>>> Sub-skeleton will present those statics to the BPF library without

>>> name mangling, but for the final linked BPF object file we need to

>>> handle statics. Definitely for maps, because static means that library

>>> or library user shouldn't be able to just extern that definition and

>>> update/lookup/corrupt its state. But I think for static variables it

>>> should be the same. Both are visible to user-space, but invisible

>>> between linked BPF compilation units.

>>>

>>>> Only volatile static can theoretically be in the skeleton, but as you said

>>>> probably no one is using them yet, so we can omit them from skeleton too.

>>

>> I think it is a good idea to keep volatile static use case in

>> the skeleton if we add support for static map. The volatile

>> static variable essentially a private map. Without this,

>> for skeleton users, they may need to use an explicit one-element

>> static array map which we probably want to avoid.

> 

> I agree, `static volatile` definitely has to be supported. I wonder

> what we should do about plain statics, though? What's your opinion,

> Yonghong?


I think we should support plain static variables if they survive the
compilation. The 'volatile' keyword is only to prevent compiler from 
removing the static variables. For example, for the following code,

static int aa;
void set(int v) { aa = v; }
int foo() { return aa; }

the compiler won't be able to optimize "aa" away. Adding "volatile" to
the definition also works, but seems awkward. We should support this
case even if user doesn't add "volatile" in the static definition.
Alexei Starovoitov April 26, 2021, 10:34 p.m. UTC | #11
On Mon, Apr 26, 2021 at 08:44:04AM -0700, Andrii Nakryiko wrote:
> 

> >

> > > Static maps are slightly different, because we use SEC() which marks

> > > them as used, so they should always be present.

> >

> > yes. The used attribute makes the compiler keep the data,

> > but it can still inline it and lose the reference in the .text.

> 

> At least if the map is actually used with helpers (e.g.,

> bpf_map_lookup_elem(&map, ...)) it would be invalid for compiler to do

> anything crazy with that map reference, because compiler has no

> visibility into what opaque helpers do with that memory. So I don't

> think it can alias multiple maps, for instance. So I think static maps

> should be fine.


Yeah. That makes sense.

> See above about passing a pointer to map into black box functions. I'd

> bet that the compiler can't merge together two different references at

> least because of that.

> 

> For static maps, btw, just like for static functions and vars, there

> is no symbol, it's an offset into .maps section. We use that offset to

> identify the map itself.


Ok. Sounds like there is a desire to expose both static and static volatile
into skeleton.
Sure, but let's make it such the linking step doesn't change the skeleton.
Imagine a project that using single .bpf.c file and skeleton.
It grows and wants to split itself into multiple .bpf.c.
If such split would change the skeleton generated var/map names
it would be annoying user experience.

I see few options to avoid that:
- keeping the btf names as-is during linking
The final .o can have multiple vars and maps with the same name.
The skeleton gen can see the name collision and disambiguate them.
Here I think it's important to give users a choice. Blindly appending
file name is not ideal.
How to express it cleanly in .bpf.c? I don't know. SEC() would be a bit
ugly. May be similar to core flavors? ___1 and ___2 ? Also not ideal.
- another option is to fail skeleton gen if names conflict.
This way the users wold be able to link just fine and traditonal C style
linker behavior will be preserved, but if the user wants a skeleton
then the static map names across .bpf.c files shouldn't conflict.
imo that's reasonable restriction.
- maybe adopt __hidden for vars and maps? Only not hidden (which is default now)
would be seen in skeleton?
Andrii Nakryiko April 26, 2021, 11:11 p.m. UTC | #12
On Mon, Apr 26, 2021 at 3:34 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>

> On Mon, Apr 26, 2021 at 08:44:04AM -0700, Andrii Nakryiko wrote:

> >

> > >

> > > > Static maps are slightly different, because we use SEC() which marks

> > > > them as used, so they should always be present.

> > >

> > > yes. The used attribute makes the compiler keep the data,

> > > but it can still inline it and lose the reference in the .text.

> >

> > At least if the map is actually used with helpers (e.g.,

> > bpf_map_lookup_elem(&map, ...)) it would be invalid for compiler to do

> > anything crazy with that map reference, because compiler has no

> > visibility into what opaque helpers do with that memory. So I don't

> > think it can alias multiple maps, for instance. So I think static maps

> > should be fine.

>

> Yeah. That makes sense.

>

> > See above about passing a pointer to map into black box functions. I'd

> > bet that the compiler can't merge together two different references at

> > least because of that.

> >

> > For static maps, btw, just like for static functions and vars, there

> > is no symbol, it's an offset into .maps section. We use that offset to

> > identify the map itself.

>

> Ok. Sounds like there is a desire to expose both static and static volatile

> into skeleton.

> Sure, but let's make it such the linking step doesn't change the skeleton.

> Imagine a project that using single .bpf.c file and skeleton.

> It grows and wants to split itself into multiple .bpf.c.

> If such split would change the skeleton generated var/map names

> it would be annoying user experience.


It's surely not ideal, but it's a one-time step and only when user is
ready to switch to linker, so I don't see it as such a big problem.

>

> I see few options to avoid that:

> - keeping the btf names as-is during linking

> The final .o can have multiple vars and maps with the same name.

> The skeleton gen can see the name collision and disambiguate them.

> Here I think it's important to give users a choice. Blindly appending

> file name is not ideal.

> How to express it cleanly in .bpf.c? I don't know. SEC() would be a bit

> ugly. May be similar to core flavors? ___1 and ___2 ? Also not ideal.


___1 vs ___2 doesn't tell you which file you are accessing static
variable from, you need to go and figure out the order of linking. If
you look at bpf_linker__add_file() API, it has opts->object_name which
allows you to specify what should be used as <prefix>__. Sane default
seems to be the object name derived from filename, but it's possible
to override this. To allow end-users customize we can extend bpftool
to allow users to specify this. One way I was thinking would be
something like

bpftool gen object my_obj1.o=my_prefix1 my_obj2.o=my_prefix2

If user doesn't want prefixing (e.g., when linking multi-file BPF
library into a single .o) they would be able to disable this as:

bpftool gen object lib_file1.o= lib_file2.o= and so on

> - another option is to fail skeleton gen if names conflict.

> This way the users wold be able to link just fine and traditonal C style

> linker behavior will be preserved, but if the user wants a skeleton

> then the static map names across .bpf.c files shouldn't conflict.

> imo that's reasonable restriction.


There are two reasons to use static:
1. hide it from BPF code in other files (compilation units)
2. allow name conflicts (i.e., not care about anyone else accidentally
defining static variable with the same name)

I think both are important and I wouldn't want to give up #2. It
basically says: "no other file should interfere with my state neither
through naming or hijacking my state". Obviously it's impossible to
guard from user-space interference due to how BPF maps/progs are
visible to user-space, so those guarantees are mostly about BPF code
side.

Name prefixing only affects BPF skeleton generation and user-space use
of those static variables, both of which are highly-specific use
patterns "bridging two worlds", BPF and user-space. So I think it's
totally reasonable to specify that such variables will have naming
prefixes. Especially that BPF static variables inside functions
already use similar naming conventions and are similarly exposed in
BPF skeleton.

> - maybe adopt __hidden for vars and maps? Only not hidden (which is default now)

> would be seen in skeleton?


This is similar to the above, it gives up the ability to not care
about naming so much, because everything is forced to be global.
Alexei Starovoitov April 27, 2021, 2:22 a.m. UTC | #13
On Mon, Apr 26, 2021 at 04:11:23PM -0700, Andrii Nakryiko wrote:
> On Mon, Apr 26, 2021 at 3:34 PM Alexei Starovoitov

> <alexei.starovoitov@gmail.com> wrote:

> >

> > On Mon, Apr 26, 2021 at 08:44:04AM -0700, Andrii Nakryiko wrote:

> > >

> > > >

> > > > > Static maps are slightly different, because we use SEC() which marks

> > > > > them as used, so they should always be present.

> > > >

> > > > yes. The used attribute makes the compiler keep the data,

> > > > but it can still inline it and lose the reference in the .text.

> > >

> > > At least if the map is actually used with helpers (e.g.,

> > > bpf_map_lookup_elem(&map, ...)) it would be invalid for compiler to do

> > > anything crazy with that map reference, because compiler has no

> > > visibility into what opaque helpers do with that memory. So I don't

> > > think it can alias multiple maps, for instance. So I think static maps

> > > should be fine.

> >

> > Yeah. That makes sense.

> >

> > > See above about passing a pointer to map into black box functions. I'd

> > > bet that the compiler can't merge together two different references at

> > > least because of that.

> > >

> > > For static maps, btw, just like for static functions and vars, there

> > > is no symbol, it's an offset into .maps section. We use that offset to

> > > identify the map itself.

> >

> > Ok. Sounds like there is a desire to expose both static and static volatile

> > into skeleton.

> > Sure, but let's make it such the linking step doesn't change the skeleton.

> > Imagine a project that using single .bpf.c file and skeleton.

> > It grows and wants to split itself into multiple .bpf.c.

> > If such split would change the skeleton generated var/map names

> > it would be annoying user experience.

> 

> It's surely not ideal, but it's a one-time step and only when user is

> ready to switch to linker, so I don't see it as such a big problem.


even small obstacles are obstacles for adoption.

> >

> > I see few options to avoid that:

> > - keeping the btf names as-is during linking

> > The final .o can have multiple vars and maps with the same name.

> > The skeleton gen can see the name collision and disambiguate them.

> > Here I think it's important to give users a choice. Blindly appending

> > file name is not ideal.

> > How to express it cleanly in .bpf.c? I don't know. SEC() would be a bit

> > ugly. May be similar to core flavors? ___1 and ___2 ? Also not ideal.

> 

> ___1 vs ___2 doesn't tell you which file you are accessing static

> variable from, you need to go and figure out the order of linking. If

> you look at bpf_linker__add_file() API, it has opts->object_name which

> allows you to specify what should be used as <prefix>__. Sane default

> seems to be the object name derived from filename, but it's possible

> to override this. To allow end-users customize we can extend bpftool

> to allow users to specify this. One way I was thinking would be

> something like

> 

> bpftool gen object my_obj1.o=my_prefix1 my_obj2.o=my_prefix2

> 

> If user doesn't want prefixing (e.g., when linking multi-file BPF

> library into a single .o) they would be able to disable this as:

> 

> bpftool gen object lib_file1.o= lib_file2.o= and so on


ouch. I think it's quite ugly.
Equally ugly would be to ask users to rename bpf_file.o into different_file.o
just to have a different prefix.

> > - another option is to fail skeleton gen if names conflict.

> > This way the users wold be able to link just fine and traditonal C style

> > linker behavior will be preserved, but if the user wants a skeleton

> > then the static map names across .bpf.c files shouldn't conflict.

> > imo that's reasonable restriction.

> 

> There are two reasons to use static:

> 1. hide it from BPF code in other files (compilation units)

> 2. allow name conflicts (i.e., not care about anyone else accidentally

> defining static variable with the same name)

> 

> I think both are important and I wouldn't want to give up #2. It

> basically says: "no other file should interfere with my state neither

> through naming or hijacking my state". Obviously it's impossible to

> guard from user-space interference due to how BPF maps/progs are

> visible to user-space, so those guarantees are mostly about BPF code

> side.


As far as #2 I think the linker should ignore the naming conflict and
proceed with linking. It's a skeleton gen that cares about different names.
Here we're using 'static' to mean too many things.
The #1 and #2 above is traditional C style semantics which should stay as-is
for .bpf.c code that is being linked.
But we use names as points of reference in the skeleton, so user space .c
would be able to access .bpf.c.
That's the opposite of what 'static' was designed for in C.
The .bpf.c is hiding it, but skeleton makes it sort-of external and
visible to user space .c. That's not really "static" meaning.
That's why I proposed earlier to avoid adding static to skeleton.
And that's the reason we're struggling to define it cleanly.

> Name prefixing only affects BPF skeleton generation and user-space use

> of those static variables, both of which are highly-specific use

> patterns "bridging two worlds", BPF and user-space. So I think it's

> totally reasonable to specify that such variables will have naming

> prefixes. Especially that BPF static variables inside functions

> already use similar naming conventions and are similarly exposed in

> BPF skeleton.


That's clang only style of mangling static vars inside functions.
No one should count on that behavior. clang can change that at any time.
If we see somebody doing it we should discourage such use.

> 

> > - maybe adopt __hidden for vars and maps? Only not hidden (which is default now)

> > would be seen in skeleton?

> 

> This is similar to the above, it gives up the ability to not care

> about naming so much, because everything is forced to be global.


I think the best is to avoid emitting static in skeleton.
imo that's the most accurate definition of 'static' from C pov.
The linker wouldn't care about the name and would have multiple
vars in BTF datasec with the same name.
The other option is to ask users to provide the name
for such 'static' that is still 'external' from .bpf.c into .c
Either SEC() will work or we can use
static int var __attribute__((alias("external_name"))); ?
'var' would stay in BTF datasec, but "external_name" would have
to be unique in skeleton across .o-s.
Or some other way to convey in .bpf.c file that 'static' var
is not quite static but actually visible to a different .c file.
Though it's bridging different worlds.
Andrii Nakryiko April 27, 2021, 9:27 p.m. UTC | #14
On Mon, Apr 26, 2021 at 7:22 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>

> On Mon, Apr 26, 2021 at 04:11:23PM -0700, Andrii Nakryiko wrote:

> > On Mon, Apr 26, 2021 at 3:34 PM Alexei Starovoitov

> > <alexei.starovoitov@gmail.com> wrote:

> > >

> > > On Mon, Apr 26, 2021 at 08:44:04AM -0700, Andrii Nakryiko wrote:

> > > >

> > > > >

> > > > > > Static maps are slightly different, because we use SEC() which marks

> > > > > > them as used, so they should always be present.

> > > > >

> > > > > yes. The used attribute makes the compiler keep the data,

> > > > > but it can still inline it and lose the reference in the .text.

> > > >

> > > > At least if the map is actually used with helpers (e.g.,

> > > > bpf_map_lookup_elem(&map, ...)) it would be invalid for compiler to do

> > > > anything crazy with that map reference, because compiler has no

> > > > visibility into what opaque helpers do with that memory. So I don't

> > > > think it can alias multiple maps, for instance. So I think static maps

> > > > should be fine.

> > >

> > > Yeah. That makes sense.

> > >

> > > > See above about passing a pointer to map into black box functions. I'd

> > > > bet that the compiler can't merge together two different references at

> > > > least because of that.

> > > >

> > > > For static maps, btw, just like for static functions and vars, there

> > > > is no symbol, it's an offset into .maps section. We use that offset to

> > > > identify the map itself.

> > >

> > > Ok. Sounds like there is a desire to expose both static and static volatile

> > > into skeleton.

> > > Sure, but let's make it such the linking step doesn't change the skeleton.

> > > Imagine a project that using single .bpf.c file and skeleton.

> > > It grows and wants to split itself into multiple .bpf.c.

> > > If such split would change the skeleton generated var/map names

> > > it would be annoying user experience.

> >

> > It's surely not ideal, but it's a one-time step and only when user is

> > ready to switch to linker, so I don't see it as such a big problem.

>

> even small obstacles are obstacles for adoption.


The good thing is that you don't need a static linker if you have a
single-file BPF application. One can just do `strip -g` as we were
doing all along to strip away DWARF. And custom sections or attributes
seem like even a bigger obstacle to me, just to be able to use static
variables.

>

> > >

> > > I see few options to avoid that:

> > > - keeping the btf names as-is during linking

> > > The final .o can have multiple vars and maps with the same name.

> > > The skeleton gen can see the name collision and disambiguate them.

> > > Here I think it's important to give users a choice. Blindly appending

> > > file name is not ideal.

> > > How to express it cleanly in .bpf.c? I don't know. SEC() would be a bit

> > > ugly. May be similar to core flavors? ___1 and ___2 ? Also not ideal.

> >

> > ___1 vs ___2 doesn't tell you which file you are accessing static

> > variable from, you need to go and figure out the order of linking. If

> > you look at bpf_linker__add_file() API, it has opts->object_name which

> > allows you to specify what should be used as <prefix>__. Sane default

> > seems to be the object name derived from filename, but it's possible

> > to override this. To allow end-users customize we can extend bpftool

> > to allow users to specify this. One way I was thinking would be

> > something like

> >

> > bpftool gen object my_obj1.o=my_prefix1 my_obj2.o=my_prefix2

> >

> > If user doesn't want prefixing (e.g., when linking multi-file BPF

> > library into a single .o) they would be able to disable this as:

> >

> > bpftool gen object lib_file1.o= lib_file2.o= and so on

>

> ouch. I think it's quite ugly.

> Equally ugly would be to ask users to rename bpf_file.o into different_file.o

> just to have a different prefix.

>

> > > - another option is to fail skeleton gen if names conflict.

> > > This way the users wold be able to link just fine and traditonal C style

> > > linker behavior will be preserved, but if the user wants a skeleton

> > > then the static map names across .bpf.c files shouldn't conflict.

> > > imo that's reasonable restriction.

> >

> > There are two reasons to use static:

> > 1. hide it from BPF code in other files (compilation units)

> > 2. allow name conflicts (i.e., not care about anyone else accidentally

> > defining static variable with the same name)

> >

> > I think both are important and I wouldn't want to give up #2. It

> > basically says: "no other file should interfere with my state neither

> > through naming or hijacking my state". Obviously it's impossible to

> > guard from user-space interference due to how BPF maps/progs are

> > visible to user-space, so those guarantees are mostly about BPF code

> > side.

>

> As far as #2 I think the linker should ignore the naming conflict and

> proceed with linking. It's a skeleton gen that cares about different names.

> Here we're using 'static' to mean too many things.

> The #1 and #2 above is traditional C style semantics which should stay as-is

> for .bpf.c code that is being linked.

> But we use names as points of reference in the skeleton, so user space .c

> would be able to access .bpf.c.

> That's the opposite of what 'static' was designed for in C.

> The .bpf.c is hiding it, but skeleton makes it sort-of external and

> visible to user space .c. That's not really "static" meaning.

> That's why I proposed earlier to avoid adding static to skeleton.

> And that's the reason we're struggling to define it cleanly.


It's static for the purposes of BPF code, so no naming collisions in
BPF code and no intentional or unintentional accesses beyond a single
.bpf.c file. That's what we want. We don't want BPF library writers to
worry about names colliding with some other library or user code. And
from the perspective of a user of two BPF libraries that have
colliding names it's not great to have to somehow rename those
libraries' internal variables through source code changes.

>

> > Name prefixing only affects BPF skeleton generation and user-space use

> > of those static variables, both of which are highly-specific use

> > patterns "bridging two worlds", BPF and user-space. So I think it's

> > totally reasonable to specify that such variables will have naming

> > prefixes. Especially that BPF static variables inside functions

> > already use similar naming conventions and are similarly exposed in

> > BPF skeleton.

>

> That's clang only style of mangling static vars inside functions.

> No one should count on that behavior. clang can change that at any time.

> If we see somebody doing it we should discourage such use.

>

> >

> > > - maybe adopt __hidden for vars and maps? Only not hidden (which is default now)

> > > would be seen in skeleton?

> >

> > This is similar to the above, it gives up the ability to not care

> > about naming so much, because everything is forced to be global.

>

> I think the best is to avoid emitting static in skeleton.

> imo that's the most accurate definition of 'static' from C pov.

> The linker wouldn't care about the name and would have multiple

> vars in BTF datasec with the same name.

> The other option is to ask users to provide the name

> for such 'static' that is still 'external' from .bpf.c into .c

> Either SEC() will work or we can use

> static int var __attribute__((alias("external_name"))); ?

> 'var' would stay in BTF datasec, but "external_name" would have

> to be unique in skeleton across .o-s.

> Or some other way to convey in .bpf.c file that 'static' var

> is not quite static but actually visible to a different .c file.

> Though it's bridging different worlds.


Omitting static variables from skeleton is a regression and will
surprise existing users, we already went over this with you and
Yonghong in previous emails.

Beyond that, it's not clear what exactly you are proposing. For
alias() seems like another variable with that "external_name" has to
be already defined and you can't initialize var, it has to be just a
declaration. And BTF doesn't capture attributes right now as well. And
overall it sounds like an overly complicated approach both for users
and for libbpf.

As for the extra SEC() annotation. It's both not supported by libbpf
right now, and it's not exactly clear how it helps with name conflicts
(see example above with two libraries colliding). In that regard a
prefix and ability to override it by user gives them an opportunity to
resolve such naming conflicts. I know it's kind of ugly, but name
overrides should hopefully be rarely needed.

I don't think we'll find an ideal solution that will satisfy everyone.
What can I do to unblock BPF static linker work, though?
Alexei Starovoitov April 28, 2021, 4:55 a.m. UTC | #15
On Tue, Apr 27, 2021 at 02:27:26PM -0700, Andrii Nakryiko wrote:
> 

> It's static for the purposes of BPF code, so no naming collisions in

> BPF code and no intentional or unintentional accesses beyond a single

> .bpf.c file. That's what we want. We don't want BPF library writers to

> worry about names colliding with some other library or user code. 


Who is 'we' ?
The skel gen by accident (probably) extended 'static' visibility
from .bpf.c into user space .c.
My point that it was a bad accident and I only realized it now.
The naming conflict in linking exposed this problem.
The selftests are relying on this 'feature' now.
Potentially some user could have used it as well.
Do I want to break that use case? Not really, but that's still an option.

I agree that library writers shouldn't worry about
naming conflicts in *.bpf.c space.
If they're exporting a static from .bpf.c into .c I think it's ok
to make them do extra steps.
Such 'static in .bpf.c' but 'extern into .c' should somehow work
without requiring people rename their vars.
I mean that the user of the library shouldn't need to do renames,
but the library author shouldn't rely on 'all statics are visible
in skel'.
In that sense what I proposed earlier to allow linking, but fail
skel gen is a step towards such development process.
Something like attr(alias()) or some other way hopefully can
help library authors create such library where static+something
is visible to library's skeleton, but users of the library
don't see its statics.
I think the static handling logic needs to be discussed
with your sub-skeleton idea.
If I got it correctly there will be something like this:
- lib.bpf.c compiled into lib.bpf.o
- main.bpf.c that links with lib.bpf.o
  It's all in *.bpf.c space and static has normal C scope.
  The static vars and maps in lib.bpf.c are not
  visible in main.bpf.c
- there is lib.c that works with lib.bpf.c via lib.skel.h
  that was generated out of lib.bpf.o
- main.c that links with lib.o
  main.c works with main.bpf.c via main.skel.h

I think lib.skel.h you were calling sub-skeleton.
pls correct me.
Since main.bpf.o was linked with lib.bpf.o
the main.skel.h will include the things from it.
But main.c shouldn't be accessing them, since that's the
point of the library.

At the same time lib.bpf.c and main.bpf.c could have been
just two files of the same project. If lib.bpf.o isn't a library
then main.skel.h should access it just fine.
So what is bpf library? How should it be defined?
And what is the scope and visibility of its vars/maps/funcs?
Unlike traditional C the bpf has two worlds .bpf.c and .c
So traditional 'static' doesn't cover these cases.

> And

> from the perspective of a user of two BPF libraries that have

> colliding names it's not great to have to somehow rename those

> libraries' internal variables through source code changes.


It's not great. That's why I'm trying to provoke a discussion
of more options and pick the best considering all + and -.

> 

> Omitting static variables from skeleton is a regression and will

> surprise existing users, we already went over this with you and

> Yonghong in previous emails.


Do I want to suffer this regression? No, but it could be the only option.

> Beyond that, it's not clear what exactly you are proposing. 


To discuss all options as a whole and hopefully you and others
can come up with more than what I proposed.

> For

> alias() seems like another variable with that "external_name" has to

> be already defined and you can't initialize var, it has to be just a

> declaration. And BTF doesn't capture attributes right now as well. And

> overall it sounds like an overly complicated approach both for users

> and for libbpf.


yes. supporting alias() would mean more work in clang, libbpf and
maybe new bits in BTF.

> As for the extra SEC() annotation. It's both not supported by libbpf

> right now, and it's not exactly clear how it helps with name conflicts

> (see example above with two libraries colliding). In that regard a

> prefix and ability to override it by user gives them an opportunity to

> resolve such naming conflicts. I know it's kind of ugly, but name

> overrides should hopefully be rarely needed.


Yes. it's not supported today. All I'm saying it's one of the options.

> What can I do to unblock BPF static linker work, though?


I believe that the way bpf toolchain interprets static is
a critical long term decision that shouldn't be done lightly.
There is no rush to define it quickly as an automatic prefix of filename
to all statics only because it's trivial to implement and sort-of works.
It's not something we can undo later. Today there are no libraries
and static definition of maps doesn't really work.
So the only regression (if we decide to change) would be the way skeleton
emits statics.

> I don't think we'll find an ideal solution that will satisfy everyone.


I think we didn't even start looking for that solution.
At least I'm only starting to grasp the complexity of the problem.
Andrii Nakryiko April 28, 2021, 7:33 p.m. UTC | #16
On Tue, Apr 27, 2021 at 9:55 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>

> On Tue, Apr 27, 2021 at 02:27:26PM -0700, Andrii Nakryiko wrote:

> >

> > It's static for the purposes of BPF code, so no naming collisions in

> > BPF code and no intentional or unintentional accesses beyond a single

> > .bpf.c file. That's what we want. We don't want BPF library writers to

> > worry about names colliding with some other library or user code.

>

> Who is 'we' ?


Me and Yonghong, at least. Putting myself into BPF library
writer/maintainer shoes (which I have plans for), I'd say any BPF
library writer as well.

> The skel gen by accident (probably) extended 'static' visibility

> from .bpf.c into user space .c.

> My point that it was a bad accident and I only realized it now.

> The naming conflict in linking exposed this problem.


There is no linking naming conflict. We are talking about BPF skeleton
(and not just trough BPF skeleton, potentially) access to
global/static variables (and static maps as well).

> The selftests are relying on this 'feature' now.

> Potentially some user could have used it as well.

> Do I want to break that use case? Not really, but that's still an option.

>

> I agree that library writers shouldn't worry about

> naming conflicts in *.bpf.c space.

> If they're exporting a static from .bpf.c into .c I think it's ok

> to make them do extra steps.

> Such 'static in .bpf.c' but 'extern into .c' should somehow work

> without requiring people rename their vars.

> I mean that the user of the library shouldn't need to do renames,

> but the library author shouldn't rely on 'all statics are visible

> in skel'.

> In that sense what I proposed earlier to allow linking, but fail

> skel gen is a step towards such development process.


This is not just variables in BPF skeleton. Static maps, even without
skeleton, will have the same problem. All maps are accessible through
bpf_object__find_map_by_name().

> Something like attr(alias()) or some other way hopefully can

> help library authors create such library where static+something

> is visible to library's skeleton, but users of the library

> don't see its statics.

> I think the static handling logic needs to be discussed

> with your sub-skeleton idea.

> If I got it correctly there will be something like this:

> - lib.bpf.c compiled into lib.bpf.o


small adjustment here, it might be lib1.bpf.c + lib2.bpf.c + and so on
-> (through intermediate .bpf.o) -> final lib.bpf.o

> - main.bpf.c that links with lib.bpf.o

>   It's all in *.bpf.c space and static has normal C scope.

>   The static vars and maps in lib.bpf.c are not

>   visible in main.bpf.c

> - there is lib.c that works with lib.bpf.c via lib.skel.h

>   that was generated out of lib.bpf.o

> - main.c that links with lib.o

>   main.c works with main.bpf.c via main.skel.h

>

> I think lib.skel.h you were calling sub-skeleton.

> pls correct me.


no, all correct

> Since main.bpf.o was linked with lib.bpf.o

> the main.skel.h will include the things from it.

> But main.c shouldn't be accessing them, since that's the

> point of the library.

>

> At the same time lib.bpf.c and main.bpf.c could have been

> just two files of the same project. If lib.bpf.o isn't a library

> then main.skel.h should access it just fine.

> So what is bpf library? How should it be defined?


In very abstract terms, it's a piece of BPF application (maps,
programs, variables) and corresponding user-space (initialization,
teardown, runtime operations) that is not a complete application and
is supposed to be linked into (potentially) someone else's
application.

You are right that right now main.skel.h will have visibility into
library's maps, variables, programs. Not great, but didn't feel that
horrible either. After all, user of BPF library has to follow some API
expectations regardless, like calling some init and teardown APIs and
otherwise setting up BPF library at runtime.

I'm not sure what alternative there is. User-space C ecosystem doesn't
differentiate between linking with some my_prog.o vs libbpf.a. So
static libraries are not special in any way and if they have
conflicting global variable names, it will cause linker error. This is
where static variables are important.

Shared libraries are a bit more formally recognized, but that's a very
different thing (that's where STV_HIDDEN plays the role in user-space
linking).

So I feel like you are getting at something, but don't quite spell it
out. Please elaborate.

> And what is the scope and visibility of its vars/maps/funcs?

> Unlike traditional C the bpf has two worlds .bpf.c and .c

> So traditional 'static' doesn't cover these cases.


Right, static is static only within the BPF world.

>

> > And

> > from the perspective of a user of two BPF libraries that have

> > colliding names it's not great to have to somehow rename those

> > libraries' internal variables through source code changes.

>

> It's not great. That's why I'm trying to provoke a discussion

> of more options and pick the best considering all + and -.

>


We are having this discussion, even if it produces disappointing
results. I don't know better options short of disabling static
variables. I've thought a lot about it.

What's worse, to bpftool, generating BPF skeleton for .bpf.o straight
from Clang or to statically-linked .bpf.o is indistinguishable, so
it's not even simple to just say "let's not generate static variables
into BPF skeleton". But there are also static maps with similar issues
and non-skeleton-based APIs to access them by name
(bpf_object__find_map_by_name()). There we definitely can't just keep
saying that static maps are not supported. I think we both agreed
static maps would be good to have, but no one will die if we drop
them.

> >

> > Omitting static variables from skeleton is a regression and will

> > surprise existing users, we already went over this with you and

> > Yonghong in previous emails.

>

> Do I want to suffer this regression? No, but it could be the only option.

>

> > Beyond that, it's not clear what exactly you are proposing.

>

> To discuss all options as a whole and hopefully you and others

> can come up with more than what I proposed.

>

> > For

> > alias() seems like another variable with that "external_name" has to

> > be already defined and you can't initialize var, it has to be just a

> > declaration. And BTF doesn't capture attributes right now as well. And

> > overall it sounds like an overly complicated approach both for users

> > and for libbpf.

>

> yes. supporting alias() would mean more work in clang, libbpf and

> maybe new bits in BTF.

>

> > As for the extra SEC() annotation. It's both not supported by libbpf

> > right now, and it's not exactly clear how it helps with name conflicts

> > (see example above with two libraries colliding). In that regard a

> > prefix and ability to override it by user gives them an opportunity to

> > resolve such naming conflicts. I know it's kind of ugly, but name

> > overrides should hopefully be rarely needed.

>

> Yes. it's not supported today. All I'm saying it's one of the options.

>

> > What can I do to unblock BPF static linker work, though?

>

> I believe that the way bpf toolchain interprets static is

> a critical long term decision that shouldn't be done lightly.

> There is no rush to define it quickly as an automatic prefix of filename

> to all statics only because it's trivial to implement and sort-of works.

> It's not something we can undo later. Today there are no libraries

> and static definition of maps doesn't really work.

> So the only regression (if we decide to change) would be the way skeleton

> emits statics.

>

> > I don't think we'll find an ideal solution that will satisfy everyone.

>

> I think we didn't even start looking for that solution.

> At least I'm only starting to grasp the complexity of the problem.


I did and didn't find anything satisfactory. But I think we are coming
at this from two different angles, which is why we can't agree on
anything. So just a reminder, static is about two properties:
    1) access protection
    2) naming collisions.

I'm trying to let name collisions on BPF side happen and be allowed
*while* also allowing access to those same name-collisioned entities
(maps and vars, both) from user-space in some non-random fashion. That
inevitably requires some compromises/conventions on the user-space
side. Such an approach preserves both 1) and 2).

You are trying to enforce unique names (or at least aliases) for
static variables, if I understand correctly, which preserves 1) at the
expense of 2). It seems to be a similar idea with custom SEC(), though
you ignored my request to elaborate on how you see that used, so I'm
guessing here a bit.

But I think we can get just 1) with global variables with custom
visibilities. E.g., just marking map/variable as __hidden would
disallow extern'ing it from other files. That's obviously limiting for
extern'ing within the library, so we can keep digging deeper and
define __internal (STV_INTERNAL) that would be "upgraded" to
STV_HIDDEN after the initial linking pass. So you'd compile your BPF
library with __internal, but your lib.bpf.o will have those global
variables as STV_HIDDEN and thus inaccessible from other libraries and
BPF app itself.

So if we are ok breaking existing static variable users, then just
dropping statics from BPF skeleton and supporting extra __hidden and
__internal semantics for variables and maps would bypass these issues.
I wanted statics mostly for property 2), but if I can't get it, then
I'd drop statics from skeletons altogether.

If I could drop statics for skeletons that were statically linked,
that wouldn't be a regression. It's impossible to do right now, but we
can also add a new SHT_NOTE section, which we can use to detect
statically linked vs Clang-generated .bpf.o. Certainly more ELF
fussing around than I'd like, but not the end of the world either.

Thoughts? Did that summarize the issue well enough?
Alexei Starovoitov May 5, 2021, 5:22 a.m. UTC | #17
On Mon, May 3, 2021 at 9:42 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>

> On Wed, Apr 28, 2021 at 12:33:36PM -0700, Andrii Nakryiko wrote:

> > > At least I'm only starting to grasp the complexity of the problem.

> >

> > I did and didn't find anything satisfactory. But I think we are coming

> > at this from two different angles, which is why we can't agree on

> > anything. So just a reminder, static is about two properties:

> >     1) access protection

> >     2) naming collisions.

> >

> > I'm trying to let name collisions on BPF side happen and be allowed

> > *while* also allowing access to those same name-collisioned entities

> > (maps and vars, both) from user-space in some non-random fashion. That

> > inevitably requires some compromises/conventions on the user-space

> > side. Such an approach preserves both 1) and 2).

> >

> > You are trying to enforce unique names (or at least aliases) for

> > static variables, if I understand correctly, which preserves 1) at the

> > expense of 2). It seems to be a similar idea with custom SEC(), though

> > you ignored my request to elaborate on how you see that used, so I'm

> > guessing here a bit.

> >

> > But I think we can get just 1) with global variables with custom

> > visibilities. E.g., just marking map/variable as __hidden would

> > disallow extern'ing it from other files. That's obviously limiting for

> > extern'ing within the library, so we can keep digging deeper and

> > define __internal (STV_INTERNAL) that would be "upgraded" to

> > STV_HIDDEN after the initial linking pass. So you'd compile your BPF

> > library with __internal, but your lib.bpf.o will have those global

> > variables as STV_HIDDEN and thus inaccessible from other libraries and

> > BPF app itself.

> >

> > So if we are ok breaking existing static variable users, then just

> > dropping statics from BPF skeleton and supporting extra __hidden and

> > __internal semantics for variables and maps would bypass these issues.

> > I wanted statics mostly for property 2), but if I can't get it, then

> > I'd drop statics from skeletons altogether.

> >

> > If I could drop statics for skeletons that were statically linked,

> > that wouldn't be a regression. It's impossible to do right now, but we

> > can also add a new SHT_NOTE section, which we can use to detect

> > statically linked vs Clang-generated .bpf.o. Certainly more ELF

> > fussing around than I'd like, but not the end of the world either.

> >

> > Thoughts? Did that summarize the issue well enough?

>

> Background for all:

>

> Until Nov 2019 libbpf didn't support global variables, so bpf programs

> contained code like 'static volatile const int var = 1;'

> Then the skeleton was introduced which went through BTF of a given

> datasec and emitted all variables from that section into .skel.h.

> It didn't bother filtering static vs global variables, so

> static vars in *.bpf.c world became visible into user space *.c world.

> While libbpf supported single bpf.o file such extern-ing of statics

> was fine, but with support of linking multiple *.bpf.o there

> is a question of what to do with static variables with the same names

> in different files.

>

> Consider the following scenario:

> One bpf developer creates a library conntrack. It has

> impl.bpf.c

> ct_api.bpf.c

> and corresponding user space ct.c that uses skel.h to access

> data in these two bpf files.

>

> Another bpf developer creates a library for lru. It has

> impl.bpf.c

> lru_api.bpf.c

> and corresponding user space lru.c.

>

> Now the 3rd developer is writing its main.bpf.c and wants to use these libs.

>

> The libs should be usable in pre-compiled form. The availability of

> the source code is nice, but it shouldn't be mandatory.

>

> So there is libct.a (with user space) and libct.bpf.a (with bpf code)

> and liblru.a (user) and liblru.bpf.a (bpf code).

>

> The developer should be able to link

> main.bpf.o liblru.bpf.a libct.bpf.a

> into final_main.bpf.o

> And link main.o liblru.a libct.a with user space bits into a.out.

>

> The lru.skel.h and ct.skel.h used by these libs were generated

> out of corresponding *.bpf.o and independent of each other.

> There should be no need to recompile user space lru.c and ct.c after

> linking of final_main.bpf.o and generating final skeleton.

>

> I think all three developers should be able to use static variables

> in their .bpf.c files without worrying about conflicts across three

> projects.

> They can use global vars with __attribute__("hidden"),

> but it's not equivalent to static. The linker will complain of

> redefinition if the same name is used across multiple files

> or multiple libs.

> So doing 'int var __attribute__("hidden");' in libct.bpf.a and

> in liblru.bpf.a will prevent linking together.

> That's traditional static linking semantics.

>

> Using file name as a prefix for static vars doesn't work in general,

> since file names can be the same.

> What can work is the library name. The library name is guaranteed to be

> unique in the final linking phase.

> I think we can use it to namespace static variables across

> three sets of bpf programs.

> Also I think it's ok to require a single developer to enforce

> uniqueness of static vars within a project.

>

> In other words 'static int a;' in impl.bpf.c will conflict

> with 'static int a;' in ct_api.bpf.c

> But the static variable in ct_api.bpf.c will not conflict

> with the same variable in lru_api.bpf.c and will not conflict

> with such var in main.bpf.c because they're in a different namespaces.

>

> Here are few ways for the programmer to indicate the library namespaces:

>

> - similar to 'char license[]' use 'char library[]="lru";' in *.bpf.c

> The static linker will handle this reserved name specially just like

> it does 'license' and 'version'.

>

> - #pragma clang attribute push (__attribute__((annotate("lib=lru"))), apply_to = variable)

>

> - #pragma comment(lib, "lru")

>

> I think it's important to define namespaces within *.bpf.c.

> Defining them outside on linker command line or linker script is cumbersome.

>

> I think combining *.o into .a can happen with traditional 'ar'. No need for

> extra checks for now.

> The linking of main.bpf.o liblru.bpf.a libct.bpf.a

> will fail if static vars with the same name are present within the same library.

> The library namespaces will prevent name conflicts across libs and main.bpf.o

> If namespace is not specified it means it's empty, so the existing

> hacks of 'static volatile const int var;' will continue working.

>

> The skeleton can have library name as anon struct in skel.h.

> All vars can be prefixed too, but scoping them into single struct is cleaner.

>

> I think it doesn't hurt if final_main.skel.h includes all bpf vars from lru and

> ct libraries, but I think it's cleaner to omit them.

>

> It's not clear to me yet how final_main__open() and final_main__load() skeleton

> methods will work since lru and ct libs might need their specific initialization

> that is done by user space lru.c and ct.c.

> Also the whole scheme should work with upcoming light skeleton too.

> The design for bpf libraries should accommodate signed libraries.

>

> All of the above is up for discussion. I'd love to hear what golang folks

> are thinking, since above proposal is C centric.


I want to clarify a few things that were brought up in offline discussions.
There are several options:
1. don't emit statics at all.
That will break some skeleton users and doesn't solve the name conflict issue.
The library authors would need to be careful and use a unique enough
prefix for all global vars (including attribute("hidden") ones).
That's no different with traditional static linking in C.
bpf static linker already rejects linking if file1.bpf.c is trying to
'extern int foo()'
when it was '__hidden int foo();' in file2.bpf.c
That's safer than traditional linker and the same approach can be
applied to vars.
So externing of __hidden vars won't be possible, but they will name conflict.

2. emit statics when they don't conflict and fail skel gen where there
is a naming conflict.
That helps a bit, but library authors still have to be careful with
both static and global names.
Which is more annoying than traditional C.

3. do #2 style of failing skel gen if there is a naming conflict, but
also introduce namespacing concept, so that both global and static
vars can be automatically namespaced.
That's the proposal above.
This way, I'm guessing, some libraries will use namespaces to avoid
prefixing everything.
The folks that hate namespaces and #pragmas will do manual prefixes for
both static and global vars.

For approaches
char library[]="lru";'
and
#pragma comment(lib, "lru")
the scope of namespace is the whole .bpf.c file.
The clang/llvm already support it, so the job of name mangling would
belong to linker.

For __attribute__((annotate("lib=lru"))) the scope could be any number
of lines in C files between pragma push/pop and can be nested.
This attribute is supported by clang, but not in the bpf backend.
The llvm would prefix both global and static names
in elf file and in btf.
If another file.bpf.c needs to call a function from namespace "lru"
it would need to prefix such a call.
The skel gen job would be #2 above (emit both static and globals if
they don't conflict).
Such namespacing concept would be the closest to c++ namespaces.

If I understood what folks were saying no one is excited about namespaces in C.
So probably #3 is out and sounds like 1 is prefered?
So don't emit statics ?

Daniel, Lorenz, John,

what's your take ?
Daniel Borkmann May 6, 2021, 10:54 p.m. UTC | #18
On 5/5/21 7:22 AM, Alexei Starovoitov wrote:
> On Mon, May 3, 2021 at 9:42 PM Alexei Starovoitov

> <alexei.starovoitov@gmail.com> wrote:

>> On Wed, Apr 28, 2021 at 12:33:36PM -0700, Andrii Nakryiko wrote:

>>>> At least I'm only starting to grasp the complexity of the problem.

>>>

>>> I did and didn't find anything satisfactory. But I think we are coming

>>> at this from two different angles, which is why we can't agree on

>>> anything. So just a reminder, static is about two properties:

>>>      1) access protection

>>>      2) naming collisions.

>>>

>>> I'm trying to let name collisions on BPF side happen and be allowed

>>> *while* also allowing access to those same name-collisioned entities

>>> (maps and vars, both) from user-space in some non-random fashion. That

>>> inevitably requires some compromises/conventions on the user-space

>>> side. Such an approach preserves both 1) and 2).

>>>

>>> You are trying to enforce unique names (or at least aliases) for

>>> static variables, if I understand correctly, which preserves 1) at the

>>> expense of 2). It seems to be a similar idea with custom SEC(), though

>>> you ignored my request to elaborate on how you see that used, so I'm

>>> guessing here a bit.

>>>

>>> But I think we can get just 1) with global variables with custom

>>> visibilities. E.g., just marking map/variable as __hidden would

>>> disallow extern'ing it from other files. That's obviously limiting for

>>> extern'ing within the library, so we can keep digging deeper and

>>> define __internal (STV_INTERNAL) that would be "upgraded" to

>>> STV_HIDDEN after the initial linking pass. So you'd compile your BPF

>>> library with __internal, but your lib.bpf.o will have those global

>>> variables as STV_HIDDEN and thus inaccessible from other libraries and

>>> BPF app itself.

>>>

>>> So if we are ok breaking existing static variable users, then just

>>> dropping statics from BPF skeleton and supporting extra __hidden and

>>> __internal semantics for variables and maps would bypass these issues.

>>> I wanted statics mostly for property 2), but if I can't get it, then

>>> I'd drop statics from skeletons altogether.

>>>

>>> If I could drop statics for skeletons that were statically linked,

>>> that wouldn't be a regression. It's impossible to do right now, but we

>>> can also add a new SHT_NOTE section, which we can use to detect

>>> statically linked vs Clang-generated .bpf.o. Certainly more ELF

>>> fussing around than I'd like, but not the end of the world either.

>>>

>>> Thoughts? Did that summarize the issue well enough?

>>

>> Background for all:

>>

>> Until Nov 2019 libbpf didn't support global variables, so bpf programs

>> contained code like 'static volatile const int var = 1;'

>> Then the skeleton was introduced which went through BTF of a given

>> datasec and emitted all variables from that section into .skel.h.

>> It didn't bother filtering static vs global variables, so

>> static vars in *.bpf.c world became visible into user space *.c world.

>> While libbpf supported single bpf.o file such extern-ing of statics

>> was fine, but with support of linking multiple *.bpf.o there

>> is a question of what to do with static variables with the same names

>> in different files.

>>

>> Consider the following scenario:

>> One bpf developer creates a library conntrack. It has

>> impl.bpf.c

>> ct_api.bpf.c

>> and corresponding user space ct.c that uses skel.h to access

>> data in these two bpf files.

>>

>> Another bpf developer creates a library for lru. It has

>> impl.bpf.c

>> lru_api.bpf.c

>> and corresponding user space lru.c.

>>

>> Now the 3rd developer is writing its main.bpf.c and wants to use these libs.

>>

>> The libs should be usable in pre-compiled form. The availability of

>> the source code is nice, but it shouldn't be mandatory.

>>

>> So there is libct.a (with user space) and libct.bpf.a (with bpf code)

>> and liblru.a (user) and liblru.bpf.a (bpf code).

>>

>> The developer should be able to link

>> main.bpf.o liblru.bpf.a libct.bpf.a

>> into final_main.bpf.o

>> And link main.o liblru.a libct.a with user space bits into a.out.

>>

>> The lru.skel.h and ct.skel.h used by these libs were generated

>> out of corresponding *.bpf.o and independent of each other.

>> There should be no need to recompile user space lru.c and ct.c after

>> linking of final_main.bpf.o and generating final skeleton.

>>

>> I think all three developers should be able to use static variables

>> in their .bpf.c files without worrying about conflicts across three

>> projects.

>> They can use global vars with __attribute__("hidden"),

>> but it's not equivalent to static. The linker will complain of

>> redefinition if the same name is used across multiple files

>> or multiple libs.

>> So doing 'int var __attribute__("hidden");' in libct.bpf.a and

>> in liblru.bpf.a will prevent linking together.

>> That's traditional static linking semantics.

>>

>> Using file name as a prefix for static vars doesn't work in general,

>> since file names can be the same.

>> What can work is the library name. The library name is guaranteed to be

>> unique in the final linking phase.

>> I think we can use it to namespace static variables across

>> three sets of bpf programs.

>> Also I think it's ok to require a single developer to enforce

>> uniqueness of static vars within a project.

>>

>> In other words 'static int a;' in impl.bpf.c will conflict

>> with 'static int a;' in ct_api.bpf.c

>> But the static variable in ct_api.bpf.c will not conflict

>> with the same variable in lru_api.bpf.c and will not conflict

>> with such var in main.bpf.c because they're in a different namespaces.

>>

>> Here are few ways for the programmer to indicate the library namespaces:

>>

>> - similar to 'char license[]' use 'char library[]="lru";' in *.bpf.c

>> The static linker will handle this reserved name specially just like

>> it does 'license' and 'version'.

>>

>> - #pragma clang attribute push (__attribute__((annotate("lib=lru"))), apply_to = variable)

>>

>> - #pragma comment(lib, "lru")

>>

>> I think it's important to define namespaces within *.bpf.c.

>> Defining them outside on linker command line or linker script is cumbersome.

>>

>> I think combining *.o into .a can happen with traditional 'ar'. No need for

>> extra checks for now.

>> The linking of main.bpf.o liblru.bpf.a libct.bpf.a

>> will fail if static vars with the same name are present within the same library.

>> The library namespaces will prevent name conflicts across libs and main.bpf.o

>> If namespace is not specified it means it's empty, so the existing

>> hacks of 'static volatile const int var;' will continue working.

>>

>> The skeleton can have library name as anon struct in skel.h.

>> All vars can be prefixed too, but scoping them into single struct is cleaner.

>>

>> I think it doesn't hurt if final_main.skel.h includes all bpf vars from lru and

>> ct libraries, but I think it's cleaner to omit them.

>>

>> It's not clear to me yet how final_main__open() and final_main__load() skeleton

>> methods will work since lru and ct libs might need their specific initialization

>> that is done by user space lru.c and ct.c.

>> Also the whole scheme should work with upcoming light skeleton too.

>> The design for bpf libraries should accommodate signed libraries.

>>

>> All of the above is up for discussion. I'd love to hear what golang folks

>> are thinking, since above proposal is C centric.

> 

> I want to clarify a few things that were brought up in offline discussions.

> There are several options:

> 1. don't emit statics at all.

> That will break some skeleton users and doesn't solve the name conflict issue.

> The library authors would need to be careful and use a unique enough

> prefix for all global vars (including attribute("hidden") ones).

> That's no different with traditional static linking in C.

> bpf static linker already rejects linking if file1.bpf.c is trying to

> 'extern int foo()'

> when it was '__hidden int foo();' in file2.bpf.c

> That's safer than traditional linker and the same approach can be

> applied to vars.

> So externing of __hidden vars won't be possible, but they will name conflict.

> 

> 2. emit statics when they don't conflict and fail skel gen where there

> is a naming conflict.

> That helps a bit, but library authors still have to be careful with

> both static and global names.

> Which is more annoying than traditional C.

> 

> 3. do #2 style of failing skel gen if there is a naming conflict, but

> also introduce namespacing concept, so that both global and static

> vars can be automatically namespaced.

> That's the proposal above.

> This way, I'm guessing, some libraries will use namespaces to avoid

> prefixing everything.

> The folks that hate namespaces and #pragmas will do manual prefixes for

> both static and global vars.

> 

> For approaches

> char library[]="lru";'

> and

> #pragma comment(lib, "lru")

> the scope of namespace is the whole .bpf.c file.

> The clang/llvm already support it, so the job of name mangling would

> belong to linker.

> 

> For __attribute__((annotate("lib=lru"))) the scope could be any number

> of lines in C files between pragma push/pop and can be nested.

> This attribute is supported by clang, but not in the bpf backend.

> The llvm would prefix both global and static names

> in elf file and in btf.

> If another file.bpf.c needs to call a function from namespace "lru"

> it would need to prefix such a call.

> The skel gen job would be #2 above (emit both static and globals if

> they don't conflict).

> Such namespacing concept would be the closest to c++ namespaces.

> 

> If I understood what folks were saying no one is excited about namespaces in C.

> So probably #3 is out and sounds like 1 is prefered?

> So don't emit statics ?

> 

> Daniel, Lorenz, John,

> 

> what's your take ?


Hmm, if it wasn't for the breakage, I would be in strong favour of 1), mainly
because it's the most _natural/closest_ to C, so developers wouldn't have to
worry about statics. Less hidden magic.

Even with 3) it's an unexpected extra step that developers have to be aware of
for the case of statics at least. Presumably if it's for the whole .bpf.c file
something like __library(lru) whether pragma or char might be okay given devs
already have extra annotations like license.

The __attribute__ sounds also okay but needs explanation to users overall. If
we need to go that road, we probably need to have both, meaning, pragma or char
/and/ the attribute one. I'm thinking the former mainly so that users don't have
to worry about stumbling into the statics conflicts later on.

Thanks,
Daniel
Lorenz Bauer May 11, 2021, 2:20 p.m. UTC | #19
On Wed, 5 May 2021 at 06:22, Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>

> > All of the above is up for discussion. I'd love to hear what golang folks

> > are thinking, since above proposal is C centric.


Sorry for the late reply, I was on holiday.

Regarding your conntrack library example:
- what is the difference between impl.bpf.c and ct_api.bpf.c? If I
understand correctly, ct_api is used to generate the skel.h, but impl
isn't?
- what file would main.bpf.c include? ct_api or skel.h?

Regarding Andrii's proposal in the forwarded email to use __hidden,
__internal etc. Are these correct:
- static int foo: this is only available in the same .o, not
accessible from user space. Can be referenced via extern int foo?
- __hidden int foo: only available in same .o, not accessible from user space
- __internal int foo: only available in same .a via extern, not
accessible from user space
- int foo: available / conflicts in all .o, accessible from user space
(aka included in skel.h)

When you speak of the linker, do you mean libbpf or the clang / llvm
linker? The Go toolchain has a simplistic linker to support bpf2bpf
calls from the same .o so I imagine libbpf has something similar.

> I want to clarify a few things that were brought up in offline discussions.

> There are several options:

> 1. don't emit statics at all.

> That will break some skeleton users and doesn't solve the name conflict issue.

> The library authors would need to be careful and use a unique enough

> prefix for all global vars (including attribute("hidden") ones).

> That's no different with traditional static linking in C.

> bpf static linker already rejects linking if file1.bpf.c is trying to

> 'extern int foo()'

> when it was '__hidden int foo();' in file2.bpf.c

> That's safer than traditional linker and the same approach can be

> applied to vars.

> So externing of __hidden vars won't be possible, but they will name conflict.

>

> 2. emit statics when they don't conflict and fail skel gen where there

> is a naming conflict.

> That helps a bit, but library authors still have to be careful with

> both static and global names.

> Which is more annoying than traditional C.


The only way I see this affecting the Go toolchain is if main.bpf.c
includes skel.h, not some other .c (or .h?) Otherwise I would work
hard to keep libraries / programs in their own namespace. The Go
toolchain might end up doing the final link of main.bpf.o and
libct.bpf.a (assuming the .a is linked by llvm or libbpf).

In general I'm with Daniel here that I prefer traditional C static
semantics aka option #1.

>

> 3. do #2 style of failing skel gen if there is a naming conflict, but

> also introduce namespacing concept, so that both global and static

> vars can be automatically namespaced.

> That's the proposal above.

> This way, I'm guessing, some libraries will use namespaces to avoid

> prefixing everything.

> The folks that hate namespaces and #pragmas will do manual prefixes for

> both static and global vars.

>

> For approaches

> char library[]="lru";'

> and

> #pragma comment(lib, "lru")

> the scope of namespace is the whole .bpf.c file.

> The clang/llvm already support it, so the job of name mangling would

> belong to linker.


I think this would work well for Go, because it makes the namespace
explicit. I can imagine that #pragma comment(lib,
"github.com/some/go/package") might be useful. How is the pragma
encoded into the ELF? Would this solve name conflict from multiple
files with the same names?

>

> For __attribute__((annotate("lib=lru"))) the scope could be any number

> of lines in C files between pragma push/pop and can be nested.

> This attribute is supported by clang, but not in the bpf backend.

> The llvm would prefix both global and static names

> in elf file and in btf.


Would there be a way to recover the "lru" part from the mangled ELF somehow?

> If another file.bpf.c needs to call a function from namespace "lru"

> it would need to prefix such a call.

> The skel gen job would be #2 above (emit both static and globals if

> they don't conflict).

> Such namespacing concept would be the closest to c++ namespaces.

>

> If I understood what folks were saying no one is excited about namespaces in C.

> So probably #3 is out and sounds like 1 is prefered?


I think at least allowing for namespaces would be great. Most
languages besides C that will wish to integrate with eBPF allow
namespacing.

--
Lorenz Bauer  |  Systems Engineer
6th Floor, County Hall/The Riverside Building, SE1 7PB, UK

www.cloudflare.com
Andrii Nakryiko May 11, 2021, 5:57 p.m. UTC | #20
On Thu, May 6, 2021 at 3:54 PM Daniel Borkmann <daniel@iogearbox.net> wrote:
>

> On 5/5/21 7:22 AM, Alexei Starovoitov wrote:

> > On Mon, May 3, 2021 at 9:42 PM Alexei Starovoitov

> > <alexei.starovoitov@gmail.com> wrote:

> >> On Wed, Apr 28, 2021 at 12:33:36PM -0700, Andrii Nakryiko wrote:

> >>>> At least I'm only starting to grasp the complexity of the problem.

> >>>

> >>> I did and didn't find anything satisfactory. But I think we are coming

> >>> at this from two different angles, which is why we can't agree on

> >>> anything. So just a reminder, static is about two properties:

> >>>      1) access protection

> >>>      2) naming collisions.

> >>>

> >>> I'm trying to let name collisions on BPF side happen and be allowed

> >>> *while* also allowing access to those same name-collisioned entities

> >>> (maps and vars, both) from user-space in some non-random fashion. That

> >>> inevitably requires some compromises/conventions on the user-space

> >>> side. Such an approach preserves both 1) and 2).

> >>>

> >>> You are trying to enforce unique names (or at least aliases) for

> >>> static variables, if I understand correctly, which preserves 1) at the

> >>> expense of 2). It seems to be a similar idea with custom SEC(), though

> >>> you ignored my request to elaborate on how you see that used, so I'm

> >>> guessing here a bit.

> >>>

> >>> But I think we can get just 1) with global variables with custom

> >>> visibilities. E.g., just marking map/variable as __hidden would

> >>> disallow extern'ing it from other files. That's obviously limiting for

> >>> extern'ing within the library, so we can keep digging deeper and

> >>> define __internal (STV_INTERNAL) that would be "upgraded" to

> >>> STV_HIDDEN after the initial linking pass. So you'd compile your BPF

> >>> library with __internal, but your lib.bpf.o will have those global

> >>> variables as STV_HIDDEN and thus inaccessible from other libraries and

> >>> BPF app itself.

> >>>

> >>> So if we are ok breaking existing static variable users, then just

> >>> dropping statics from BPF skeleton and supporting extra __hidden and

> >>> __internal semantics for variables and maps would bypass these issues.

> >>> I wanted statics mostly for property 2), but if I can't get it, then

> >>> I'd drop statics from skeletons altogether.

> >>>

> >>> If I could drop statics for skeletons that were statically linked,

> >>> that wouldn't be a regression. It's impossible to do right now, but we

> >>> can also add a new SHT_NOTE section, which we can use to detect

> >>> statically linked vs Clang-generated .bpf.o. Certainly more ELF

> >>> fussing around than I'd like, but not the end of the world either.

> >>>

> >>> Thoughts? Did that summarize the issue well enough?

> >>

> >> Background for all:

> >>

> >> Until Nov 2019 libbpf didn't support global variables, so bpf programs

> >> contained code like 'static volatile const int var = 1;'

> >> Then the skeleton was introduced which went through BTF of a given

> >> datasec and emitted all variables from that section into .skel.h.

> >> It didn't bother filtering static vs global variables, so

> >> static vars in *.bpf.c world became visible into user space *.c world.

> >> While libbpf supported single bpf.o file such extern-ing of statics

> >> was fine, but with support of linking multiple *.bpf.o there

> >> is a question of what to do with static variables with the same names

> >> in different files.

> >>

> >> Consider the following scenario:

> >> One bpf developer creates a library conntrack. It has

> >> impl.bpf.c

> >> ct_api.bpf.c

> >> and corresponding user space ct.c that uses skel.h to access

> >> data in these two bpf files.

> >>

> >> Another bpf developer creates a library for lru. It has

> >> impl.bpf.c

> >> lru_api.bpf.c

> >> and corresponding user space lru.c.

> >>

> >> Now the 3rd developer is writing its main.bpf.c and wants to use these libs.

> >>

> >> The libs should be usable in pre-compiled form. The availability of

> >> the source code is nice, but it shouldn't be mandatory.

> >>

> >> So there is libct.a (with user space) and libct.bpf.a (with bpf code)

> >> and liblru.a (user) and liblru.bpf.a (bpf code).

> >>

> >> The developer should be able to link

> >> main.bpf.o liblru.bpf.a libct.bpf.a

> >> into final_main.bpf.o

> >> And link main.o liblru.a libct.a with user space bits into a.out.

> >>

> >> The lru.skel.h and ct.skel.h used by these libs were generated

> >> out of corresponding *.bpf.o and independent of each other.

> >> There should be no need to recompile user space lru.c and ct.c after

> >> linking of final_main.bpf.o and generating final skeleton.

> >>

> >> I think all three developers should be able to use static variables

> >> in their .bpf.c files without worrying about conflicts across three

> >> projects.

> >> They can use global vars with __attribute__("hidden"),

> >> but it's not equivalent to static. The linker will complain of

> >> redefinition if the same name is used across multiple files

> >> or multiple libs.

> >> So doing 'int var __attribute__("hidden");' in libct.bpf.a and

> >> in liblru.bpf.a will prevent linking together.

> >> That's traditional static linking semantics.

> >>

> >> Using file name as a prefix for static vars doesn't work in general,

> >> since file names can be the same.

> >> What can work is the library name. The library name is guaranteed to be

> >> unique in the final linking phase.

> >> I think we can use it to namespace static variables across

> >> three sets of bpf programs.

> >> Also I think it's ok to require a single developer to enforce

> >> uniqueness of static vars within a project.

> >>

> >> In other words 'static int a;' in impl.bpf.c will conflict

> >> with 'static int a;' in ct_api.bpf.c

> >> But the static variable in ct_api.bpf.c will not conflict

> >> with the same variable in lru_api.bpf.c and will not conflict

> >> with such var in main.bpf.c because they're in a different namespaces.

> >>

> >> Here are few ways for the programmer to indicate the library namespaces:

> >>

> >> - similar to 'char license[]' use 'char library[]="lru";' in *.bpf.c

> >> The static linker will handle this reserved name specially just like

> >> it does 'license' and 'version'.

> >>

> >> - #pragma clang attribute push (__attribute__((annotate("lib=lru"))), apply_to = variable)

> >>

> >> - #pragma comment(lib, "lru")

> >>

> >> I think it's important to define namespaces within *.bpf.c.

> >> Defining them outside on linker command line or linker script is cumbersome.

> >>

> >> I think combining *.o into .a can happen with traditional 'ar'. No need for

> >> extra checks for now.

> >> The linking of main.bpf.o liblru.bpf.a libct.bpf.a

> >> will fail if static vars with the same name are present within the same library.

> >> The library namespaces will prevent name conflicts across libs and main.bpf.o

> >> If namespace is not specified it means it's empty, so the existing

> >> hacks of 'static volatile const int var;' will continue working.

> >>

> >> The skeleton can have library name as anon struct in skel.h.

> >> All vars can be prefixed too, but scoping them into single struct is cleaner.

> >>

> >> I think it doesn't hurt if final_main.skel.h includes all bpf vars from lru and

> >> ct libraries, but I think it's cleaner to omit them.

> >>

> >> It's not clear to me yet how final_main__open() and final_main__load() skeleton

> >> methods will work since lru and ct libs might need their specific initialization

> >> that is done by user space lru.c and ct.c.

> >> Also the whole scheme should work with upcoming light skeleton too.

> >> The design for bpf libraries should accommodate signed libraries.

> >>

> >> All of the above is up for discussion. I'd love to hear what golang folks

> >> are thinking, since above proposal is C centric.

> >

> > I want to clarify a few things that were brought up in offline discussions.

> > There are several options:

> > 1. don't emit statics at all.

> > That will break some skeleton users and doesn't solve the name conflict issue.

> > The library authors would need to be careful and use a unique enough

> > prefix for all global vars (including attribute("hidden") ones).

> > That's no different with traditional static linking in C.

> > bpf static linker already rejects linking if file1.bpf.c is trying to

> > 'extern int foo()'

> > when it was '__hidden int foo();' in file2.bpf.c

> > That's safer than traditional linker and the same approach can be

> > applied to vars.

> > So externing of __hidden vars won't be possible, but they will name conflict.

> >

> > 2. emit statics when they don't conflict and fail skel gen where there

> > is a naming conflict.

> > That helps a bit, but library authors still have to be careful with

> > both static and global names.

> > Which is more annoying than traditional C.

> >

> > 3. do #2 style of failing skel gen if there is a naming conflict, but

> > also introduce namespacing concept, so that both global and static

> > vars can be automatically namespaced.

> > That's the proposal above.

> > This way, I'm guessing, some libraries will use namespaces to avoid

> > prefixing everything.

> > The folks that hate namespaces and #pragmas will do manual prefixes for

> > both static and global vars.

> >

> > For approaches

> > char library[]="lru";'

> > and

> > #pragma comment(lib, "lru")

> > the scope of namespace is the whole .bpf.c file.

> > The clang/llvm already support it, so the job of name mangling would

> > belong to linker.

> >

> > For __attribute__((annotate("lib=lru"))) the scope could be any number

> > of lines in C files between pragma push/pop and can be nested.

> > This attribute is supported by clang, but not in the bpf backend.

> > The llvm would prefix both global and static names

> > in elf file and in btf.

> > If another file.bpf.c needs to call a function from namespace "lru"

> > it would need to prefix such a call.

> > The skel gen job would be #2 above (emit both static and globals if

> > they don't conflict).

> > Such namespacing concept would be the closest to c++ namespaces.

> >

> > If I understood what folks were saying no one is excited about namespaces in C.

> > So probably #3 is out and sounds like 1 is prefered?

> > So don't emit statics ?

> >

> > Daniel, Lorenz, John,

> >

> > what's your take ?

>

> Hmm, if it wasn't for the breakage, I would be in strong favour of 1), mainly


Neither libbpf-tools ([0]) nor libbpf-bootstrap ([1]) use static
variables. And both seem to be a pretty popular examples of writing
BPF apps that use BPF skeleton. I also searched entire Facebook code
base and found exactly one usage of static variable with BPF skeleton.
That was my own code from before libbpf supported global variables. So
I suspect we won't cause much pain at all by dropping static variables
from skeleton. And in any case, it's easy and mechanical to just
switch to globals without changing anything but 2 words in variable
declaration.

If we needed to be super-cautious, we could detect that BPF object
file was statically linked (as opposed to be straight out of Clang),
but this seems unnecessary for now.

> because it's the most _natural/closest_ to C, so developers wouldn't have to

> worry about statics. Less hidden magic.

>

> Even with 3) it's an unexpected extra step that developers have to be aware of

> for the case of statics at least. Presumably if it's for the whole .bpf.c file

> something like __library(lru) whether pragma or char might be okay given devs

> already have extra annotations like license.

>

> The __attribute__ sounds also okay but needs explanation to users overall. If

> we need to go that road, we probably need to have both, meaning, pragma or char

> /and/ the attribute one. I'm thinking the former mainly so that users don't have

> to worry about stumbling into the statics conflicts later on.

>

> Thanks,

> Daniel
Andrii Nakryiko May 11, 2021, 6:04 p.m. UTC | #21
On Tue, May 11, 2021 at 7:20 AM Lorenz Bauer <lmb@cloudflare.com> wrote:
>

> On Wed, 5 May 2021 at 06:22, Alexei Starovoitov

> <alexei.starovoitov@gmail.com> wrote:

> >

> > > All of the above is up for discussion. I'd love to hear what golang folks

> > > are thinking, since above proposal is C centric.

>

> Sorry for the late reply, I was on holiday.

>

> Regarding your conntrack library example:

> - what is the difference between impl.bpf.c and ct_api.bpf.c? If I

> understand correctly, ct_api is used to generate the skel.h, but impl

> isn't?


I don't think it matters much, the point is that your BPF library can
be compiled from multiple .c files. Same for BPF application itself,
it can be compiled from multiple .c files and use multiple BPF
libraries.

> - what file would main.bpf.c include? ct_api or skel.h?


main.bpf.c will include (if at all) anything that BPF library will
provide *for BPF side of things*. I.e., some sort of ct_api.bpf.h.
skel.h is not supposed to be included in BPF code, only user-space.

>

> Regarding Andrii's proposal in the forwarded email to use __hidden,

> __internal etc. Are these correct:

> - static int foo: this is only available in the same .o, not

> accessible from user space. Can be referenced via extern int foo?


Yes about availability only from BPF and only within single .o. Not
true about extern, you can't extern statics (just like in user-space).

> - __hidden int foo: only available in same .o, not accessible from user space

> - __internal int foo: only available in same .a via extern, not

> accessible from user space


See my last RFC patch set for details (last patch especially with more
details in commit log). It's the other way around. __hidden means
available across multiple .o files during static linking. Once static
linking is done (e.g., you compiled BPF library into my_lib.bpf.o),
__hidden is restricted and converted into __internal. __internal means
not available outside of single .o. global __internal symbol is
equivalent to static variable, except it's accessible from BPF
skeleton/sub-skeleton and we have name uniqueness guarantee.

> - int foo: available / conflicts in all .o, accessible from user space

> (aka included in skel.h)


yes, it's a plain global symbol with STV_DEFAULT visibility.

>

> When you speak of the linker, do you mean libbpf or the clang / llvm

> linker? The Go toolchain has a simplistic linker to support bpf2bpf

> calls from the same .o so I imagine libbpf has something similar.


We are talking about libbpf's struct bpf_linker linker.

>

> > I want to clarify a few things that were brought up in offline discussions.

> > There are several options:

> > 1. don't emit statics at all.

> > That will break some skeleton users and doesn't solve the name conflict issue.

> > The library authors would need to be careful and use a unique enough

> > prefix for all global vars (including attribute("hidden") ones).

> > That's no different with traditional static linking in C.

> > bpf static linker already rejects linking if file1.bpf.c is trying to

> > 'extern int foo()'

> > when it was '__hidden int foo();' in file2.bpf.c

> > That's safer than traditional linker and the same approach can be

> > applied to vars.

> > So externing of __hidden vars won't be possible, but they will name conflict.

> >


[...]

>

> --

> Lorenz Bauer  |  Systems Engineer

> 6th Floor, County Hall/The Riverside Building, SE1 7PB, UK

>

> www.cloudflare.com
Andrii Nakryiko May 11, 2021, 6:05 p.m. UTC | #22
On Tue, May 11, 2021 at 10:57 AM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>

> On Thu, May 6, 2021 at 3:54 PM Daniel Borkmann <daniel@iogearbox.net> wrote:

> >

> > On 5/5/21 7:22 AM, Alexei Starovoitov wrote:

> > > On Mon, May 3, 2021 at 9:42 PM Alexei Starovoitov

> > > <alexei.starovoitov@gmail.com> wrote:

> > >> On Wed, Apr 28, 2021 at 12:33:36PM -0700, Andrii Nakryiko wrote:

> > >>>> At least I'm only starting to grasp the complexity of the problem.

> > >>>

> > >>> I did and didn't find anything satisfactory. But I think we are coming

> > >>> at this from two different angles, which is why we can't agree on

> > >>> anything. So just a reminder, static is about two properties:

> > >>>      1) access protection

> > >>>      2) naming collisions.

> > >>>

> > >>> I'm trying to let name collisions on BPF side happen and be allowed

> > >>> *while* also allowing access to those same name-collisioned entities

> > >>> (maps and vars, both) from user-space in some non-random fashion. That

> > >>> inevitably requires some compromises/conventions on the user-space

> > >>> side. Such an approach preserves both 1) and 2).

> > >>>

> > >>> You are trying to enforce unique names (or at least aliases) for

> > >>> static variables, if I understand correctly, which preserves 1) at the

> > >>> expense of 2). It seems to be a similar idea with custom SEC(), though

> > >>> you ignored my request to elaborate on how you see that used, so I'm

> > >>> guessing here a bit.

> > >>>

> > >>> But I think we can get just 1) with global variables with custom

> > >>> visibilities. E.g., just marking map/variable as __hidden would

> > >>> disallow extern'ing it from other files. That's obviously limiting for

> > >>> extern'ing within the library, so we can keep digging deeper and

> > >>> define __internal (STV_INTERNAL) that would be "upgraded" to

> > >>> STV_HIDDEN after the initial linking pass. So you'd compile your BPF

> > >>> library with __internal, but your lib.bpf.o will have those global

> > >>> variables as STV_HIDDEN and thus inaccessible from other libraries and

> > >>> BPF app itself.

> > >>>

> > >>> So if we are ok breaking existing static variable users, then just

> > >>> dropping statics from BPF skeleton and supporting extra __hidden and

> > >>> __internal semantics for variables and maps would bypass these issues.

> > >>> I wanted statics mostly for property 2), but if I can't get it, then

> > >>> I'd drop statics from skeletons altogether.

> > >>>

> > >>> If I could drop statics for skeletons that were statically linked,

> > >>> that wouldn't be a regression. It's impossible to do right now, but we

> > >>> can also add a new SHT_NOTE section, which we can use to detect

> > >>> statically linked vs Clang-generated .bpf.o. Certainly more ELF

> > >>> fussing around than I'd like, but not the end of the world either.

> > >>>

> > >>> Thoughts? Did that summarize the issue well enough?

> > >>

> > >> Background for all:

> > >>

> > >> Until Nov 2019 libbpf didn't support global variables, so bpf programs

> > >> contained code like 'static volatile const int var = 1;'

> > >> Then the skeleton was introduced which went through BTF of a given

> > >> datasec and emitted all variables from that section into .skel.h.

> > >> It didn't bother filtering static vs global variables, so

> > >> static vars in *.bpf.c world became visible into user space *.c world.

> > >> While libbpf supported single bpf.o file such extern-ing of statics

> > >> was fine, but with support of linking multiple *.bpf.o there

> > >> is a question of what to do with static variables with the same names

> > >> in different files.

> > >>

> > >> Consider the following scenario:

> > >> One bpf developer creates a library conntrack. It has

> > >> impl.bpf.c

> > >> ct_api.bpf.c

> > >> and corresponding user space ct.c that uses skel.h to access

> > >> data in these two bpf files.

> > >>

> > >> Another bpf developer creates a library for lru. It has

> > >> impl.bpf.c

> > >> lru_api.bpf.c

> > >> and corresponding user space lru.c.

> > >>

> > >> Now the 3rd developer is writing its main.bpf.c and wants to use these libs.

> > >>

> > >> The libs should be usable in pre-compiled form. The availability of

> > >> the source code is nice, but it shouldn't be mandatory.

> > >>

> > >> So there is libct.a (with user space) and libct.bpf.a (with bpf code)

> > >> and liblru.a (user) and liblru.bpf.a (bpf code).

> > >>

> > >> The developer should be able to link

> > >> main.bpf.o liblru.bpf.a libct.bpf.a

> > >> into final_main.bpf.o

> > >> And link main.o liblru.a libct.a with user space bits into a.out.

> > >>

> > >> The lru.skel.h and ct.skel.h used by these libs were generated

> > >> out of corresponding *.bpf.o and independent of each other.

> > >> There should be no need to recompile user space lru.c and ct.c after

> > >> linking of final_main.bpf.o and generating final skeleton.

> > >>

> > >> I think all three developers should be able to use static variables

> > >> in their .bpf.c files without worrying about conflicts across three

> > >> projects.

> > >> They can use global vars with __attribute__("hidden"),

> > >> but it's not equivalent to static. The linker will complain of

> > >> redefinition if the same name is used across multiple files

> > >> or multiple libs.

> > >> So doing 'int var __attribute__("hidden");' in libct.bpf.a and

> > >> in liblru.bpf.a will prevent linking together.

> > >> That's traditional static linking semantics.

> > >>

> > >> Using file name as a prefix for static vars doesn't work in general,

> > >> since file names can be the same.

> > >> What can work is the library name. The library name is guaranteed to be

> > >> unique in the final linking phase.

> > >> I think we can use it to namespace static variables across

> > >> three sets of bpf programs.

> > >> Also I think it's ok to require a single developer to enforce

> > >> uniqueness of static vars within a project.

> > >>

> > >> In other words 'static int a;' in impl.bpf.c will conflict

> > >> with 'static int a;' in ct_api.bpf.c

> > >> But the static variable in ct_api.bpf.c will not conflict

> > >> with the same variable in lru_api.bpf.c and will not conflict

> > >> with such var in main.bpf.c because they're in a different namespaces.

> > >>

> > >> Here are few ways for the programmer to indicate the library namespaces:

> > >>

> > >> - similar to 'char license[]' use 'char library[]="lru";' in *.bpf.c

> > >> The static linker will handle this reserved name specially just like

> > >> it does 'license' and 'version'.

> > >>

> > >> - #pragma clang attribute push (__attribute__((annotate("lib=lru"))), apply_to = variable)

> > >>

> > >> - #pragma comment(lib, "lru")

> > >>

> > >> I think it's important to define namespaces within *.bpf.c.

> > >> Defining them outside on linker command line or linker script is cumbersome.

> > >>

> > >> I think combining *.o into .a can happen with traditional 'ar'. No need for

> > >> extra checks for now.

> > >> The linking of main.bpf.o liblru.bpf.a libct.bpf.a

> > >> will fail if static vars with the same name are present within the same library.

> > >> The library namespaces will prevent name conflicts across libs and main.bpf.o

> > >> If namespace is not specified it means it's empty, so the existing

> > >> hacks of 'static volatile const int var;' will continue working.

> > >>

> > >> The skeleton can have library name as anon struct in skel.h.

> > >> All vars can be prefixed too, but scoping them into single struct is cleaner.

> > >>

> > >> I think it doesn't hurt if final_main.skel.h includes all bpf vars from lru and

> > >> ct libraries, but I think it's cleaner to omit them.

> > >>

> > >> It's not clear to me yet how final_main__open() and final_main__load() skeleton

> > >> methods will work since lru and ct libs might need their specific initialization

> > >> that is done by user space lru.c and ct.c.

> > >> Also the whole scheme should work with upcoming light skeleton too.

> > >> The design for bpf libraries should accommodate signed libraries.

> > >>

> > >> All of the above is up for discussion. I'd love to hear what golang folks

> > >> are thinking, since above proposal is C centric.

> > >

> > > I want to clarify a few things that were brought up in offline discussions.

> > > There are several options:

> > > 1. don't emit statics at all.

> > > That will break some skeleton users and doesn't solve the name conflict issue.

> > > The library authors would need to be careful and use a unique enough

> > > prefix for all global vars (including attribute("hidden") ones).

> > > That's no different with traditional static linking in C.

> > > bpf static linker already rejects linking if file1.bpf.c is trying to

> > > 'extern int foo()'

> > > when it was '__hidden int foo();' in file2.bpf.c

> > > That's safer than traditional linker and the same approach can be

> > > applied to vars.

> > > So externing of __hidden vars won't be possible, but they will name conflict.

> > >

> > > 2. emit statics when they don't conflict and fail skel gen where there

> > > is a naming conflict.

> > > That helps a bit, but library authors still have to be careful with

> > > both static and global names.

> > > Which is more annoying than traditional C.

> > >

> > > 3. do #2 style of failing skel gen if there is a naming conflict, but

> > > also introduce namespacing concept, so that both global and static

> > > vars can be automatically namespaced.

> > > That's the proposal above.

> > > This way, I'm guessing, some libraries will use namespaces to avoid

> > > prefixing everything.

> > > The folks that hate namespaces and #pragmas will do manual prefixes for

> > > both static and global vars.

> > >

> > > For approaches

> > > char library[]="lru";'

> > > and

> > > #pragma comment(lib, "lru")

> > > the scope of namespace is the whole .bpf.c file.

> > > The clang/llvm already support it, so the job of name mangling would

> > > belong to linker.

> > >

> > > For __attribute__((annotate("lib=lru"))) the scope could be any number

> > > of lines in C files between pragma push/pop and can be nested.

> > > This attribute is supported by clang, but not in the bpf backend.

> > > The llvm would prefix both global and static names

> > > in elf file and in btf.

> > > If another file.bpf.c needs to call a function from namespace "lru"

> > > it would need to prefix such a call.

> > > The skel gen job would be #2 above (emit both static and globals if

> > > they don't conflict).

> > > Such namespacing concept would be the closest to c++ namespaces.

> > >

> > > If I understood what folks were saying no one is excited about namespaces in C.

> > > So probably #3 is out and sounds like 1 is prefered?

> > > So don't emit statics ?

> > >

> > > Daniel, Lorenz, John,

> > >

> > > what's your take ?

> >

> > Hmm, if it wasn't for the breakage, I would be in strong favour of 1), mainly

>

> Neither libbpf-tools ([0]) nor libbpf-bootstrap ([1]) use static

> variables. And both seem to be a pretty popular examples of writing

> BPF apps that use BPF skeleton. I also searched entire Facebook code

> base and found exactly one usage of static variable with BPF skeleton.

> That was my own code from before libbpf supported global variables. So

> I suspect we won't cause much pain at all by dropping static variables

> from skeleton. And in any case, it's easy and mechanical to just

> switch to globals without changing anything but 2 words in variable

> declaration.


And of course there should have been these here:

  [0] https://github.com/iovisor/bcc/tree/master/libbpf-tools
  [1] https://github.com/libbpf/libbpf-bootstrap

>

> If we needed to be super-cautious, we could detect that BPF object

> file was statically linked (as opposed to be straight out of Clang),

> but this seems unnecessary for now.

>

> > because it's the most _natural/closest_ to C, so developers wouldn't have to

> > worry about statics. Less hidden magic.

> >

> > Even with 3) it's an unexpected extra step that developers have to be aware of

> > for the case of statics at least. Presumably if it's for the whole .bpf.c file

> > something like __library(lru) whether pragma or char might be okay given devs

> > already have extra annotations like license.

> >

> > The __attribute__ sounds also okay but needs explanation to users overall. If

> > we need to go that road, we probably need to have both, meaning, pragma or char

> > /and/ the attribute one. I'm thinking the former mainly so that users don't have

> > to worry about stumbling into the statics conflicts later on.

> >

> > Thanks,

> > Daniel
Andrii Nakryiko May 11, 2021, 6:59 p.m. UTC | #23
On Tue, May 4, 2021 at 10:22 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>

> On Mon, May 3, 2021 at 9:42 PM Alexei Starovoitov

> <alexei.starovoitov@gmail.com> wrote:

> >

> > On Wed, Apr 28, 2021 at 12:33:36PM -0700, Andrii Nakryiko wrote:

> > > > At least I'm only starting to grasp the complexity of the problem.

> > >

> > > I did and didn't find anything satisfactory. But I think we are coming

> > > at this from two different angles, which is why we can't agree on

> > > anything. So just a reminder, static is about two properties:

> > >     1) access protection

> > >     2) naming collisions.

> > >

> > > I'm trying to let name collisions on BPF side happen and be allowed

> > > *while* also allowing access to those same name-collisioned entities

> > > (maps and vars, both) from user-space in some non-random fashion. That

> > > inevitably requires some compromises/conventions on the user-space

> > > side. Such an approach preserves both 1) and 2).

> > >

> > > You are trying to enforce unique names (or at least aliases) for

> > > static variables, if I understand correctly, which preserves 1) at the

> > > expense of 2). It seems to be a similar idea with custom SEC(), though

> > > you ignored my request to elaborate on how you see that used, so I'm

> > > guessing here a bit.

> > >

> > > But I think we can get just 1) with global variables with custom

> > > visibilities. E.g., just marking map/variable as __hidden would

> > > disallow extern'ing it from other files. That's obviously limiting for

> > > extern'ing within the library, so we can keep digging deeper and

> > > define __internal (STV_INTERNAL) that would be "upgraded" to

> > > STV_HIDDEN after the initial linking pass. So you'd compile your BPF

> > > library with __internal, but your lib.bpf.o will have those global

> > > variables as STV_HIDDEN and thus inaccessible from other libraries and

> > > BPF app itself.

> > >

> > > So if we are ok breaking existing static variable users, then just

> > > dropping statics from BPF skeleton and supporting extra __hidden and

> > > __internal semantics for variables and maps would bypass these issues.

> > > I wanted statics mostly for property 2), but if I can't get it, then

> > > I'd drop statics from skeletons altogether.

> > >

> > > If I could drop statics for skeletons that were statically linked,

> > > that wouldn't be a regression. It's impossible to do right now, but we

> > > can also add a new SHT_NOTE section, which we can use to detect

> > > statically linked vs Clang-generated .bpf.o. Certainly more ELF

> > > fussing around than I'd like, but not the end of the world either.

> > >

> > > Thoughts? Did that summarize the issue well enough?

> >

> > Background for all:

> >

> > Until Nov 2019 libbpf didn't support global variables, so bpf programs

> > contained code like 'static volatile const int var = 1;'

> > Then the skeleton was introduced which went through BTF of a given

> > datasec and emitted all variables from that section into .skel.h.

> > It didn't bother filtering static vs global variables, so

> > static vars in *.bpf.c world became visible into user space *.c world.

> > While libbpf supported single bpf.o file such extern-ing of statics

> > was fine, but with support of linking multiple *.bpf.o there

> > is a question of what to do with static variables with the same names

> > in different files.

> >

> > Consider the following scenario:

> > One bpf developer creates a library conntrack. It has

> > impl.bpf.c

> > ct_api.bpf.c

> > and corresponding user space ct.c that uses skel.h to access

> > data in these two bpf files.

> >

> > Another bpf developer creates a library for lru. It has

> > impl.bpf.c

> > lru_api.bpf.c

> > and corresponding user space lru.c.

> >

> > Now the 3rd developer is writing its main.bpf.c and wants to use these libs.

> >

> > The libs should be usable in pre-compiled form. The availability of

> > the source code is nice, but it shouldn't be mandatory.

> >

> > So there is libct.a (with user space) and libct.bpf.a (with bpf code)

> > and liblru.a (user) and liblru.bpf.a (bpf code).

> >

> > The developer should be able to link

> > main.bpf.o liblru.bpf.a libct.bpf.a

> > into final_main.bpf.o

> > And link main.o liblru.a libct.a with user space bits into a.out.

> >

> > The lru.skel.h and ct.skel.h used by these libs were generated

> > out of corresponding *.bpf.o and independent of each other.

> > There should be no need to recompile user space lru.c and ct.c after

> > linking of final_main.bpf.o and generating final skeleton.

> >

> > I think all three developers should be able to use static variables

> > in their .bpf.c files without worrying about conflicts across three

> > projects.

> > They can use global vars with __attribute__("hidden"),

> > but it's not equivalent to static. The linker will complain of

> > redefinition if the same name is used across multiple files

> > or multiple libs.

> > So doing 'int var __attribute__("hidden");' in libct.bpf.a and

> > in liblru.bpf.a will prevent linking together.

> > That's traditional static linking semantics.

> >

> > Using file name as a prefix for static vars doesn't work in general,

> > since file names can be the same.

> > What can work is the library name. The library name is guaranteed to be

> > unique in the final linking phase.

> > I think we can use it to namespace static variables across

> > three sets of bpf programs.

> > Also I think it's ok to require a single developer to enforce

> > uniqueness of static vars within a project.

> >

> > In other words 'static int a;' in impl.bpf.c will conflict

> > with 'static int a;' in ct_api.bpf.c

> > But the static variable in ct_api.bpf.c will not conflict

> > with the same variable in lru_api.bpf.c and will not conflict

> > with such var in main.bpf.c because they're in a different namespaces.

> >

> > Here are few ways for the programmer to indicate the library namespaces:

> >

> > - similar to 'char license[]' use 'char library[]="lru";' in *.bpf.c

> > The static linker will handle this reserved name specially just like

> > it does 'license' and 'version'.

> >

> > - #pragma clang attribute push (__attribute__((annotate("lib=lru"))), apply_to = variable)

> >

> > - #pragma comment(lib, "lru")

> >

> > I think it's important to define namespaces within *.bpf.c.

> > Defining them outside on linker command line or linker script is cumbersome.

> >

> > I think combining *.o into .a can happen with traditional 'ar'. No need for

> > extra checks for now.

> > The linking of main.bpf.o liblru.bpf.a libct.bpf.a

> > will fail if static vars with the same name are present within the same library.

> > The library namespaces will prevent name conflicts across libs and main.bpf.o

> > If namespace is not specified it means it's empty, so the existing

> > hacks of 'static volatile const int var;' will continue working.

> >

> > The skeleton can have library name as anon struct in skel.h.

> > All vars can be prefixed too, but scoping them into single struct is cleaner.

> >

> > I think it doesn't hurt if final_main.skel.h includes all bpf vars from lru and

> > ct libraries, but I think it's cleaner to omit them.

> >

> > It's not clear to me yet how final_main__open() and final_main__load() skeleton

> > methods will work since lru and ct libs might need their specific initialization

> > that is done by user space lru.c and ct.c.

> > Also the whole scheme should work with upcoming light skeleton too.

> > The design for bpf libraries should accommodate signed libraries.

> >

> > All of the above is up for discussion. I'd love to hear what golang folks

> > are thinking, since above proposal is C centric.

>

> I want to clarify a few things that were brought up in offline discussions.

> There are several options:

> 1. don't emit statics at all.

> That will break some skeleton users and doesn't solve the name conflict issue.

> The library authors would need to be careful and use a unique enough

> prefix for all global vars (including attribute("hidden") ones).

> That's no different with traditional static linking in C.

> bpf static linker already rejects linking if file1.bpf.c is trying to

> 'extern int foo()'

> when it was '__hidden int foo();' in file2.bpf.c

> That's safer than traditional linker and the same approach can be

> applied to vars.

> So externing of __hidden vars won't be possible, but they will name conflict.

>

> 2. emit statics when they don't conflict and fail skel gen where there

> is a naming conflict.

> That helps a bit, but library authors still have to be careful with

> both static and global names.

> Which is more annoying than traditional C.

>

> 3. do #2 style of failing skel gen if there is a naming conflict, but

> also introduce namespacing concept, so that both global and static

> vars can be automatically namespaced.

> That's the proposal above.

> This way, I'm guessing, some libraries will use namespaces to avoid

> prefixing everything.

> The folks that hate namespaces and #pragmas will do manual prefixes for

> both static and global vars.

>

> For approaches

> char library[]="lru";'

> and

> #pragma comment(lib, "lru")

> the scope of namespace is the whole .bpf.c file.

> The clang/llvm already support it, so the job of name mangling would

> belong to linker.

>

> For __attribute__((annotate("lib=lru"))) the scope could be any number

> of lines in C files between pragma push/pop and can be nested.

> This attribute is supported by clang, but not in the bpf backend.

> The llvm would prefix both global and static names

> in elf file and in btf.

> If another file.bpf.c needs to call a function from namespace "lru"

> it would need to prefix such a call.

> The skel gen job would be #2 above (emit both static and globals if

> they don't conflict).

> Such namespacing concept would be the closest to c++ namespaces.

>

> If I understood what folks were saying no one is excited about namespaces in C.

> So probably #3 is out and sounds like 1 is prefered?

> So don't emit statics ?

>


I'm in favor of not emitting statics. They just seem to cause more
problems than providing any extra benefits. Given it's trivial to just
use globals instead and global vs static is already an explicit signal
of what has to be in BPF skeleton and what's not. See my RFC about
__internal + __hidden semantics, but even if we supported nothing but
STV_DEFAULT globals wouldn't be horrible. Clearly we'd expect users to
just not mess with BPF library's internal state with not way to
enforce that, so I'd still like to have some enforceability.

And keep in mind, BPF code can still use static variables freely, they
will just be BPF-side-only.

Now as for namespacing. Without statics being exposed there doesn't
seem to be any need for namespacing, as globals are supposed to be in
a global namespace with no name mangling and we rely on that guarantee
for BPF skeletons and libbpf APIs that perform lookups by name. But
there is another reason for having a library identifier, though, and
I'll get to it in a second. But before that, between pragma,
annotation and special variable I'd vote for special variable because
  - it's kind of familiar mechanism (like, license and version);
  - it has clear file-wide semantics;
  - as opposed to annotation and pragma, there is a clear mechanism of
preserving such data in ELF, so that linker and/or BPF skeleton
generator can access them.

Of course all of those issues can be overcome, but if we already have
the C/ELF mechanism, I don't see much point in inventing something
new.

Now back to why we might want/need library identifier. Shortly, for
simpler (especially light ) sub-skeleton support.

My original approach for sub-skeleton was to keep BPF library's BTF
information to later (at runtime) find location of .data, .rodata,
.bss data sections (and potentially more). It would work reliably and
would be pretty simple to support through libbpf, because all of the
supporting code is already there in libbpf.

But this is hugely inconvenient for lightweight sub-skeleton, because
it doesn't have libbpf APIs at its disposal.

So my proposal is to allow having a special "library identifier"
variable, e.g., something like:

SEC(".lib") static char[] lib_name = "my_fancy_lib"; (let's not
bikeshed naming for now)

There are complications in how this lib_name is defined across all
files that form the BPF library. It's a bit too obscure to go into
details right now, though. But it feels like need two separate linking
modes: one for linking BPF library and one for a BPF application. But
let's assume for now that we resolve that somehow.

With such library identifiers, BPF static linker will:
  1) enforce uniqueness of library names when linking together
multiple libraries
  2) append zero-size markers to the very beginning and the very end
of each BPF library's DATASECS, something like

DATASEC .data:

   void *___my_fancy_lib__start[0];
   /* here goes .data variables */
   void *___my_fancy_lib__end[0];

And so on for each DATASEC. What those markers provide? Two things:

1). It makes it much easier for sub-skeleton to find where a
particular BPF library's .data/.rodata/.bss starts within the final
BPF application's  .data/.rodata/.bss section. All that without
storing local BTF and doing type comparisons. Only a simple loop over
DATASECs and its variables is needed.

2). (optionally) we can exclude everything between ___<libname>__start
and ___<libname>__end from BPF application's skeleton. It's optional
because I can see cases where BPF application might prefer to have its
own "internal BPF libraries'" state in the final BPF skeleton, instead
of keeping track of many small BPF library sub-skeletons. But for
"external" BPF libraries, it would eliminate accidental corruption of
BPF library state.

I'd actually hold off on implementing #2, because as I said it's not
always clear what's best. But at least that would give us a reliable
and clear mechanism.

It's a pretty long email already and there are a lot things to unpack,
and still more nuances to discuss. So I'll put up BPF static linking +
BPF libraries topic for this week's BPF office hours for anyone
interested to join the live discussion. It would hopefully allow
everyone to get more up to speed and onto the same page on this topic.
But still curious WDYT?


> Daniel, Lorenz, John,

>

> what's your take ?
Alexei Starovoitov May 11, 2021, 11:05 p.m. UTC | #24
On Tue, May 11, 2021 at 11:59:01AM -0700, Andrii Nakryiko wrote:
> >

> > If I understood what folks were saying no one is excited about namespaces in C.

> > So probably #3 is out and sounds like 1 is prefered?

> > So don't emit statics ?

> >

> 

> I'm in favor of not emitting statics. They just seem to cause more

> problems than providing any extra benefits. Given it's trivial to just

> use globals instead and global vs static is already an explicit signal

> of what has to be in BPF skeleton and what's not. See my RFC about

> __internal + __hidden semantics, but even if we supported nothing but

> STV_DEFAULT globals wouldn't be horrible. Clearly we'd expect users to

> just not mess with BPF library's internal state with not way to

> enforce that, so I'd still like to have some enforceability.


I'm glad that with Daniel and Lorenz we have a strong consensus here.
So I've applied first 6 patches from your RFC that stop exposing static
in skeleton and fix tests.
I'm only not 100% sure whether
commit 31332ccb7562 ("bpftool: Stop emitting static variables in BPF skeleton")
is enough to skip them reliably.
I think 'char __pad[N]' logic would do the rigth thing when
statics and globals are interleaved, but would be good to have a test.
In one .o file clang emits all globals first and then all statics,
so even without __pad it would have been enough,
but I don't know how .o-s with statics+global look after static linker combines them.

I skipped patch 7, since without llvm part it cannot be used and it's
not clear yet whether llvm will ever be able to emit __internal.

> So my proposal is to allow having a special "library identifier"

> variable, e.g., something like:

> 

> SEC(".lib") static char[] lib_name = "my_fancy_lib"; (let's not

> bikeshed naming for now)


without 'static' probably? since license and version are without?

and at will be optional (mostly ignored by toolchain) for libs that
don't need sub-skeleton and mandatory for sub-skeleton?

> With such library identifiers, BPF static linker will:

>   1) enforce uniqueness of library names when linking together

> multiple libraries


you're not proposing for lib name to do namespacing of globals, right?
Only to indicate that liblru.o and libct.o (as normal elf files)
are bpf libraries as can be seen in their 'lib_name' strings
instead of regular .o files.
(that would be a definition of bpf library).
So linker can rely on explicit library names given by users in .bpf.c
(and corresponding dependency on sub-skel) instead of relying
on file names?
If so, I agree that it's necessary.
Such 'char lib_name[]' is probably better than elf note.

>   2) append zero-size markers to the very beginning and the very end

> of each BPF library's DATASECS, something like

> 

> DATASEC .data:

> 

>    void *___my_fancy_lib__start[0];

>    /* here goes .data variables */

>    void *___my_fancy_lib__end[0];

> 

> And so on for each DATASEC. What those markers provide? Two things:

> 

> 1). It makes it much easier for sub-skeleton to find where a

> particular BPF library's .data/.rodata/.bss starts within the final

> BPF application's  .data/.rodata/.bss section. All that without

> storing local BTF and doing type comparisons. Only a simple loop over

> DATASECs and its variables is needed.


indeed. some lib name or rather sub-skeleton name is needed.
Since progs can have extern funcs in the lib I see no clean way to
reliably split prog loading between main skeleton and sub-skeletons.
Meaning that prog loading and map creation can only be done
by the main skeleton.
After that is done and mmap-ing of data/rodata/bss is done
the main skeleton will init sub-skeleton with offsets to their
corresponding data based on these offsets?
I think that will work for light skel.
I don't see a use case for __end marker yet, but I guess it's good
for consistency.
rodata init is tricky.
Since the main skel and sub-skels will init their parts independently.
But I think it can be managed.

> 2). (optionally) we can exclude everything between ___<libname>__start

> and ___<libname>__end from BPF application's skeleton.


So that's leaning towards namespacing ideas?
The lib_name doesn't hide any names and globals will conflict during
the linking as usual.
But with this optional hiding (inside .bpf.c it will have special name?)
the partial namespacing can happen. And the lib can hide the stuff
from its users.
The concept is nice, but lib scope maybe too big.

> It's a pretty long email already and there are a lot things to unpack,

> and still more nuances to discuss. So I'll put up BPF static linking +

> BPF libraries topic for this week's BPF office hours for anyone

> interested to join the live discussion. It would hopefully allow

> everyone to get more up to speed and onto the same page on this topic.

> But still curious WDYT?


Sounds great to me. I hope more folks can join the discussion.
Lorenz Bauer May 12, 2021, 1:40 p.m. UTC | #25
On Wed, 12 May 2021 at 00:05, Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>

>

> > It's a pretty long email already and there are a lot things to unpack,

> > and still more nuances to discuss. So I'll put up BPF static linking +

> > BPF libraries topic for this week's BPF office hours for anyone

> > interested to join the live discussion. It would hopefully allow

> > everyone to get more up to speed and onto the same page on this topic.

> > But still curious WDYT?

>

> Sounds great to me. I hope more folks can join the discussion.


I'll be there, but can only stay for the first half hour. Talk to you then!

-- 
Lorenz Bauer  |  Systems Engineer
6th Floor, County Hall/The Riverside Building, SE1 7PB, UK

www.cloudflare.com
Andrii Nakryiko May 12, 2021, 6:50 p.m. UTC | #26
On Tue, May 11, 2021 at 4:05 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>

> On Tue, May 11, 2021 at 11:59:01AM -0700, Andrii Nakryiko wrote:

> > >

> > > If I understood what folks were saying no one is excited about namespaces in C.

> > > So probably #3 is out and sounds like 1 is prefered?

> > > So don't emit statics ?

> > >

> >

> > I'm in favor of not emitting statics. They just seem to cause more

> > problems than providing any extra benefits. Given it's trivial to just

> > use globals instead and global vs static is already an explicit signal

> > of what has to be in BPF skeleton and what's not. See my RFC about

> > __internal + __hidden semantics, but even if we supported nothing but

> > STV_DEFAULT globals wouldn't be horrible. Clearly we'd expect users to

> > just not mess with BPF library's internal state with not way to

> > enforce that, so I'd still like to have some enforceability.

>

> I'm glad that with Daniel and Lorenz we have a strong consensus here.

> So I've applied first 6 patches from your RFC that stop exposing static

> in skeleton and fix tests.

> I'm only not 100% sure whether

> commit 31332ccb7562 ("bpftool: Stop emitting static variables in BPF skeleton")

> is enough to skip them reliably.

> I think 'char __pad[N]' logic would do the rigth thing when

> statics and globals are interleaved, but would be good to have a test.


Yeah, it should because current offset is re-calculated based on last
processed variable and its size, so it doesn't matter if we skip extra
variables. We can test it in test_static_linked selftest, if we switch
zero-initialized var1 and var2 to be non-zero initialized, so that
they share .data with static variables. I tested locally and they
work, I'll send an update to selftests with a follow up patch set.

> In one .o file clang emits all globals first and then all statics,

> so even without __pad it would have been enough,

> but I don't know how .o-s with statics+global look after static linker combines them.


Static linker preserves relative order across .o's, so you'll have
globals1, statics1, globals2, statics2, which is why proposed above
adjustment to test_static_linked will validate the logic.

>

> I skipped patch 7, since without llvm part it cannot be used and it's

> not clear yet whether llvm will ever be able to emit __internal.


__internal cannot be used explicitly to isolate global variable from
other files, but STV_HIDDEN -> STV_INTERNAL translation would allow to
protect BPF library globals from BPF application, which is still a
desirable feature. But whatever, we can wait and see the resolution of
the Clang bug/feature discussion.

>

> > So my proposal is to allow having a special "library identifier"

> > variable, e.g., something like:

> >

> > SEC(".lib") static char[] lib_name = "my_fancy_lib"; (let's not

> > bikeshed naming for now)

>

> without 'static' probably? since license and version are without?


It's not so clear. static allows to have different library names for
different files. Currently we enforce that version and license
contents match. It's part of what I said earlier that it feels like we
need two separate linking commands: one for building BPF libraries and
one for linking BPF applications. Which is not that far from
user-space, where you linked shared libraries with a special options.
We just want BPF static libraries to have properties of user-space BPF
shared libraries (w.r.t. protection at least). We can discuss it at
office hours, though.

>

> and at will be optional (mostly ignored by toolchain) for libs that

> don't need sub-skeleton and mandatory for sub-skeleton?


yep

>

> > With such library identifiers, BPF static linker will:

> >   1) enforce uniqueness of library names when linking together

> > multiple libraries

>

> you're not proposing for lib name to do namespacing of globals, right?

> Only to indicate that liblru.o and libct.o (as normal elf files)

> are bpf libraries as can be seen in their 'lib_name' strings

> instead of regular .o files.

> (that would be a definition of bpf library).


yes, globals are not namespaced

> So linker can rely on explicit library names given by users in .bpf.c

> (and corresponding dependency on sub-skel) instead of relying

> on file names?


yes, stable unique identifier

> If so, I agree that it's necessary.

> Such 'char lib_name[]' is probably better than elf note.

>

> >   2) append zero-size markers to the very beginning and the very end

> > of each BPF library's DATASECS, something like

> >

> > DATASEC .data:

> >

> >    void *___my_fancy_lib__start[0];

> >    /* here goes .data variables */

> >    void *___my_fancy_lib__end[0];

> >

> > And so on for each DATASEC. What those markers provide? Two things:

> >

> > 1). It makes it much easier for sub-skeleton to find where a

> > particular BPF library's .data/.rodata/.bss starts within the final

> > BPF application's  .data/.rodata/.bss section. All that without

> > storing local BTF and doing type comparisons. Only a simple loop over

> > DATASECs and its variables is needed.

>

> indeed. some lib name or rather sub-skeleton name is needed.

> Since progs can have extern funcs in the lib I see no clean way to

> reliably split prog loading between main skeleton and sub-skeletons.

> Meaning that prog loading and map creation can only be done

> by the main skeleton.


Yes, opening and loading can be done by main skeleton only. But
between opening and loading a sub-skeleton can be instantiated from
bpf_object (or whatever equivalent way for lskel) and further
pre-setup stuff (map definition adjustments, rodata and other vars
setting up, etc). Then main skeleton loads, and if necessary,
user-space part of BPF library can still perform after-load
adjustments before final attachment happens.

> After that is done and mmap-ing of data/rodata/bss is done

> the main skeleton will init sub-skeleton with offsets to their

> corresponding data based on these offsets?

> I think that will work for light skel.


What I had in mind kept skeleton completely isolated from
sub-skeleton. Think about this, when BPF library author is compiling
it's user-space parts that use sub-skeleton, they don't and generally
speaking can't know anything about the final BPF application, so they
can't have any access to the final skeleton. But they need
code-generated sub-skeleton header file, similarly to BPF skeleton
today. So at least for BPF skeleton, the flow I was imagining would be
like this.

1. BPF library abc consists of abc1.bpf.c and abc2.bpf.c. It also has
user-space component in abc.c.
2. BPF app uses abs library and has its own app1.bpf.c and app2.bpf.c
and app.c for user-space.
3. BPF library author sets up its Makefile to do
  a. clang -target bpf -g -O2 -c abc1.bpf.c -o abc1.bpf.o
  b. clang -target bpf -g -O2 -c abc2.bpf.c -o abc2.bpf.o
  c. bpftool gen lib libabc.bpf.o abc1.bpf.o abc2.bpf.o
  d. bpftool gen subskeleton libabc.bpf.o > libabc.subskel.h
  e. abc.c (user-space library) is of the form

#include "libabc.subskel.h"

static struct libabc_bpf *subskel;

int libabc__init(struct bpf_object *obj)
{
    subskel = libabc_bpf__open_subskel(obj);

    subskel->data->abc_my_var = 123;
}

int libabc__attach()
{
    libabc_bpf__attach(subskel);
}

  f. cc abc.c into libabc.a and then libabc.a and libabc.bpf.o are
distributed to end user

3. Now, from BPF application author side:
  a. clang -target bpf -g -O2 -c app1.bpf.c -o app1.bpf.o
  b. clang -target bpf -g -O2 -c app2.bpf.c -o app2.bpf.o
  c. bpftool gen object app.bpf.o app1.bpf.o app2.bpf.o libabc.bpf.o
  d. on user-space side of app in app.c

#include "app.skel.h"

int main()
{
    struct app_bpf *skel;

    skel = app_bpf__open();
    skel->rodata->app_var = 123;

    libabc__init(skel->obj);

    app_bpf__load(skel);

    libabc__attach();

    /* probably shouldn't auto-attach library progs, but don't know
yet how to prevent that */
    app_bpf__attach(skel);

    /* do some useful logic now */
}

  e. cc app.c -o app && sudo ./app


So, app author doesn't need and doesn't have direct access to
subskeleton header. And sub-skeleton header is generated by BPF
library way before the library is linked into the final application.

> I don't see a use case for __end marker yet, but I guess it's good

> for consistency.


Good for knowing whole range of BPF library's data. Can be used for
some extra sanity checking. And can be used to omit parts of variables
from the final skeleton.

> rodata init is tricky.

> Since the main skel and sub-skels will init their parts independently.

> But I think it can be managed.


See above, I showed rodata initialization as well. It works because we
have mmap()'ed memory right after skeleton's open and all the memory
addresses stay fixed.

>

> > 2). (optionally) we can exclude everything between ___<libname>__start

> > and ___<libname>__end from BPF application's skeleton.

>

> So that's leaning towards namespacing ideas?


Not really, because globals are not namespaced and statics are not
exposed to user-space, so there is no need for namespacing. This is to
find data location and, if necessary, filter out library state from
application skeleton.

> The lib_name doesn't hide any names and globals will conflict during

> the linking as usual.

> But with this optional hiding (inside .bpf.c it will have special name?)

> the partial namespacing can happen. And the lib can hide the stuff

> from its users.

> The concept is nice, but lib scope maybe too big.

>

> > It's a pretty long email already and there are a lot things to unpack,

> > and still more nuances to discuss. So I'll put up BPF static linking +

> > BPF libraries topic for this week's BPF office hours for anyone

> > interested to join the live discussion. It would hopefully allow

> > everyone to get more up to speed and onto the same page on this topic.

> > But still curious WDYT?

>

> Sounds great to me. I hope more folks can join the discussion.
Alexei Starovoitov May 12, 2021, 11:39 p.m. UTC | #27
On Wed, May 12, 2021 at 11:50:19AM -0700, Andrii Nakryiko wrote:
> 
> It's not so clear. static allows to have different library names for
> different files. Currently we enforce that version and license
> contents match. It's part of what I said earlier that it feels like we
> need two separate linking commands: one for building BPF libraries and
> one for linking BPF applications. Which is not that far from
> user-space, where you linked shared libraries with a special options.
> We just want BPF static libraries to have properties of user-space BPF
> shared libraries (w.r.t. protection at least). We can discuss it at
> office hours, though.

If we're saying "no" to .a archives (which is user space definition
of static library) then we can reuse the name "BPF static library"
to mean linked .o that is intermediate step towards bpf application .o.
We also need to distinguish BPF static and dynamic libraries.
The dynamic libs would be already loaded in the kernel.
They will be seen by the kernel as partially verified bpf programs.
We can support both global and static style of
the verification for such dynamic libs. The global entry functions
will be verified as global funcs and static funcs can be loaded
without verification if they're not called.
The static funcs wouldn't be static in C file, of course,
since we've put a stop on static visibility.
They would probably need to be global __hidden similar to what
we already do in libbpf with static linking.
The rules we pick should be consistent for dynamic and static libs.
The workflow of loading bpf dynamic library into the kernel and using
it from the application can be made to look very similar to
using bpf static library.

> > After that is done and mmap-ing of data/rodata/bss is done
> > the main skeleton will init sub-skeleton with offsets to their
> > corresponding data based on these offsets?
> > I think that will work for light skel.
> 
> What I had in mind kept skeleton completely isolated from
> sub-skeleton. Think about this, when BPF library author is compiling
> it's user-space parts that use sub-skeleton, they don't and generally
> speaking can't know anything about the final BPF application, so they
> can't have any access to the final skeleton. But they need
> code-generated sub-skeleton header file, similarly to BPF skeleton
> today. So at least for BPF skeleton, the flow I was imagining would be
> like this.
> 
> 1. BPF library abc consists of abc1.bpf.c and abc2.bpf.c. It also has
> user-space component in abc.c.
> 2. BPF app uses abs library and has its own app1.bpf.c and app2.bpf.c
> and app.c for user-space.
> 3. BPF library author sets up its Makefile to do
>   a. clang -target bpf -g -O2 -c abc1.bpf.c -o abc1.bpf.o
>   b. clang -target bpf -g -O2 -c abc2.bpf.c -o abc2.bpf.o
>   c. bpftool gen lib libabc.bpf.o abc1.bpf.o abc2.bpf.o
>   d. bpftool gen subskeleton libabc.bpf.o > libabc.subskel.h
>   e. abc.c (user-space library) is of the form
> 
> #include "libabc.subskel.h"
> 
> static struct libabc_bpf *subskel;
> 
> int libabc__init(struct bpf_object *obj)
> {
>     subskel = libabc_bpf__open_subskel(obj);

right. I was thinking the same for lskel except
there is no 'bpf_object'.
Either subskel_open will receive already adjusted addresses
from the main skel or they will be grouped into aux struct.

>     subskel->data->abc_my_var = 123;

and then library's custom init can do exactly this line.

> }
> 
> int libabc__attach()
> {
>     libabc_bpf__attach(subskel);
> }
> 
>   f. cc abc.c into libabc.a and then libabc.a and libabc.bpf.o are
> distributed to end user
> 
> 3. Now, from BPF application author side:
>   a. clang -target bpf -g -O2 -c app1.bpf.c -o app1.bpf.o
>   b. clang -target bpf -g -O2 -c app2.bpf.c -o app2.bpf.o
>   c. bpftool gen object app.bpf.o app1.bpf.o app2.bpf.o libabc.bpf.o
>   d. on user-space side of app in app.c
> 
> #include "app.skel.h"
> 
> int main()
> {
>     struct app_bpf *skel;
> 
>     skel = app_bpf__open();
>     skel->rodata->app_var = 123;
> 
>     libabc__init(skel->obj);
> 
>     app_bpf__load(skel);
> 
>     libabc__attach();
> 
>     /* probably shouldn't auto-attach library progs, but don't know
> yet how to prevent that */
>     app_bpf__attach(skel);
> 
>     /* do some useful logic now */
> }
> 
>   e. cc app.c -o app && sudo ./app

right. That's a necessary workflow.

> So, app author doesn't need and doesn't have direct access to
> subskeleton header. And sub-skeleton header is generated by BPF
> library way before the library is linked into the final application.

right. We certainly need that for dynamic and static bpf libs.
Lorenz Bauer May 13, 2021, 8:37 a.m. UTC | #28
On Wed, 12 May 2021 at 19:50, Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
>


...

> So at least for BPF skeleton, the flow I was imagining would be

> like this.


Thank you for the worked out example, it's really helpful.

>

> 1. BPF library abc consists of abc1.bpf.c and abc2.bpf.c. It also has

> user-space component in abc.c.

> 2. BPF app uses abs library and has its own app1.bpf.c and app2.bpf.c

> and app.c for user-space.

> 3. BPF library author sets up its Makefile to do

>   a. clang -target bpf -g -O2 -c abc1.bpf.c -o abc1.bpf.o

>   b. clang -target bpf -g -O2 -c abc2.bpf.c -o abc2.bpf.o

>   c. bpftool gen lib libabc.bpf.o abc1.bpf.o abc2.bpf.o


I think we can plug this into bpf2go [1] on our side in the best case,
which would avoid duplicating the static linker.

>   d. bpftool gen subskeleton libabc.bpf.o > libabc.subskel.h

>   e. abc.c (user-space library) is of the form

>

> #include "libabc.subskel.h"

>

> static struct libabc_bpf *subskel;

>

> int libabc__init(struct bpf_object *obj)

> {

>     subskel = libabc_bpf__open_subskel(obj);

>

>     subskel->data->abc_my_var = 123;

> }

>

> int libabc__attach()

> {

>     libabc_bpf__attach(subskel);

> }

>

>   f. cc abc.c into libabc.a and then libabc.a and libabc.bpf.o are

> distributed to end user

>

> 3. Now, from BPF application author side:

>   a. clang -target bpf -g -O2 -c app1.bpf.c -o app1.bpf.o

>   b. clang -target bpf -g -O2 -c app2.bpf.c -o app2.bpf.o

>   c. bpftool gen object app.bpf.o app1.bpf.o app2.bpf.o libabc.bpf.o


I haven't worked out exactly how things would work, but on the Go side
it might be possible to distribute libabc.bpf.o plus the Go "library"
code as a package. So the Go toolchain would never create this merged
object, but instead do

    bpftool gen object app.bpf.o app1.bpf.o app2.bpf.o

and later link app.bpf.o and libabc.bpf.o at runtime. It would be
simpler from our side if bpftool gen object could link both libraries
and "programs", maybe we can discuss the details of this during office
hours.

1: https://pkg.go.dev/github.com/cilium/ebpf/cmd/bpf2go

-- 
Lorenz Bauer  |  Systems Engineer
6th Floor, County Hall/The Riverside Building, SE1 7PB, UK

www.cloudflare.com
Alexei Starovoitov May 13, 2021, 3:41 p.m. UTC | #29
On Thu, May 13, 2021 at 09:37:13AM +0100, Lorenz Bauer wrote:
> On Wed, 12 May 2021 at 19:50, Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:

> >

> 

> ...

> 

> > So at least for BPF skeleton, the flow I was imagining would be

> > like this.

> 

> Thank you for the worked out example, it's really helpful.

> 

> >

> > 1. BPF library abc consists of abc1.bpf.c and abc2.bpf.c. It also has

> > user-space component in abc.c.

> > 2. BPF app uses abs library and has its own app1.bpf.c and app2.bpf.c

> > and app.c for user-space.

> > 3. BPF library author sets up its Makefile to do

> >   a. clang -target bpf -g -O2 -c abc1.bpf.c -o abc1.bpf.o

> >   b. clang -target bpf -g -O2 -c abc2.bpf.c -o abc2.bpf.o

> >   c. bpftool gen lib libabc.bpf.o abc1.bpf.o abc2.bpf.o

> 

> I think we can plug this into bpf2go [1] on our side in the best case,

> which would avoid duplicating the static linker.

> 

> >   d. bpftool gen subskeleton libabc.bpf.o > libabc.subskel.h

> >   e. abc.c (user-space library) is of the form

> >

> > #include "libabc.subskel.h"

> >

> > static struct libabc_bpf *subskel;

> >

> > int libabc__init(struct bpf_object *obj)

> > {

> >     subskel = libabc_bpf__open_subskel(obj);

> >

> >     subskel->data->abc_my_var = 123;

> > }

> >

> > int libabc__attach()

> > {

> >     libabc_bpf__attach(subskel);

> > }

> >

> >   f. cc abc.c into libabc.a and then libabc.a and libabc.bpf.o are

> > distributed to end user

> >

> > 3. Now, from BPF application author side:

> >   a. clang -target bpf -g -O2 -c app1.bpf.c -o app1.bpf.o

> >   b. clang -target bpf -g -O2 -c app2.bpf.c -o app2.bpf.o

> >   c. bpftool gen object app.bpf.o app1.bpf.o app2.bpf.o libabc.bpf.o

> 

> I haven't worked out exactly how things would work, but on the Go side

> it might be possible to distribute libabc.bpf.o plus the Go "library"

> code as a package. So the Go toolchain would never create this merged

> object, but instead do

> 

>     bpftool gen object app.bpf.o app1.bpf.o app2.bpf.o

> 

> and later link app.bpf.o and libabc.bpf.o at runtime. It would be

> simpler from our side if bpftool gen object could link both libraries

> and "programs", maybe we can discuss the details of this during office

> hours.

> 

> 1: https://pkg.go.dev/github.com/cilium/ebpf/cmd/bpf2go


This is really cool! Looks like libbpf C skeleton that embeds the full
elf file as a string into golang object, but it doesn't support
direct data/rodata/bss variable access yet? Looks like only progs and maps
are natively exposed into golang?

Have you seen the light skeleton yet?
https://patchwork.kernel.org/project/netdevbpf/patch/20210512213256.31203-18-alexei.starovoitov@gmail.com/
It's design centered on supporting languages like golang that
cannot take libbpf in C as-is. I think reimplementation of everything
in every other language that doesn't have clean binding to C is going
to hurt BPF ecosystem long term. The light skeleton is designed to address that.
It's libbpf-less. We're going to teach bpftool to emit golang equivalent of .lskel.h
Here is trace_printk.lskel.h example:
https://gist.github.com/4ast/774ea58f8286abac6aa8e3bf3bf3b903
The hex string dumps in there are not elf file anymore.
They're blobs directly interpreted by the kernel.
Note the headers:
#include <bpf/bpf.h>
#include <bpf/skel_internal.h>
The light skeleton is using only three sys_bpf wrappers (map_create, prog_load, test_run).
While the end result looks and feels like existing skeleton.
It doesn't need libelf and doesn't need all of libbpf to load and attach progs.
The plan is to take cilium bpf progs and represent them lskel.h tests in selftests/bpf
to demonstrate the feature richness.

The work on llvm's ld.lld linker is in progress as well:
https://reviews.llvm.org/D101336
The users will be able to:
clang -target bpf -flto -O2 -g -c t1.c -o t1.bc
clang -target bpf -flto -O2 -g -c t2.c -o t2.bc
ld.lld -r t1.bc t2.bc -o final.o

Such llvm linking step can only happen once, since it's an LTO compilation.
To use it with libraries the users would need to:
// compile lib files
clang -target bpf -flto -g -O2 -c abc1.bpf.c -o abc1.bpf.o
clang -target bpf -flto -g -O2 -c abc2.bpf.c -o abc2.bpf.o
// compile app files
clang -target bpf -flto -g -O2 -c app1.bpf.c -o app1.bpf.o
clang -target bpf -flto -g -O2 -c app2.bpf.c -o app2.bpf.o
// link and LTO-compile everything
ld.lld -r abc1.bpf.o abc2.bpf.o app1.bpf.o app2.bpf.o -o final.o

Obviously we still need libbpf static linker for linking true .o-s
when LTO is not suitable.
diff mbox series

Patch

diff --git a/tools/bpf/bpftool/gen.c b/tools/bpf/bpftool/gen.c
index 440a2fcb6441..06fee4a2910a 100644
--- a/tools/bpf/bpftool/gen.c
+++ b/tools/bpf/bpftool/gen.c
@@ -638,7 +638,7 @@  static int do_object(int argc, char **argv)
 	while (argc) {
 		file = GET_ARG();
 
-		err = bpf_linker__add_file(linker, file);
+		err = bpf_linker__add_file(linker, file, NULL);
 		if (err) {
 			p_err("failed to link '%s': %s (%d)", file, strerror(err), err);
 			goto out;
diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
index bec4e6a6e31d..67505030c8d1 100644
--- a/tools/lib/bpf/libbpf.h
+++ b/tools/lib/bpf/libbpf.h
@@ -768,10 +768,20 @@  struct bpf_linker_opts {
 };
 #define bpf_linker_opts__last_field sz
 
+struct bpf_linker_file_opts {
+	/* size of this struct, for forward/backward compatiblity */
+	size_t sz;
+	/* object name override, similar to the one in bpf_object_open_opts */
+	const char *object_name;
+};
+#define bpf_linker_file_opts__last_field sz
+
 struct bpf_linker;
 
 LIBBPF_API struct bpf_linker *bpf_linker__new(const char *filename, struct bpf_linker_opts *opts);
-LIBBPF_API int bpf_linker__add_file(struct bpf_linker *linker, const char *filename);
+LIBBPF_API int bpf_linker__add_file(struct bpf_linker *linker,
+				    const char *filename,
+				    const struct bpf_linker_file_opts *opts);
 LIBBPF_API int bpf_linker__finalize(struct bpf_linker *linker);
 LIBBPF_API void bpf_linker__free(struct bpf_linker *linker);
 
diff --git a/tools/lib/bpf/linker.c b/tools/lib/bpf/linker.c
index 9de084b1c699..adc3aa9ce040 100644
--- a/tools/lib/bpf/linker.c
+++ b/tools/lib/bpf/linker.c
@@ -4,6 +4,8 @@ 
  *
  * Copyright (c) 2021 Facebook
  */
+#define _GNU_SOURCE
+#include <ctype.h>
 #include <stdbool.h>
 #include <stddef.h>
 #include <stdio.h>
@@ -47,6 +49,16 @@  struct src_sec {
 	int sec_type_id;
 };
 
+#define MAX_OBJ_NAME_LEN 64
+
+/* According to C standard, only first 63 characters of C identifiers are
+ * guaranteed to be significant. So for transformed static variables of the most
+ * verbose form ('<obj_name>..<func_name>.<var_name>') we need to reserve extra
+ * 64 (function name and dot) + 63 (variable name) + 2 (for .. separator)
+ * characters.
+ */
+#define MAX_VAR_NAME_LEN (MAX_OBJ_NAME_LEN + 2 + 63 + 1 + 63)
+
 struct src_obj {
 	const char *filename;
 	int fd;
@@ -67,6 +79,10 @@  struct src_obj {
 	int *sym_map;
 	/* mapping from the src BTF type IDs to dst ones */
 	int *btf_type_map;
+
+	/* BPF object name used for static variable prefixing */
+	char obj_name[MAX_OBJ_NAME_LEN];
+	size_t obj_name_len;
 };
 
 /* single .BTF.ext data section */
@@ -158,7 +174,9 @@  struct bpf_linker {
 
 static int init_output_elf(struct bpf_linker *linker, const char *file);
 
-static int linker_load_obj_file(struct bpf_linker *linker, const char *filename, struct src_obj *obj);
+static int linker_load_obj_file(struct bpf_linker *linker, const char *filename,
+				const struct bpf_linker_file_opts *opts,
+				struct src_obj *obj);
 static int linker_sanity_check_elf(struct src_obj *obj);
 static int linker_sanity_check_elf_symtab(struct src_obj *obj, struct src_sec *sec);
 static int linker_sanity_check_elf_relos(struct src_obj *obj, struct src_sec *sec);
@@ -435,15 +453,19 @@  static int init_output_elf(struct bpf_linker *linker, const char *file)
 	return 0;
 }
 
-int bpf_linker__add_file(struct bpf_linker *linker, const char *filename)
+int bpf_linker__add_file(struct bpf_linker *linker, const char *filename,
+			 const struct bpf_linker_file_opts *opts)
 {
 	struct src_obj obj = {};
 	int err = 0;
 
+	if (!OPTS_VALID(opts, bpf_linker_file_opts))
+		return -EINVAL;
+
 	if (!linker->elf)
 		return -EINVAL;
 
-	err = err ?: linker_load_obj_file(linker, filename, &obj);
+	err = err ?: linker_load_obj_file(linker, filename, opts, &obj);
 	err = err ?: linker_append_sec_data(linker, &obj);
 	err = err ?: linker_append_elf_syms(linker, &obj);
 	err = err ?: linker_append_elf_relos(linker, &obj);
@@ -529,7 +551,49 @@  static struct src_sec *add_src_sec(struct src_obj *obj, const char *sec_name)
 	return sec;
 }
 
-static int linker_load_obj_file(struct bpf_linker *linker, const char *filename, struct src_obj *obj)
+static void sanitize_obj_name(char *name)
+{
+	int i;
+
+	for (i = 0; name[i]; i++) {
+		if (name[i] == '_')
+			continue;
+		if (i == 0 && isalpha(name[i]))
+			continue;
+		if (i > 0 && isalnum(name[i]))
+			continue;
+
+		name[i] = '_';
+	}
+}
+
+static bool str_has_suffix(const char *str, const char *suffix)
+{
+	size_t n1 = strlen(str), n2 = strlen(suffix);
+
+	if (n1 < n2)
+		return false;
+
+	return strcmp(str + n1 - n2, suffix) == 0;
+}
+
+static void get_obj_name(char *name, const char *file)
+{
+	/* Using basename() GNU version which doesn't modify arg. */
+	strncpy(name, basename(file), MAX_OBJ_NAME_LEN - 1);
+	name[MAX_OBJ_NAME_LEN - 1] = '\0';
+
+	if (str_has_suffix(name, ".bpf.o"))
+		name[strlen(name) - sizeof(".bpf.o") + 1] = '\0';
+	else if (str_has_suffix(name, ".o"))
+		name[strlen(name) - sizeof(".o") + 1] = '\0';
+
+	sanitize_obj_name(name);
+}
+
+static int linker_load_obj_file(struct bpf_linker *linker, const char *filename,
+				const struct bpf_linker_file_opts *opts,
+				struct src_obj *obj)
 {
 #if __BYTE_ORDER == __LITTLE_ENDIAN
 	const int host_endianness = ELFDATA2LSB;
@@ -549,6 +613,14 @@  static int linker_load_obj_file(struct bpf_linker *linker, const char *filename,
 
 	obj->filename = filename;
 
+	if (OPTS_GET(opts, object_name, NULL)) {
+		strncpy(obj->obj_name, opts->object_name, MAX_OBJ_NAME_LEN);
+		obj->obj_name[MAX_OBJ_NAME_LEN - 1] = '\0';
+	} else {
+		get_obj_name(obj->obj_name, filename);
+	}
+	obj->obj_name_len = strlen(obj->obj_name);
+
 	obj->fd = open(filename, O_RDONLY);
 	if (obj->fd < 0) {
 		err = -errno;
@@ -2264,6 +2336,47 @@  static int linker_append_btf(struct bpf_linker *linker, struct src_obj *obj)
 				obj->btf_type_map[i] = glob_sym->btf_id;
 				continue;
 			}
+		} else if (btf_is_var(t) && btf_var(t)->linkage == BTF_VAR_STATIC) {
+			/* Static variables are renamed to include
+			 * "<obj_name>.." prefix (note double dots), similarly
+			 * to how static variables inside functions are named
+			 * "<func_name>.<var_name>" by compiler. This allows to
+			 * have  unique identifiers for static variables across
+			 * all linked object files (assuming unique filenames,
+			 * of course), which BPF skeleton relies on.
+			 *
+			 * So worst case static variable inside the function
+			 * will have the form "<obj_name>..<func_name>.<var_name"
+			 * and will get sanitized by BPF skeleton generation
+			 * logic to a field with <obj_name>__<func_name>_<var_name>
+			 * name. Typical static variable will have a
+			 * <obj_name>__<var_name> name, implying arguably nice
+			 * per-file scoping.
+			 *
+			 * If static var name already contains '..', though,
+			 * don't rename it, because it was already renamed by
+			 * previous linker passes.
+			 */
+			name = btf__str_by_offset(obj->btf, t->name_off);
+			if (!strstr(name, "..")) {
+				char new_name[MAX_VAR_NAME_LEN];
+
+				memcpy(new_name, obj->obj_name, obj->obj_name_len);
+				new_name[obj->obj_name_len] = '.';
+				new_name[obj->obj_name_len + 1] = '.';
+				new_name[obj->obj_name_len + 2] = '\0';
+				/* -3 is for '..' separator and terminating '\0' */
+				strncat(new_name, name, MAX_VAR_NAME_LEN - obj->obj_name_len - 3);
+
+				id = btf__add_str(obj->btf, new_name);
+				if (id < 0)
+					return id;
+
+				/* btf__add_str() might invalidate t, so re-fetch */
+				t = btf__type_by_id(obj->btf, i);
+
+				((struct btf_type *)t)->name_off = id;
+			}
 		}
 
 		id = btf__add_type(linker->btf, obj->btf, t);
diff --git a/tools/testing/selftests/bpf/prog_tests/skeleton.c b/tools/testing/selftests/bpf/prog_tests/skeleton.c
index fe87b77af459..bbade99fa544 100644
--- a/tools/testing/selftests/bpf/prog_tests/skeleton.c
+++ b/tools/testing/selftests/bpf/prog_tests/skeleton.c
@@ -82,10 +82,10 @@  void test_skeleton(void)
 	CHECK(data->out2 != 2, "res2", "got %lld != exp %d\n", data->out2, 2);
 	CHECK(bss->out3 != 3, "res3", "got %d != exp %d\n", (int)bss->out3, 3);
 	CHECK(bss->out4 != 4, "res4", "got %lld != exp %d\n", bss->out4, 4);
-	CHECK(bss->handler_out5.a != 5, "res5", "got %d != exp %d\n",
-	      bss->handler_out5.a, 5);
-	CHECK(bss->handler_out5.b != 6, "res6", "got %lld != exp %d\n",
-	      bss->handler_out5.b, 6);
+	CHECK(bss->test_skeleton__handler_out5.a != 5, "res5", "got %d != exp %d\n",
+	      bss->test_skeleton__handler_out5.a, 5);
+	CHECK(bss->test_skeleton__handler_out5.b != 6, "res6", "got %lld != exp %d\n",
+	      bss->test_skeleton__handler_out5.b, 6);
 	CHECK(bss->out6 != 14, "res7", "got %d != exp %d\n", bss->out6, 14);
 
 	CHECK(bss->bpf_syscall != kcfg->CONFIG_BPF_SYSCALL, "ext1",
diff --git a/tools/testing/selftests/bpf/prog_tests/static_linked.c b/tools/testing/selftests/bpf/prog_tests/static_linked.c
index 46556976dccc..f16736eab900 100644
--- a/tools/testing/selftests/bpf/prog_tests/static_linked.c
+++ b/tools/testing/selftests/bpf/prog_tests/static_linked.c
@@ -14,12 +14,12 @@  void test_static_linked(void)
 		return;
 
 	skel->rodata->rovar1 = 1;
-	skel->bss->static_var1 = 2;
-	skel->bss->static_var11 = 3;
+	skel->bss->test_static_linked1__static_var = 2;
+	skel->bss->test_static_linked1__static_var1 = 3;
 
 	skel->rodata->rovar2 = 4;
-	skel->bss->static_var2 = 5;
-	skel->bss->static_var22 = 6;
+	skel->bss->test_static_linked2__static_var = 5;
+	skel->bss->test_static_linked2__static_var2 = 6;
 
 	err = test_static_linked__load(skel);
 	if (!ASSERT_OK(err, "skel_load"))
diff --git a/tools/testing/selftests/bpf/progs/bpf_iter_test_kern4.c b/tools/testing/selftests/bpf/progs/bpf_iter_test_kern4.c
index ee49493dc125..43bf8ec8ae79 100644
--- a/tools/testing/selftests/bpf/progs/bpf_iter_test_kern4.c
+++ b/tools/testing/selftests/bpf/progs/bpf_iter_test_kern4.c
@@ -9,8 +9,8 @@  __u32 map1_id = 0, map2_id = 0;
 __u32 map1_accessed = 0, map2_accessed = 0;
 __u64 map1_seqnum = 0, map2_seqnum1 = 0, map2_seqnum2 = 0;
 
-static volatile const __u32 print_len;
-static volatile const __u32 ret1;
+volatile const __u32 print_len = 0;
+volatile const __u32 ret1 = 0;
 
 SEC("iter/bpf_map")
 int dump_bpf_map(struct bpf_iter__bpf_map *ctx)
diff --git a/tools/testing/selftests/bpf/progs/test_check_mtu.c b/tools/testing/selftests/bpf/progs/test_check_mtu.c
index c4a9bae96e75..71184af57749 100644
--- a/tools/testing/selftests/bpf/progs/test_check_mtu.c
+++ b/tools/testing/selftests/bpf/progs/test_check_mtu.c
@@ -11,8 +11,8 @@ 
 char _license[] SEC("license") = "GPL";
 
 /* Userspace will update with MTU it can see on device */
-static volatile const int GLOBAL_USER_MTU;
-static volatile const __u32 GLOBAL_USER_IFINDEX;
+volatile const int GLOBAL_USER_MTU;
+volatile const __u32 GLOBAL_USER_IFINDEX;
 
 /* BPF-prog will update these with MTU values it can see */
 __u32 global_bpf_mtu_xdp = 0;
diff --git a/tools/testing/selftests/bpf/progs/test_cls_redirect.c b/tools/testing/selftests/bpf/progs/test_cls_redirect.c
index 3c1e042962e6..e2a5acc4785c 100644
--- a/tools/testing/selftests/bpf/progs/test_cls_redirect.c
+++ b/tools/testing/selftests/bpf/progs/test_cls_redirect.c
@@ -39,8 +39,8 @@  char _license[] SEC("license") = "Dual BSD/GPL";
 /**
  * Destination port and IP used for UDP encapsulation.
  */
-static volatile const __be16 ENCAPSULATION_PORT;
-static volatile const __be32 ENCAPSULATION_IP;
+volatile const __be16 ENCAPSULATION_PORT;
+volatile const __be32 ENCAPSULATION_IP;
 
 typedef struct {
 	uint64_t processed_packets_total;
diff --git a/tools/testing/selftests/bpf/progs/test_snprintf_single.c b/tools/testing/selftests/bpf/progs/test_snprintf_single.c
index 402adaf344f9..6b63ba86b409 100644
--- a/tools/testing/selftests/bpf/progs/test_snprintf_single.c
+++ b/tools/testing/selftests/bpf/progs/test_snprintf_single.c
@@ -5,7 +5,7 @@ 
 #include <bpf/bpf_helpers.h>
 
 /* The format string is filled from the userspace such that loading fails */
-static const char fmt[10];
+const char fmt[10] = {};
 
 SEC("raw_tp/sys_enter")
 int handler(const void *ctx)
diff --git a/tools/testing/selftests/bpf/progs/test_sockmap_listen.c b/tools/testing/selftests/bpf/progs/test_sockmap_listen.c
index a39eba9f5201..a1cc58b10c7c 100644
--- a/tools/testing/selftests/bpf/progs/test_sockmap_listen.c
+++ b/tools/testing/selftests/bpf/progs/test_sockmap_listen.c
@@ -28,8 +28,8 @@  struct {
 	__type(value, unsigned int);
 } verdict_map SEC(".maps");
 
-static volatile bool test_sockmap; /* toggled by user-space */
-static volatile bool test_ingress; /* toggled by user-space */
+bool test_sockmap = false; /* toggled by user-space */
+bool test_ingress = false; /* toggled by user-space */
 
 SEC("sk_skb/stream_parser")
 int prog_stream_parser(struct __sk_buff *skb)
diff --git a/tools/testing/selftests/bpf/progs/test_static_linked1.c b/tools/testing/selftests/bpf/progs/test_static_linked1.c
index ea1a6c4c7172..7e15d05888e7 100644
--- a/tools/testing/selftests/bpf/progs/test_static_linked1.c
+++ b/tools/testing/selftests/bpf/progs/test_static_linked1.c
@@ -5,8 +5,8 @@ 
 #include <bpf/bpf_helpers.h>
 
 /* 8-byte aligned .bss */
-static volatile long static_var1;
-static volatile int static_var11;
+static volatile long static_var;
+static volatile int static_var1;
 int var1 = 0;
 /* 4-byte aligned .rodata */
 const volatile int rovar1;
@@ -21,7 +21,7 @@  static __noinline int subprog(int x)
 SEC("raw_tp/sys_enter")
 int handler1(const void *ctx)
 {
-	var1 = subprog(rovar1) + static_var1 + static_var11;
+	var1 = subprog(rovar1) + static_var + static_var1;
 
 	return 0;
 }
diff --git a/tools/testing/selftests/bpf/progs/test_static_linked2.c b/tools/testing/selftests/bpf/progs/test_static_linked2.c
index 54d8d1ab577c..0d37c05d508e 100644
--- a/tools/testing/selftests/bpf/progs/test_static_linked2.c
+++ b/tools/testing/selftests/bpf/progs/test_static_linked2.c
@@ -5,8 +5,8 @@ 
 #include <bpf/bpf_helpers.h>
 
 /* 4-byte aligned .bss */
+static volatile int static_var;
 static volatile int static_var2;
-static volatile int static_var22;
 int var2 = 0;
 /* 8-byte aligned .rodata */
 const volatile long rovar2;
@@ -21,7 +21,7 @@  static __noinline int subprog(int x)
 SEC("raw_tp/sys_enter")
 int handler2(const void *ctx)
 {
-	var2 = subprog(rovar2) + static_var2 + static_var22;
+	var2 = subprog(rovar2) + static_var + static_var2;
 
 	return 0;
 }