Message ID | 20201218235614.2284956-4-andrii@kernel.org |
---|---|
State | New |
Headers | show |
Series | Add user-space and non-CO-RE variants of BPF_CORE_READ() | expand |
On Fri, Dec 18, 2020 at 03:56:14PM -0800, Andrii Nakryiko wrote: > + > +/* shuffled layout for relocatable (CO-RE) reads */ > +struct callback_head___shuffled { > + void (*func)(struct callback_head___shuffled *head); > + struct callback_head___shuffled *next; > +}; > + > +struct callback_head k_probe_in = {}; > +struct callback_head___shuffled k_core_in = {}; > + > +struct callback_head *u_probe_in = 0; > +struct callback_head___shuffled *u_core_in = 0; > + > +long k_probe_out = 0; > +long u_probe_out = 0; > + > +long k_core_out = 0; > +long u_core_out = 0; > + > +int my_pid = 0; > + > +SEC("raw_tracepoint/sys_enter") > +int handler(void *ctx) > +{ > + int pid = bpf_get_current_pid_tgid() >> 32; > + > + if (my_pid != pid) > + return 0; > + > + /* next pointers for kernel address space have to be initialized from > + * BPF side, user-space mmaped addresses are stil user-space addresses > + */ > + k_probe_in.next = &k_probe_in; > + __builtin_preserve_access_index(({k_core_in.next = &k_core_in;})); > + > + k_probe_out = (long)BPF_PROBE_READ(&k_probe_in, next, next, func); > + k_core_out = (long)BPF_CORE_READ(&k_core_in, next, next, func); > + u_probe_out = (long)BPF_PROBE_READ_USER(u_probe_in, next, next, func); > + u_core_out = (long)BPF_CORE_READ_USER(u_core_in, next, next, func); I don't understand what the test suppose to demonstrate. co-re relocs work for kernel btf only. Are you saying that 'struct callback_head' happened to be used by user space process that allocated it in user memory. And that is the same struct as being used by the kernel? So co-re relocs that apply against the kernel will sort-of work against the data of user space process because the user space is using the same struct? That sounds convoluted. I struggle to see the point of patch 1: +#define bpf_core_read_user(dst, sz, src) \ + bpf_probe_read_user(dst, sz, (const void *)__builtin_preserve_access_index(src)) co-re for user structs? Aren't they uapi? No reloc is needed.
On Mon, Jan 4, 2021 at 7:46 PM Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote: > > On Fri, Dec 18, 2020 at 03:56:14PM -0800, Andrii Nakryiko wrote: > > + > > +/* shuffled layout for relocatable (CO-RE) reads */ > > +struct callback_head___shuffled { > > + void (*func)(struct callback_head___shuffled *head); > > + struct callback_head___shuffled *next; > > +}; > > + > > +struct callback_head k_probe_in = {}; > > +struct callback_head___shuffled k_core_in = {}; > > + > > +struct callback_head *u_probe_in = 0; > > +struct callback_head___shuffled *u_core_in = 0; > > + > > +long k_probe_out = 0; > > +long u_probe_out = 0; > > + > > +long k_core_out = 0; > > +long u_core_out = 0; > > + > > +int my_pid = 0; > > + > > +SEC("raw_tracepoint/sys_enter") > > +int handler(void *ctx) > > +{ > > + int pid = bpf_get_current_pid_tgid() >> 32; > > + > > + if (my_pid != pid) > > + return 0; > > + > > + /* next pointers for kernel address space have to be initialized from > > + * BPF side, user-space mmaped addresses are stil user-space addresses > > + */ > > + k_probe_in.next = &k_probe_in; > > + __builtin_preserve_access_index(({k_core_in.next = &k_core_in;})); > > + > > + k_probe_out = (long)BPF_PROBE_READ(&k_probe_in, next, next, func); > > + k_core_out = (long)BPF_CORE_READ(&k_core_in, next, next, func); > > + u_probe_out = (long)BPF_PROBE_READ_USER(u_probe_in, next, next, func); > > + u_core_out = (long)BPF_CORE_READ_USER(u_core_in, next, next, func); > > I don't understand what the test suppose to demonstrate. > co-re relocs work for kernel btf only. > Are you saying that 'struct callback_head' happened to be used by user space > process that allocated it in user memory. And that is the same struct as > being used by the kernel? So co-re relocs that apply against the kernel > will sort-of work against the data of user space process because > the user space is using the same struct? That sounds convoluted. The test itself just tests that bpf_probe_read_user() is executed, not bpf_probe_read_kernel(). But yes, the use case is to read kernel data structures from the user memory address space. See [0] for the last time this was requested and justifications. It's not the first time someone asked about the user-space variant of BPF_CORE_READ(), though I won't be able to find the reference at this time. [0] https://lore.kernel.org/bpf/CANaYP3GetBKUPDfo6PqWnm3xuGs2GZjLF8Ed51Q1=Emv2J-dKg@mail.gmail.com/ > I struggle to see the point of patch 1: > +#define bpf_core_read_user(dst, sz, src) \ > + bpf_probe_read_user(dst, sz, (const void *)__builtin_preserve_access_index(src)) > > co-re for user structs? Aren't they uapi? No reloc is needed. The use case in [0] above is for reading UAPI structs, passed as input arguments to syscall. It's a pretty niche use case, but there are at least two more-or-less valid benefits to use CO-RE with "stable" UAPI structs: - handling 32-bit vs 64-bit UAPI structs uniformly; - handling UAPI fields that were added in later kernels, but are missing on the earlier ones. For the former, you'd need to compile two variants of the BPF program (or do convoluted and inconvenient 32-bit UAPI struct re-definition for 64-bit BPF target). For the latter... I guess you can do if/else dance based on the kernel version. Which sucks and is inconvenient (and kernel version checks are discouraged, it's more reliable to detect availability of specific types and fields). So all in all, while pretty rare and niche, seemed like a valid use case. And easy to support while reusing all the macro logic almost without any changes.
On Mon, Jan 04, 2021 at 09:08:21PM -0800, Andrii Nakryiko wrote: > On Mon, Jan 4, 2021 at 7:46 PM Alexei Starovoitov > <alexei.starovoitov@gmail.com> wrote: > > > > On Fri, Dec 18, 2020 at 03:56:14PM -0800, Andrii Nakryiko wrote: > > > + > > > +/* shuffled layout for relocatable (CO-RE) reads */ > > > +struct callback_head___shuffled { > > > + void (*func)(struct callback_head___shuffled *head); > > > + struct callback_head___shuffled *next; > > > +}; > > > + > > > +struct callback_head k_probe_in = {}; > > > +struct callback_head___shuffled k_core_in = {}; > > > + > > > +struct callback_head *u_probe_in = 0; > > > +struct callback_head___shuffled *u_core_in = 0; > > > + > > > +long k_probe_out = 0; > > > +long u_probe_out = 0; > > > + > > > +long k_core_out = 0; > > > +long u_core_out = 0; > > > + > > > +int my_pid = 0; > > > + > > > +SEC("raw_tracepoint/sys_enter") > > > +int handler(void *ctx) > > > +{ > > > + int pid = bpf_get_current_pid_tgid() >> 32; > > > + > > > + if (my_pid != pid) > > > + return 0; > > > + > > > + /* next pointers for kernel address space have to be initialized from > > > + * BPF side, user-space mmaped addresses are stil user-space addresses > > > + */ > > > + k_probe_in.next = &k_probe_in; > > > + __builtin_preserve_access_index(({k_core_in.next = &k_core_in;})); > > > + > > > + k_probe_out = (long)BPF_PROBE_READ(&k_probe_in, next, next, func); > > > + k_core_out = (long)BPF_CORE_READ(&k_core_in, next, next, func); > > > + u_probe_out = (long)BPF_PROBE_READ_USER(u_probe_in, next, next, func); > > > + u_core_out = (long)BPF_CORE_READ_USER(u_core_in, next, next, func); > > > > I don't understand what the test suppose to demonstrate. > > co-re relocs work for kernel btf only. > > Are you saying that 'struct callback_head' happened to be used by user space > > process that allocated it in user memory. And that is the same struct as > > being used by the kernel? So co-re relocs that apply against the kernel > > will sort-of work against the data of user space process because > > the user space is using the same struct? That sounds convoluted. > > The test itself just tests that bpf_probe_read_user() is executed, not > bpf_probe_read_kernel(). But yes, the use case is to read kernel data > structures from the user memory address space. See [0] for the last > time this was requested and justifications. It's not the first time > someone asked about the user-space variant of BPF_CORE_READ(), though > I won't be able to find the reference at this time. > > [0] https://lore.kernel.org/bpf/CANaYP3GetBKUPDfo6PqWnm3xuGs2GZjLF8Ed51Q1=Emv2J-dKg@mail.gmail.com/ That's quite confusing thread. > > I struggle to see the point of patch 1: > > +#define bpf_core_read_user(dst, sz, src) \ > > + bpf_probe_read_user(dst, sz, (const void *)__builtin_preserve_access_index(src)) > > > > co-re for user structs? Aren't they uapi? No reloc is needed. > > The use case in [0] above is for reading UAPI structs, passed as input > arguments to syscall. It's a pretty niche use case, but there are at > least two more-or-less valid benefits to use CO-RE with "stable" UAPI > structs: > > - handling 32-bit vs 64-bit UAPI structs uniformly; what do you mean? 32-bit user space running on 64-bit kernel works through 'compat' syscalls. If bpf progs are accessing 64-bit uapi structs in such case they're broken and no amount of co-re can help. > - handling UAPI fields that were added in later kernels, but are > missing on the earlier ones. > > For the former, you'd need to compile two variants of the BPF program > (or do convoluted and inconvenient 32-bit UAPI struct re-definition > for 64-bit BPF target). No. 32-bit uapi structs should be used by bpf prog. compat stuff is not only casting pointers from 64-bit to 32. > For the latter... I guess you can do if/else > dance based on the kernel version. Which sucks and is inconvenient > (and kernel version checks are discouraged, it's more reliable to > detect availability of specific types and fields). Not really. ifdef based on kernel version is not needed. bpf_core_field_exists() will work just fine. No need to bpf_probe_read_user() macros. > So all in all, while pretty rare and niche, seemed like a valid use > case. And easy to support while reusing all the macro logic almost > without any changes. I think these new macros added with confusing and unclear goals will do more harm than good.
On Tue, Jan 5, 2021 at 11:04 AM Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote: > > On Mon, Jan 04, 2021 at 09:08:21PM -0800, Andrii Nakryiko wrote: > > On Mon, Jan 4, 2021 at 7:46 PM Alexei Starovoitov > > <alexei.starovoitov@gmail.com> wrote: > > > > > > On Fri, Dec 18, 2020 at 03:56:14PM -0800, Andrii Nakryiko wrote: > > > > + > > > > +/* shuffled layout for relocatable (CO-RE) reads */ > > > > +struct callback_head___shuffled { > > > > + void (*func)(struct callback_head___shuffled *head); > > > > + struct callback_head___shuffled *next; > > > > +}; > > > > + > > > > +struct callback_head k_probe_in = {}; > > > > +struct callback_head___shuffled k_core_in = {}; > > > > + > > > > +struct callback_head *u_probe_in = 0; > > > > +struct callback_head___shuffled *u_core_in = 0; > > > > + > > > > +long k_probe_out = 0; > > > > +long u_probe_out = 0; > > > > + > > > > +long k_core_out = 0; > > > > +long u_core_out = 0; > > > > + > > > > +int my_pid = 0; > > > > + > > > > +SEC("raw_tracepoint/sys_enter") > > > > +int handler(void *ctx) > > > > +{ > > > > + int pid = bpf_get_current_pid_tgid() >> 32; > > > > + > > > > + if (my_pid != pid) > > > > + return 0; > > > > + > > > > + /* next pointers for kernel address space have to be initialized from > > > > + * BPF side, user-space mmaped addresses are stil user-space addresses > > > > + */ > > > > + k_probe_in.next = &k_probe_in; > > > > + __builtin_preserve_access_index(({k_core_in.next = &k_core_in;})); > > > > + > > > > + k_probe_out = (long)BPF_PROBE_READ(&k_probe_in, next, next, func); > > > > + k_core_out = (long)BPF_CORE_READ(&k_core_in, next, next, func); > > > > + u_probe_out = (long)BPF_PROBE_READ_USER(u_probe_in, next, next, func); > > > > + u_core_out = (long)BPF_CORE_READ_USER(u_core_in, next, next, func); > > > > > > I don't understand what the test suppose to demonstrate. > > > co-re relocs work for kernel btf only. > > > Are you saying that 'struct callback_head' happened to be used by user space > > > process that allocated it in user memory. And that is the same struct as > > > being used by the kernel? So co-re relocs that apply against the kernel > > > will sort-of work against the data of user space process because > > > the user space is using the same struct? That sounds convoluted. > > > > The test itself just tests that bpf_probe_read_user() is executed, not > > bpf_probe_read_kernel(). But yes, the use case is to read kernel data > > structures from the user memory address space. See [0] for the last > > time this was requested and justifications. It's not the first time > > someone asked about the user-space variant of BPF_CORE_READ(), though > > I won't be able to find the reference at this time. > > > > [0] https://lore.kernel.org/bpf/CANaYP3GetBKUPDfo6PqWnm3xuGs2GZjLF8Ed51Q1=Emv2J-dKg@mail.gmail.com/ > > That's quite confusing thread. > > > > I struggle to see the point of patch 1: > > > +#define bpf_core_read_user(dst, sz, src) \ > > > + bpf_probe_read_user(dst, sz, (const void *)__builtin_preserve_access_index(src)) > > > > > > co-re for user structs? Aren't they uapi? No reloc is needed. > > > > The use case in [0] above is for reading UAPI structs, passed as input > > arguments to syscall. It's a pretty niche use case, but there are at > > least two more-or-less valid benefits to use CO-RE with "stable" UAPI > > structs: > > > > - handling 32-bit vs 64-bit UAPI structs uniformly; > > what do you mean? > 32-bit user space running on 64-bit kernel works through 'compat' syscalls. > If bpf progs are accessing 64-bit uapi structs in such case they're broken > and no amount of co-re can help. I know nothing about compat, so can't comment on that. But the way I understood the situation was the BPF program compiled once (well, at least from the unmodified source code), but runs on ARM32 and (on a separate physical host) on ARM64. And it's task is, say, to read UAPI kernel structures from syscall arguments. > > > - handling UAPI fields that were added in later kernels, but are > > missing on the earlier ones. > > > > For the former, you'd need to compile two variants of the BPF program > > (or do convoluted and inconvenient 32-bit UAPI struct re-definition > > for 64-bit BPF target). > > No. 32-bit uapi structs should be used by bpf prog. > compat stuff is not only casting pointers from 64-bit to 32. > See above about compat, that's not what I was thinking about. One simple example I found in UAPI definitions is struct timespec, it seems it's defined with `long`: struct timespec { __kernel_old_time_t tv_sec; /* seconds */ long tv_nsec; /* nanoseconds */ } So if you were to trace clock_gettime(), you'd need to deal with differently-sized reads of tv_nsec, depending on whether you are running on the 32-bit or 64-bit host. There are probably other examples where UAPI structs use long instead of __u32 or __u64, but I didn't dig too deep. > > For the latter... I guess you can do if/else > > dance based on the kernel version. Which sucks and is inconvenient > > (and kernel version checks are discouraged, it's more reliable to > > detect availability of specific types and fields). > > Not really. ifdef based on kernel version is not needed. > bpf_core_field_exists() will work just fine. > No need to bpf_probe_read_user() macros. Yes, you are right, detection of field/type existence doesn't depend on kernel- vs user-space, disregard this one. > > > So all in all, while pretty rare and niche, seemed like a valid use > > case. And easy to support while reusing all the macro logic almost > > without any changes. > > I think these new macros added with confusing and unclear goals > will do more harm than good. Let's see if Gilad can provide his perspective. I have no strong feelings about this and can send a patch removing CORE_READ_USER variants (they didn't make it into libbpf v0.3, so no harm or API stability concerns). BPF_PROBE_READ() and BPF_PROBE_READ_USER() are still useful for reading non-relocatable, but nested data structures, so I'd prefer to keep those.
On Tue, Jan 05, 2021 at 01:03:47PM -0800, Andrii Nakryiko wrote: > > > > > > - handling 32-bit vs 64-bit UAPI structs uniformly; > > > > what do you mean? > > 32-bit user space running on 64-bit kernel works through 'compat' syscalls. > > If bpf progs are accessing 64-bit uapi structs in such case they're broken > > and no amount of co-re can help. > > I know nothing about compat, so can't comment on that. But the way I > understood the situation was the BPF program compiled once (well, at > least from the unmodified source code), but runs on ARM32 and (on a > separate physical host) on ARM64. And it's task is, say, to read UAPI > kernel structures from syscall arguments. I'm not sure about arm, but on x86 there should be two different progs for this to work (if we're talking about 32-bit userspace). See below. > One simple example I found in UAPI definitions is struct timespec, it > seems it's defined with `long`: > > struct timespec { > __kernel_old_time_t tv_sec; /* seconds */ > long tv_nsec; /* nanoseconds */ > } > > So if you were to trace clock_gettime(), you'd need to deal with > differently-sized reads of tv_nsec, depending on whether you are > running on the 32-bit or 64-bit host. I believe gettime is vdso, so that's not a syscall and there are no bpf hooks available to track that. For normal syscall with timespec argument like sys_recvmmsg and sys_nanosleep the user needs to load two programs and attach to two different points: fentry/__x64_sys_nanosleep and fentry/__ia32_sys_nanosleep. (I'm not an expert on compat and not quite sure how _time32 suffix is handled, so may be 4 different progs are needed). The point is that there are several different entry points with different args. ia32 user space will call into the kernel differently. At the end different pieces of the syscall and compat handling will call into hrtimer_nanosleep with ktime. So if one bpf prog needs to work for 32 and 64 bit user space it should be attached to common piece of code like hrtimer_nanosleep(). If it's attached to syscall entry it needs to attach to at least 2 different entry points with 2 different progs. I guess it's possible to load single prog and kprobe-attach it to two different kernel functions, but code is kinda different. For the simplest things like sys_nanosleep it might work and timespec is simple enough structure to handle with sizeof(long) co-re tricks, but it's not a generally applicable approach. So for a tool to track 32-bit and 64-bit user space is quite tricky. If bpf prog doesn't care about user space and only needs to deal with 64-bit kernel + 64-bit user space and 32-bit kernel + 32-bit user space then it's a bit simpler, but probably still requires attaching to two different points. On 32-bit x86 kernel there will be no "fentry/__x64_sys_nanosleep" function. > There are probably other examples where UAPI structs use long instead > of __u32 or __u64, but I didn't dig too deep. There is also crazy difference between compact_ioctl vs ioctl. The point I'm trying to get across is that the clean 32-bit processing is a lot more involved than just CORE_READ_USER macro and I want to make sure that the expections are set correctly. BPF CO-RE prog may handle 32-bit and 64-bit at the same time, but it's probably exception than the rule. > Let's see if Gilad can provide his perspective. I have no strong > feelings about this and can send a patch removing CORE_READ_USER > variants (they didn't make it into libbpf v0.3, so no harm or API > stability concerns). BPF_PROBE_READ() and BPF_PROBE_READ_USER() are > still useful for reading non-relocatable, but nested data structures, > so I'd prefer to keep those. Let's hear from Gilad. I'm not against BPF_CORE_READ_USER macro as-is. I was objecting to the reasons of implementing it.
On Wed, Jan 6, 2021 at 8:09 AM Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote: > > On Tue, Jan 05, 2021 at 01:03:47PM -0800, Andrii Nakryiko wrote: > > > > > > > > - handling 32-bit vs 64-bit UAPI structs uniformly; > > > > > > what do you mean? > > > 32-bit user space running on 64-bit kernel works through 'compat' syscalls. > > > If bpf progs are accessing 64-bit uapi structs in such case they're broken > > > and no amount of co-re can help. > > > > I know nothing about compat, so can't comment on that. But the way I > > understood the situation was the BPF program compiled once (well, at > > least from the unmodified source code), but runs on ARM32 and (on a > > separate physical host) on ARM64. And it's task is, say, to read UAPI > > kernel structures from syscall arguments. > > I'm not sure about arm, but on x86 there should be two different progs > for this to work (if we're talking about 32-bit userspace). > See below. > > > One simple example I found in UAPI definitions is struct timespec, it > > seems it's defined with `long`: > > > > struct timespec { > > __kernel_old_time_t tv_sec; /* seconds */ > > long tv_nsec; /* nanoseconds */ > > } > > > > So if you were to trace clock_gettime(), you'd need to deal with > > differently-sized reads of tv_nsec, depending on whether you are > > running on the 32-bit or 64-bit host. > > I believe gettime is vdso, so that's not a syscall and there are no bpf hooks > available to track that. > For normal syscall with timespec argument like sys_recvmmsg and sys_nanosleep > the user needs to load two programs and attach to two different points: > fentry/__x64_sys_nanosleep and fentry/__ia32_sys_nanosleep. > (I'm not an expert on compat and not quite sure how _time32 suffix is handled, > so may be 4 different progs are needed). > The point is that there are several different entry points with different args. > ia32 user space will call into the kernel differently. > At the end different pieces of the syscall and compat handling will call > into hrtimer_nanosleep with ktime. > So if one bpf prog needs to work for 32 and 64 bit user space it should be > attached to common piece of code like hrtimer_nanosleep(). > If it's attached to syscall entry it needs to attach to at least 2 different > entry points with 2 different progs. I think that while it is true that syscall handlers are different for 32 and 64 bit, it may happen that some syscall handlers will pass user-space pointers down to shared common functions, which may be hooked by ebpf probes. I am no expert, but I think that the cp_new_stat function is such an example; it gets a user struct stat* pointer, whose definition varies across different architectures (32/64-bit, different processor archs etc.) > I guess it's possible to load single prog and kprobe-attach it to > two different kernel functions, but code is kinda different. > For the simplest things like sys_nanosleep it might work and timespec > is simple enough structure to handle with sizeof(long) co-re tricks, > but it's not a generally applicable approach. > So for a tool to track 32-bit and 64-bit user space is quite tricky. > > If bpf prog doesn't care about user space and only needs to deal > with 64-bit kernel + 64-bit user space and 32-bit kernel + 32-bit user space > then it's a bit simpler, but probably still requires attaching > to two different points. On 32-bit x86 kernel there will be no > "fentry/__x64_sys_nanosleep" function. > > > There are probably other examples where UAPI structs use long instead > > of __u32 or __u64, but I didn't dig too deep. > > There is also crazy difference between compact_ioctl vs ioctl. > > The point I'm trying to get across is that the clean 32-bit processing is > a lot more involved than just CORE_READ_USER macro and I want to make sure that > the expections are set correctly. > BPF CO-RE prog may handle 32-bit and 64-bit at the same time, but > it's probably exception than the rule. > > > Let's see if Gilad can provide his perspective. I have no strong > > feelings about this and can send a patch removing CORE_READ_USER > > variants (they didn't make it into libbpf v0.3, so no harm or API > > stability concerns). BPF_PROBE_READ() and BPF_PROBE_READ_USER() are > > still useful for reading non-relocatable, but nested data structures, > > so I'd prefer to keep those. > > Let's hear from Gilad. > I'm not against BPF_CORE_READ_USER macro as-is. I was objecting > to the reasons of implementing it. If I am not mistaken (which is completely possible), I think that providing such a macro will not cause any more confusion than the bpf_probe_read_{,user} distinction already does, since BPF_CORE_READ_USER to BPF_CORE_READ is the same as bpf_probe_read_user to bpf_probe_read.
On Wed, Jan 6, 2021 at 2:10 AM Gilad Reti <gilad.reti@gmail.com> wrote: > > On Wed, Jan 6, 2021 at 8:09 AM Alexei Starovoitov > <alexei.starovoitov@gmail.com> wrote: > > > > On Tue, Jan 05, 2021 at 01:03:47PM -0800, Andrii Nakryiko wrote: > > > > > > > > > > - handling 32-bit vs 64-bit UAPI structs uniformly; > > > > > > > > what do you mean? > > > > 32-bit user space running on 64-bit kernel works through 'compat' syscalls. > > > > If bpf progs are accessing 64-bit uapi structs in such case they're broken > > > > and no amount of co-re can help. > > > > > > I know nothing about compat, so can't comment on that. But the way I > > > understood the situation was the BPF program compiled once (well, at > > > least from the unmodified source code), but runs on ARM32 and (on a > > > separate physical host) on ARM64. And it's task is, say, to read UAPI > > > kernel structures from syscall arguments. > > > > I'm not sure about arm, but on x86 there should be two different progs > > for this to work (if we're talking about 32-bit userspace). > > See below. > > > > > One simple example I found in UAPI definitions is struct timespec, it > > > seems it's defined with `long`: > > > > > > struct timespec { > > > __kernel_old_time_t tv_sec; /* seconds */ > > > long tv_nsec; /* nanoseconds */ > > > } > > > > > > So if you were to trace clock_gettime(), you'd need to deal with > > > differently-sized reads of tv_nsec, depending on whether you are > > > running on the 32-bit or 64-bit host. > > > > I believe gettime is vdso, so that's not a syscall and there are no bpf hooks > > available to track that. > > For normal syscall with timespec argument like sys_recvmmsg and sys_nanosleep > > the user needs to load two programs and attach to two different points: > > fentry/__x64_sys_nanosleep and fentry/__ia32_sys_nanosleep. > > (I'm not an expert on compat and not quite sure how _time32 suffix is handled, > > so may be 4 different progs are needed). > > The point is that there are several different entry points with different args. > > ia32 user space will call into the kernel differently. > > At the end different pieces of the syscall and compat handling will call > > into hrtimer_nanosleep with ktime. > > So if one bpf prog needs to work for 32 and 64 bit user space it should be > > attached to common piece of code like hrtimer_nanosleep(). > > If it's attached to syscall entry it needs to attach to at least 2 different > > entry points with 2 different progs. > > I think that while it is true that syscall handlers are different for > 32 and 64 bit, > it may happen that some syscall handlers will pass user-space pointers down to > shared common functions, which may be hooked by ebpf probes. I am no expert, but > I think that the cp_new_stat function is such an example; it gets a > user struct stat* pointer, > whose definition varies across different architectures (32/64-bit, > different processor archs etc.) > > > I guess it's possible to load single prog and kprobe-attach it to > > two different kernel functions, but code is kinda different. > > For the simplest things like sys_nanosleep it might work and timespec > > is simple enough structure to handle with sizeof(long) co-re tricks, > > but it's not a generally applicable approach. > > So for a tool to track 32-bit and 64-bit user space is quite tricky. > > > > If bpf prog doesn't care about user space and only needs to deal > > with 64-bit kernel + 64-bit user space and 32-bit kernel + 32-bit user space > > then it's a bit simpler, but probably still requires attaching > > to two different points. On 32-bit x86 kernel there will be no > > "fentry/__x64_sys_nanosleep" function. > > > > > There are probably other examples where UAPI structs use long instead > > > of __u32 or __u64, but I didn't dig too deep. > > > > There is also crazy difference between compact_ioctl vs ioctl. > > > > The point I'm trying to get across is that the clean 32-bit processing is > > a lot more involved than just CORE_READ_USER macro and I want to make sure that > > the expections are set correctly. > > BPF CO-RE prog may handle 32-bit and 64-bit at the same time, but > > it's probably exception than the rule. > > > > > Let's see if Gilad can provide his perspective. I have no strong > > > feelings about this and can send a patch removing CORE_READ_USER > > > variants (they didn't make it into libbpf v0.3, so no harm or API > > > stability concerns). BPF_PROBE_READ() and BPF_PROBE_READ_USER() are > > > still useful for reading non-relocatable, but nested data structures, > > > so I'd prefer to keep those. > > > > Let's hear from Gilad. > > I'm not against BPF_CORE_READ_USER macro as-is. I was objecting > > to the reasons of implementing it. > > If I am not mistaken (which is completely possible), I think that > providing such a macro will > not cause any more confusion than the bpf_probe_read_{,user} > distinction already does, > since BPF_CORE_READ_USER to BPF_CORE_READ is the same as bpf_probe_read_user > to bpf_probe_read. I think the biggest source of confusion is that USER part in BPF_CORE_READ_USER refers to reading data from user address space, not really user structs (which is kind of natural instinct here). CO-RE *always* works only with kernel types, which is obvious if you have a lot of experience with using CO-RE, but not initially, unfortunately. So in the end, while a narrow use case, it seems like it might be useful to have BPF_CORE_READ_USER. Maybe emphasizing (in the doc comment) the fact that all types need to be kernel types would help a bit? If not, it's not a big deal, because it's pretty easy for users to just explicitly write bpf_probe_read_user(&dst, sizeof(dst), __builtin_preserve_access_index(src)). But there is definitely a bit of mental satisfaction with having all possible combinations of CORE/non-CORE and USER/KERNEL variants :) Btw, it seems these patches are already in bpf-next, so, Alexei, please let me know if you insist on removing BPF_CORE_READ_USER() variant and I'll send a follow up patch.
On Wed, Jan 06, 2021 at 03:25:30PM -0800, Andrii Nakryiko wrote: > > > > If I am not mistaken (which is completely possible), I think that > > providing such a macro will > > not cause any more confusion than the bpf_probe_read_{,user} > > distinction already does, > > since BPF_CORE_READ_USER to BPF_CORE_READ is the same as bpf_probe_read_user > > to bpf_probe_read. > > I think the biggest source of confusion is that USER part in > BPF_CORE_READ_USER refers to reading data from user address space, not > really user structs (which is kind of natural instinct here). CO-RE > *always* works only with kernel types, which is obvious if you have a > lot of experience with using CO-RE, but not initially, unfortunately. Please send a patch to add such clarifying comment.
diff --git a/tools/testing/selftests/bpf/prog_tests/core_read_macros.c b/tools/testing/selftests/bpf/prog_tests/core_read_macros.c new file mode 100644 index 000000000000..96f5cf3c6fa2 --- /dev/null +++ b/tools/testing/selftests/bpf/prog_tests/core_read_macros.c @@ -0,0 +1,64 @@ +// SPDX-License-Identifier: GPL-2.0 +/* Copyright (c) 2020 Facebook */ + +#include <test_progs.h> + +struct callback_head { + struct callback_head *next; + void (*func)(struct callback_head *head); +}; + +/* ___shuffled flavor is just an illusion for BPF code, it doesn't really + * exist and user-space needs to provide data in the memory layout that + * matches callback_head. We just defined ___shuffled flavor to make it easier + * to work with the skeleton + */ +struct callback_head___shuffled { + struct callback_head___shuffled *next; + void (*func)(struct callback_head *head); +}; + +#include "test_core_read_macros.skel.h" + +void test_core_read_macros(void) +{ + int duration = 0, err; + struct test_core_read_macros* skel; + struct test_core_read_macros__bss *bss; + struct callback_head u_probe_in; + struct callback_head___shuffled u_core_in; + + skel = test_core_read_macros__open_and_load(); + if (CHECK(!skel, "skel_open", "failed to open skeleton\n")) + return; + bss = skel->bss; + bss->my_pid = getpid(); + + /* next pointers have to be set from the kernel side */ + bss->k_probe_in.func = (void *)(long)0x1234; + bss->k_core_in.func = (void *)(long)0xabcd; + + u_probe_in.next = &u_probe_in; + u_probe_in.func = (void *)(long)0x5678; + bss->u_probe_in = &u_probe_in; + + u_core_in.next = &u_core_in; + u_core_in.func = (void *)(long)0xdbca; + bss->u_core_in = &u_core_in; + + err = test_core_read_macros__attach(skel); + if (CHECK(err, "skel_attach", "skeleton attach failed: %d\n", err)) + goto cleanup; + + /* trigger tracepoint */ + usleep(1); + + ASSERT_EQ(bss->k_probe_out, 0x1234, "k_probe_out"); + ASSERT_EQ(bss->k_core_out, 0xabcd, "k_core_out"); + + ASSERT_EQ(bss->u_probe_out, 0x5678, "u_probe_out"); + ASSERT_EQ(bss->u_core_out, 0xdbca, "u_core_out"); + +cleanup: + test_core_read_macros__destroy(skel); +} diff --git a/tools/testing/selftests/bpf/progs/test_core_read_macros.c b/tools/testing/selftests/bpf/progs/test_core_read_macros.c new file mode 100644 index 000000000000..50054af0a928 --- /dev/null +++ b/tools/testing/selftests/bpf/progs/test_core_read_macros.c @@ -0,0 +1,51 @@ +// SPDX-License-Identifier: GPL-2.0 +// Copyright (c) 2020 Facebook + +#include "vmlinux.h" +#include <bpf/bpf_helpers.h> +#include <bpf/bpf_core_read.h> + +char _license[] SEC("license") = "GPL"; + +/* shuffled layout for relocatable (CO-RE) reads */ +struct callback_head___shuffled { + void (*func)(struct callback_head___shuffled *head); + struct callback_head___shuffled *next; +}; + +struct callback_head k_probe_in = {}; +struct callback_head___shuffled k_core_in = {}; + +struct callback_head *u_probe_in = 0; +struct callback_head___shuffled *u_core_in = 0; + +long k_probe_out = 0; +long u_probe_out = 0; + +long k_core_out = 0; +long u_core_out = 0; + +int my_pid = 0; + +SEC("raw_tracepoint/sys_enter") +int handler(void *ctx) +{ + int pid = bpf_get_current_pid_tgid() >> 32; + + if (my_pid != pid) + return 0; + + /* next pointers for kernel address space have to be initialized from + * BPF side, user-space mmaped addresses are stil user-space addresses + */ + k_probe_in.next = &k_probe_in; + __builtin_preserve_access_index(({k_core_in.next = &k_core_in;})); + + k_probe_out = (long)BPF_PROBE_READ(&k_probe_in, next, next, func); + k_core_out = (long)BPF_CORE_READ(&k_core_in, next, next, func); + u_probe_out = (long)BPF_PROBE_READ_USER(u_probe_in, next, next, func); + u_core_out = (long)BPF_CORE_READ_USER(u_core_in, next, next, func); + + return 0; +} +
Add selftests validating that newly added variations of BPF_CORE_READ(), for use with user-space addresses and for non-CO-RE reads, work as expected. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> --- .../bpf/prog_tests/core_read_macros.c | 64 +++++++++++++++++++ .../bpf/progs/test_core_read_macros.c | 51 +++++++++++++++ 2 files changed, 115 insertions(+) create mode 100644 tools/testing/selftests/bpf/prog_tests/core_read_macros.c create mode 100644 tools/testing/selftests/bpf/progs/test_core_read_macros.c