
[0/7] bpf: Add fd modes check for map iter and extend libbpf

Message ID 20220906170301.256206-1-roberto.sassu@huaweicloud.com

Message

Roberto Sassu Sept. 6, 2022, 5:02 p.m. UTC
From: Roberto Sassu <roberto.sassu@huawei.com>

Add a missing fd modes check in map iterators. Without it, eBPF programs
attached to the iterator can perform unauthorized map writes. Use this
patch set as an opportunity to start a discussion with the cgroup
developers about whether a security check is missing or not for their
iterator.

Also, extend libbpf with the _opts variant of bpf_*_get_fd_by_id(). Only
bpf_map_get_fd_by_id_opts() is really useful in this patch set, to ensure
that the creation of a map iterator fails with a read-only fd.

Add all the variants in this patch set for symmetry with
bpf_map_get_fd_by_id_opts(), and because they all share the same opts
structure. Adding them here also shrinks the separate patch set that fixes
the map permissions requested by bpftool, so that its remaining patches
are only about that fix.

Finally, extend the bpf_iter test with the read-only fd check, and test
each _opts variant of bpf_*_get_fd_by_id().
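
As a rough illustration of why a read-only fd matters, the conversion from
the eBPF-defined flags to an open(2) access mode can be modeled as below.
This is a simplified user-space sketch, not kernel code: sketch_file_flag()
is a hypothetical name, and the two flag values are redefined locally
(matching the UAPI header) only so the example builds without kernel headers.

```c
#include <errno.h>
#include <fcntl.h>

/* Same values as in the UAPI <linux/bpf.h>; redefined here only so that
 * the sketch is self-contained. */
#define BPF_F_RDONLY (1U << 3)
#define BPF_F_WRONLY (1U << 4)

/* Hypothetical helper: returns the open(2) access mode implied by the
 * eBPF flags, or -EINVAL if both read-only and write-only are requested. */
int sketch_file_flag(unsigned int flags)
{
	if ((flags & BPF_F_RDONLY) && (flags & BPF_F_WRONLY))
		return -EINVAL;
	if (flags & BPF_F_RDONLY)
		return O_RDONLY;
	if (flags & BPF_F_WRONLY)
		return O_WRONLY;
	/* No flag given: the fd is opened read-write. */
	return O_RDWR;
}
```

In this model, a caller that passes no flags always ends up with a
read-write fd, which is the case the _opts variants are meant to avoid.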

Roberto Sassu (7):
  bpf: Add missing fd modes check for map iterators
  libbpf: Define bpf_get_fd_opts and introduce
    bpf_map_get_fd_by_id_opts()
  libbpf: Introduce bpf_prog_get_fd_by_id_opts()
  libbpf: Introduce bpf_btf_get_fd_by_id_opts()
  libbpf: Introduce bpf_link_get_fd_by_id_opts()
  selftests/bpf: Ensure fd modes are checked for map iters and destroy
    links
  selftests/bpf: Add tests for _opts variants of libbpf

 include/linux/bpf.h                           |   2 +-
 kernel/bpf/inode.c                            |   2 +-
 kernel/bpf/map_iter.c                         |   3 +-
 kernel/bpf/syscall.c                          |   8 +-
 net/core/bpf_sk_storage.c                     |   3 +-
 net/core/sock_map.c                           |   3 +-
 tools/lib/bpf/bpf.c                           |  47 +++++-
 tools/lib/bpf/bpf.h                           |  16 ++
 tools/lib/bpf/libbpf.map                      |  10 +-
 tools/lib/bpf/libbpf_version.h                |   2 +-
 .../selftests/bpf/prog_tests/bpf_iter.c       |  34 +++-
 .../bpf/prog_tests/libbpf_get_fd_opts.c       | 145 ++++++++++++++++++
 .../bpf/progs/test_libbpf_get_fd_opts.c       |  49 ++++++
 13 files changed, 309 insertions(+), 15 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/libbpf_get_fd_opts.c
 create mode 100644 tools/testing/selftests/bpf/progs/test_libbpf_get_fd_opts.c

Comments

Roberto Sassu Sept. 7, 2022, 8:02 a.m. UTC | #1
On Tue, 2022-09-06 at 11:21 -0700, Alexei Starovoitov wrote:
> On Tue, Sep 6, 2022 at 10:04 AM Roberto Sassu
> <roberto.sassu@huaweicloud.com> wrote:
> > From: Roberto Sassu <roberto.sassu@huawei.com>
> > 
> > Commit 6e71b04a82248 ("bpf: Add file mode configuration into bpf maps")
> > added the BPF_F_RDONLY and BPF_F_WRONLY flags, to let user space specify
> > whether it will just read or modify a map.
> >
> > Map access control is done in two steps. First, when user space wants to
> > obtain a map fd, it provides to the kernel the eBPF-defined flags, which
> > are converted into open flags and passed to the security_bpf_map() security
> > hook for evaluation by LSMs.
> >
> > Second, if user space successfully obtained an fd, it passes that fd to the
> > kernel when it requests a map operation (e.g. lookup or update). The kernel
> > first checks if the fd has the modes required to perform the requested
> > operation and, if yes, continues the execution and returns the result to
> > user space.
> >
> > While the fd modes check was added for map_*_elem() functions, it is
> > currently missing for map iterators, added more recently with commit
> > a5cbe05a6673 ("bpf: Implement bpf iterator for map elements"). A map
> > iterator executes a chosen eBPF program for each key/value pair of a map
> > and allows that program to read and/or modify them.
> >
> > Whether a map iterator allows only read or also write depends on whether
> > the MEM_RDONLY flag in the ctx_arg_info member of the bpf_iter_reg
> > structure is set. Also, write needs to be supported at verifier level (for
> > example, it is currently not supported for sock maps).
> >
> > Since map iterators obtain a map from a user space fd with
> > bpf_map_get_with_uref(), add the new req_modes parameter to that function,
> > so that map iterators can provide the required fd modes to access a map. If
> > the user space fd doesn't include the required modes,
> > bpf_map_get_with_uref() returns with an error, and the map iterator will
> > not be created.
> >
> > If a map iterator marks both the key and value as read-only, it calls
> > bpf_map_get_with_uref() with FMODE_CAN_READ as value for req_modes. If it
> > also allows write access to either the key or the value, it calls that
> > function with FMODE_CAN_READ | FMODE_CAN_WRITE as value for req_modes,
> > regardless of whether or not the write is supported by the verifier (the
> > write is intentionally allowed).
> >
> > bpf_fd_probe_obj() does not require any fd mode, as the fd is only used for
> > the purpose of finding the eBPF object type, for pinning the object to the
> > bpffs filesystem.
> >
> > Finally, it is worth mentioning that the fd modes check was not added for
> > the cgroup iterator, although it registers an attach_target method like the
> > other iterators. The reason is that the fd is not the only way for user
> > space to reference a cgroup object (also by ID and by path). For the
> > protection to be effective, all reference methods need to be evaluated
> > consistently. This work is deferred to a separate patch.
> 
> I think the current behavior is fine.
> File permissions don't apply at iterator level or prog level.

+ Chenbo, linux-security-module

Well, if you write a security module to prevent writes on a map, and
user space is able to do it anyway with an iterator, what is the
purpose of the security module then?

> fmode_can_read/write are for syscall commands only.
> To be fair we've added them to lookup/delete commands
> and it was more of a pain to maintain and no confirmed good use.

I think a good use would be requesting the right permission for the
type of operation that needs to be performed, e.g. read-only permission
when you have a read-like operation like a lookup or dump.

By always requesting read-write permission, for all operations,
security modules won't be able to distinguish which operation has to be
denied to satisfy the policy.

One example: when a security module prevents writes on maps (would that
be so uncommon?), bpftool is not able to show the full list of maps,
because it asks for read-write permission just to get the map info.

Freezing the map is not a solution, if you want to allow certain
subjects to continuously update the protected map at run-time.

Roberto
Alexei Starovoitov Sept. 7, 2022, 4:02 p.m. UTC | #2
On Wed, Sep 7, 2022 at 1:03 AM Roberto Sassu
<roberto.sassu@huaweicloud.com> wrote:
>
> On Tue, 2022-09-06 at 11:21 -0700, Alexei Starovoitov wrote:
> > On Tue, Sep 6, 2022 at 10:04 AM Roberto Sassu
> > <roberto.sassu@huaweicloud.com> wrote:
> > > [...]
> >
> > I think the current behavior is fine.
> > File permissions don't apply at iterator level or prog level.
>
> + Chenbo, linux-security-module
>
> Well, if you write a security module to prevent writes on a map, and
> user space is able to do it anyway with an iterator, what is the
> purpose of the security module then?

sounds like a broken "security module" and nothing else.

> > fmode_can_read/write are for syscall commands only.
> > To be fair we've added them to lookup/delete commands
> > and it was more of a pain to maintain and no confirmed good use.
>
> I think a good use would be requesting the right permission for the
> type of operation that needs to be performed, e.g. read-only permission
> when you have a read-like operation like a lookup or dump.
>
> By always requesting read-write permission, for all operations,
> security modules won't be able to distinguish which operation has to be
> denied to satisfy the policy.
>
> One example of that is that, when there is a security module preventing
> writes on maps (will be that uncommon?),

lsm that prevents writes into bpf maps? That's a convoluted design.
You can try to implement such an lsm, but expect lots of challenges.

> bpftool is not able to show
> the full list of maps because it asks for read-write permission for
> getting the map info.

completely orthogonal issue.

> Freezing the map is not a solution, if you want to allow certain
> subjects to continuously update the protected map at run-time.
>
> Roberto
>
Roberto Sassu Sept. 8, 2022, 1:58 p.m. UTC | #3
On Wed, 2022-09-07 at 09:02 -0700, Alexei Starovoitov wrote:
> 

[...]

> > Well, if you write a security module to prevent writes on a map, and
> > user space is able to do it anyway with an iterator, what is the
> > purpose of the security module then?
> 
> sounds like a broken "security module" and nothing else.

Ok, if a custom security module does not convince you, let me make a
small example with SELinux.

I created a small map iterator that sets every value of a map to 5:

SEC("iter/bpf_map_elem")
int write_bpf_hash_map(struct bpf_iter__bpf_map_elem *ctx)
{
	u32 *key = ctx->key;
	u8 *val = ctx->value;

	if (key == NULL || val == NULL)
		return 0;

	*val = 5;
	return 0;
}

I create and pin a map:

# bpftool map create /sys/fs/bpf/map type array key 4 value 1 entries 1 \
    name test

Initially, the content of the map looks like:

# bpftool map dump pinned /sys/fs/bpf/map 
key: 00 00 00 00  value: 00
Found 1 element

I then created a new SELinux type bpftool_test_t, which has only read
permission on maps:

# sesearch -A -s bpftool_test_t -t unconfined_t -c bpf
allow bpftool_test_t unconfined_t:bpf map_read;

So, what I expect is that this type is not able to write to the map.

Indeed, the current bpftool is not able to do it:

# strace -f -etrace=bpf runcon -t bpftool_test_t bpftool iter pin \
    writer.o /sys/fs/bpf/iter map pinned /sys/fs/bpf/map
bpf(BPF_OBJ_GET, {pathname="/sys/fs/bpf/map", bpf_fd=0, file_flags=0}, 144) = -1 EACCES (Permission denied)
Error: bpf obj get (/sys/fs/bpf): Permission denied

This happens because the current bpftool requests to access the map
with read-write permission, and SELinux denies it:

# cat /var/log/audit/audit.log|audit2allow


#============= bpftool_test_t ==============
allow bpftool_test_t unconfined_t:bpf map_write;


The command failed, and the content of the map is still:

# bpftool map dump pinned /sys/fs/bpf/map 
key: 00 00 00 00  value: 00
Found 1 element
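
The denial above can be pictured with a small model of the access decision.
This is only a sketch under stated assumptions: sketch_lsm_check_map() and
the SKETCH_MAY_* bits are hypothetical names, not SELinux or kernel
symbols, and the real hook evaluates a full policy rather than a bitmask.

```c
#include <errno.h>

/* Hypothetical permission bits, standing in for map_read and map_write. */
#define SKETCH_MAY_READ  0x1U
#define SKETCH_MAY_WRITE 0x2U

/* Hypothetical decision function: grant the fd only if every requested
 * access is allowed by the policy, otherwise fail with -EACCES. */
int sketch_lsm_check_map(unsigned int allowed, unsigned int requested)
{
	if (requested & ~allowed)
		return -EACCES;
	return 0;
}
```

With a policy that allows only SKETCH_MAY_READ, a read-write request is
refused while a read-only request succeeds, which is why the flags passed
when the fd is obtained determine what the security module can enforce.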


Now, I use a slightly modified version of bpftool that requests
read-only access to the map instead:

# strace -f -etrace=bpf runcon -t bpftool_test_t ./bpftool iter pin \
    writer.o /sys/fs/bpf/iter map pinned /sys/fs/bpf/map
bpf(BPF_OBJ_GET, {pathname="/sys/fs/bpf/map", bpf_fd=0, file_flags=BPF_F_RDONLY}, 16) = 3
libbpf: elf: skipping unrecognized data section(5) .eh_frame
libbpf: elf: skipping relo section(6) .rel.eh_frame for section(5) .eh_frame

...

bpf(BPF_LINK_CREATE, {link_create={prog_fd=4, target_fd=0, attach_type=BPF_TRACE_ITER, flags=0}, ...}, 48) = 5
bpf(BPF_OBJ_PIN, {pathname="/sys/fs/bpf/iter", bpf_fd=5, file_flags=0}, 16) = 0

That worked, because SELinux grants read-only permission to
bpftool_test_t. However, the map iterator does not check how the fd was
obtained, and thus allows the iterator to be created.

At this point, we have write access, despite not having the right to do
it:

# cat /sys/fs/bpf/iter
# bpftool map dump pinned /sys/fs/bpf/map 
key: 00 00 00 00  value: 05
Found 1 element

The iterator updated the map value.


The patch I'm proposing checks how the map fd was obtained, and if its
modes are compatible with the operations an attached program is allowed
to do. If the fd does not have the required modes, eBPF denies the
creation of the map iterator.
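
A minimal user-space model of that check, assuming the behavior described
in the patch: an iterator needs read access, plus write access unless it
marks both key and value as read-only. All names here (SKETCH_CAN_*,
sketch_iter_req_modes(), sketch_iter_attach()) are hypothetical, not the
kernel's.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical mode bits, standing in for FMODE_CAN_READ/FMODE_CAN_WRITE. */
#define SKETCH_CAN_READ  0x1U
#define SKETCH_CAN_WRITE 0x2U

/* Modes an iterator requires: read always, write too unless both the key
 * and the value are exposed to the program as read-only. */
unsigned int sketch_iter_req_modes(bool key_rdonly, bool value_rdonly)
{
	unsigned int req = SKETCH_CAN_READ;

	if (!key_rdonly || !value_rdonly)
		req |= SKETCH_CAN_WRITE;
	return req;
}

/* Refuse to create the iterator if the fd lacks any required mode. */
int sketch_iter_attach(unsigned int fd_modes, unsigned int req_modes)
{
	if ((fd_modes & req_modes) != req_modes)
		return -EPERM;
	return 0;
}
```

In this model, an fd obtained with BPF_F_RDONLY carries only
SKETCH_CAN_READ, so attaching an iterator with a writable value fails
with -EPERM.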

After patching the kernel, I try to run the modified bpftool again:

# strace -f -etrace=bpf runcon -t bpftool_test_t ./bpftool iter pin \
    writer.o /sys/fs/bpf/iter map pinned /sys/fs/bpf/map
bpf(BPF_OBJ_GET, {pathname="/sys/fs/bpf/map", bpf_fd=0, file_flags=BPF_F_RDONLY}, 16) = 3
libbpf: elf: skipping unrecognized data section(5) .eh_frame
libbpf: elf: skipping relo section(6) .rel.eh_frame for section(5) .eh_frame

...

bpf(BPF_LINK_CREATE, {link_create={prog_fd=4, target_fd=0, attach_type=BPF_TRACE_ITER, flags=0}, ...}, 48) = -1 EPERM (Operation not permitted)
libbpf: prog 'write_bpf_hash_map': failed to attach to iterator: Operation not permitted
Error: attach_iter failed for program write_bpf_hash_map

The map iterator cannot be created and the map is not updated:

# bpftool map dump pinned /sys/fs/bpf/map 
key: 00 00 00 00  value: 00
Found 1 element

Roberto
Alexei Starovoitov Sept. 8, 2022, 3:17 p.m. UTC | #4
On Thu, Sep 8, 2022 at 6:59 AM Roberto Sassu
<roberto.sassu@huaweicloud.com> wrote:
>
> On Wed, 2022-09-07 at 09:02 -0700, Alexei Starovoitov wrote:
> >
>
> [...]
>
> > > Well, if you write a security module to prevent writes on a map,
> > > and
> > > user space is able to do it anyway with an iterator, what is the
> > > purpose of the security module then?
> >
> > sounds like a broken "security module" and nothing else.
>
> Ok, if a custom security module does not convince you, let me make a
> small example with SELinux.
>
> [...]
>
> That worked, because SELinux grants read-only permission to
> bpftool_test_t. However, the map iterator does not check how the fd was
> obtained, and thus allows the iterator to be created.
>
> At this point, we have write access, despite not having the right to do
> it:

That is a wrong assumption to begin with.
Having an fd to a bpf object (map, link, prog) allows access.
Read/write is sort of applicable to maps, but not so much
to progs or links.
That file-based read/write flag is only for user processes.
bpf progs have always had separate flags for that.
See BPF_F_RDONLY vs BPF_F_RDONLY_PROG.
One doesn't imply the other.