mbox series

[bpf-next,V9,0/7] bpf: New approach for BPF MTU handling

Message ID 160822594178.3481451.1208057539613401103.stgit@firesoul
Headers show
Series bpf: New approach for BPF MTU handling | expand

Message

Jesper Dangaard Brouer Dec. 17, 2020, 5:26 p.m. UTC
This patchset drops all the MTU checks in TC BPF-helpers that limits
growing the packet size. This is done because these BPF-helpers doesn't
take redirect into account, which can result in their MTU check being done
against the wrong netdev.

The new approach is to give BPF-programs knowledge about the MTU on a
netdev (via ifindex) and fib route lookup level. Meaning some BPF-helpers
are added and extended to make it possible to do MTU checks in the
BPF-code.

If BPF-prog doesn't comply with the MTU then the packet will eventually
get dropped as some other layer. In some cases the existing kernel MTU
checks will drop the packet, but there are also cases where BPF can bypass
these checks. Specifically doing TC-redirect from ingress step
(sch_handle_ingress) into egress code path (basically calling
dev_queue_xmit()). It is left up to driver code to handle these kind of
MTU violations.

One advantage of this approach is that it ingress-to-egress BPF-prog can
send information via packet data. With the MTU checks removed in the
helpers, and also not done in skb_do_redirect() call, this allows for an
ingress BPF-prog to communicate with an egress BPF-prog via packet data,
as long as egress BPF-prog remove this prior to transmitting packet.

This patchset is primarily focused on TC-BPF, but I've made sure that the
MTU BPF-helpers also works for XDP BPF-programs.

V2: Change BPF-helper API from lookup to check.
V3: Drop enforcement of MTU in net-core, leave it to drivers.
V4: Keep sanity limit + netdev "up" checks + rename BPF-helper.
V5: Fix uninit variable + name struct output member mtu_result.
V6: Use bpf_check_mtu() in selftest
V7: Fix logic using tot_len and add another selftest
V8: Add better selftests for BPF-helper bpf_check_mtu
V9: Remove patch that use skb_set_redirected

---

Jesper Dangaard Brouer (7):
      bpf: Remove MTU check in __bpf_skb_max_len
      bpf: fix bpf_fib_lookup helper MTU check for SKB ctx
      bpf: bpf_fib_lookup return MTU value as output when looked up
      bpf: add BPF-helper for MTU checking
      bpf: drop MTU check when doing TC-BPF redirect to ingress
      selftests/bpf: use bpf_check_mtu in selftest test_cls_redirect
      bpf/selftests: tests using bpf_check_mtu BPF-helper


 include/linux/netdevice.h                          |   31 +++
 include/uapi/linux/bpf.h                           |   78 +++++++-
 net/core/dev.c                                     |   19 --
 net/core/filter.c                                  |  184 ++++++++++++++++--
 tools/include/uapi/linux/bpf.h                     |   78 +++++++-
 tools/testing/selftests/bpf/prog_tests/check_mtu.c |  204 ++++++++++++++++++++
 tools/testing/selftests/bpf/progs/test_check_mtu.c |  196 +++++++++++++++++++
 .../selftests/bpf/progs/test_cls_redirect.c        |    7 +
 8 files changed, 754 insertions(+), 43 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/check_mtu.c
 create mode 100644 tools/testing/selftests/bpf/progs/test_check_mtu.c

--

Comments

Jesper Dangaard Brouer Dec. 18, 2020, 10:53 a.m. UTC | #1
On Thu, 17 Dec 2020 18:26:50 +0100
Jesper Dangaard Brouer <brouer@redhat.com> wrote:

> Adding selftest for BPF-helper bpf_check_mtu(). Making sure

> it can be used from both XDP and TC.

> 

> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>

> ---

>  tools/testing/selftests/bpf/prog_tests/check_mtu.c |  204 ++++++++++++++++++++

>  tools/testing/selftests/bpf/progs/test_check_mtu.c |  196 +++++++++++++++++++

>  2 files changed, 400 insertions(+)

>  create mode 100644 tools/testing/selftests/bpf/prog_tests/check_mtu.c

>  create mode 100644 tools/testing/selftests/bpf/progs/test_check_mtu.c


Will send V10 as I have an error in this selftests

> diff --git a/tools/testing/selftests/bpf/prog_tests/check_mtu.c b/tools/testing/selftests/bpf/prog_tests/check_mtu.c

> new file mode 100644

> index 000000000000..b5d0c3a9abe8

> --- /dev/null

> +++ b/tools/testing/selftests/bpf/prog_tests/check_mtu.c

[...]
> +static void test_check_mtu_run_xdp(struct test_check_mtu *skel,

> +				   struct bpf_program *prog,

> +				   __u32 mtu_expect)

> +{

> +	const char *prog_name = bpf_program__name(prog);

> +	int retval_expect = XDP_PASS;

> +	__u32 mtu_result = 0;

> +	char buf[256];

> +	int err;

> +

> +	struct bpf_prog_test_run_attr tattr = {

> +		.repeat = 1,

> +		.data_in = &pkt_v4,

> +		.data_size_in = sizeof(pkt_v4),

> +		.data_out = buf,

> +		.data_size_out = sizeof(buf),

> +		.prog_fd = bpf_program__fd(prog),

> +	};

> +

> +	memset(buf, 0, sizeof(buf));

> +

> +	err = bpf_prog_test_run_xattr(&tattr);

> +	CHECK_ATTR(err != 0 || errno != 0, "bpf_prog_test_run",

                               ^^^^^^^^^^^
You/I cannot use the check "errno != 0" here, as something else could
have set it earlier.


> +		   "prog_name:%s (err %d errno %d retval %d)\n",

> +		   prog_name, err, errno, tattr.retval);

> +

> +        CHECK(tattr.retval != retval_expect, "retval",

> +	      "progname:%s unexpected retval=%d expected=%d\n",

> +	      prog_name, tattr.retval, retval_expect);

> +

> +	/* Extract MTU that BPF-prog got */

> +	mtu_result = skel->bss->global_bpf_mtu_xdp;

> +	CHECK(mtu_result != mtu_expect, "MTU-compare-user",

> +	      "failed (MTU user:%d bpf:%d)", mtu_expect, mtu_result);

> +}



> +static void test_check_mtu_run_tc(struct test_check_mtu *skel,

> +				  struct bpf_program *prog,

> +				  __u32 mtu_expect)

> +{

[...]
> +	err = bpf_prog_test_run_xattr(&tattr);

> +	CHECK_ATTR(err != 0 || errno != 0, "bpf_prog_test_run",

> +		   "prog_name:%s (err %d errno %d retval %d)\n",

> +		   prog_name, err, errno, tattr.retval);


Same issue here.


-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer
Andrii Nakryiko Dec. 18, 2020, 8:13 p.m. UTC | #2
On Thu, Dec 17, 2020 at 9:30 AM Jesper Dangaard Brouer
<brouer@redhat.com> wrote:
>

> Adding selftest for BPF-helper bpf_check_mtu(). Making sure

> it can be used from both XDP and TC.

>

> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>

> ---

>  tools/testing/selftests/bpf/prog_tests/check_mtu.c |  204 ++++++++++++++++++++

>  tools/testing/selftests/bpf/progs/test_check_mtu.c |  196 +++++++++++++++++++

>  2 files changed, 400 insertions(+)

>  create mode 100644 tools/testing/selftests/bpf/prog_tests/check_mtu.c

>  create mode 100644 tools/testing/selftests/bpf/progs/test_check_mtu.c

>

> diff --git a/tools/testing/selftests/bpf/prog_tests/check_mtu.c b/tools/testing/selftests/bpf/prog_tests/check_mtu.c

> new file mode 100644

> index 000000000000..b5d0c3a9abe8

> --- /dev/null

> +++ b/tools/testing/selftests/bpf/prog_tests/check_mtu.c

> @@ -0,0 +1,204 @@

> +// SPDX-License-Identifier: GPL-2.0

> +/* Copyright (c) 2020 Jesper Dangaard Brouer */

> +

> +#include <linux/if_link.h> /* before test_progs.h, avoid bpf_util.h redefines */

> +

> +#include <test_progs.h>

> +#include "test_check_mtu.skel.h"

> +#include <network_helpers.h>

> +

> +#include <stdlib.h>

> +#include <inttypes.h>

> +

> +#define IFINDEX_LO 1

> +

> +static __u32 duration; /* Hint: needed for CHECK macro */

> +

> +static int read_mtu_device_lo(void)

> +{

> +       const char *filename = "/sys/devices/virtual/net/lo/mtu";

> +       char buf[11] = {};

> +       int value;

> +       int fd;

> +

> +       fd = open(filename, 0, O_RDONLY);

> +       if (fd == -1)

> +               return -1;

> +

> +       if (read(fd, buf, sizeof(buf)) == -1)


close fd missing here?

> +               return -2;

> +       close(fd);

> +

> +       value = strtoimax(buf, NULL, 10);

> +       if (errno == ERANGE)

> +               return -3;

> +

> +       return value;

> +}

> +

> +static void test_check_mtu_xdp_attach(struct bpf_program *prog)

> +{

> +       int err = 0;

> +       int fd;

> +

> +       fd = bpf_program__fd(prog);

> +       err = bpf_set_link_xdp_fd(IFINDEX_LO, fd, XDP_FLAGS_SKB_MODE);

> +       if (CHECK(err, "XDP-attach", "failed"))

> +               return;

> +

> +       bpf_set_link_xdp_fd(IFINDEX_LO, -1, 0);


can you please use bpf_link-based bpf_program__attach_xdp() which will
provide auto-cleanup in case of crash?

also check that it succeeded?

> +}

> +

> +static void test_check_mtu_run_xdp(struct test_check_mtu *skel,

> +                                  struct bpf_program *prog,

> +                                  __u32 mtu_expect)

> +{

> +       const char *prog_name = bpf_program__name(prog);

> +       int retval_expect = XDP_PASS;

> +       __u32 mtu_result = 0;

> +       char buf[256];

> +       int err;

> +

> +       struct bpf_prog_test_run_attr tattr = {

> +               .repeat = 1,

> +               .data_in = &pkt_v4,

> +               .data_size_in = sizeof(pkt_v4),

> +               .data_out = buf,

> +               .data_size_out = sizeof(buf),

> +               .prog_fd = bpf_program__fd(prog),

> +       };


nit: it's a variable declaration, so keep it all in one block. There
is also opts-based variant, which might be good to use here instead.

> +

> +       memset(buf, 0, sizeof(buf));


char buf[256] = {}; would make this unnecessary


> +

> +       err = bpf_prog_test_run_xattr(&tattr);

> +       CHECK_ATTR(err != 0 || errno != 0, "bpf_prog_test_run",

> +                  "prog_name:%s (err %d errno %d retval %d)\n",

> +                  prog_name, err, errno, tattr.retval);

> +

> +        CHECK(tattr.retval != retval_expect, "retval",


whitespaces are off?

> +             "progname:%s unexpected retval=%d expected=%d\n",

> +             prog_name, tattr.retval, retval_expect);

> +

> +       /* Extract MTU that BPF-prog got */

> +       mtu_result = skel->bss->global_bpf_mtu_xdp;

> +       CHECK(mtu_result != mtu_expect, "MTU-compare-user",

> +             "failed (MTU user:%d bpf:%d)", mtu_expect, mtu_result);


There is nicer ASSERT_EQ() macro for such cases:

ASSERT_EQ(mtu_result, mtu_expect, "MTU-compare-user"); it will format
sensible error message automatically

> +}

> +


[...]

> +       char buf[256];

> +       int err;

> +

> +       struct bpf_prog_test_run_attr tattr = {

> +               .repeat = 1,

> +               .data_in = &pkt_v4,

> +               .data_size_in = sizeof(pkt_v4),

> +               .data_out = buf,

> +               .data_size_out = sizeof(buf),

> +               .prog_fd = bpf_program__fd(prog),

> +       };

> +

> +       memset(buf, 0, sizeof(buf));

> +


same as above

> +       err = bpf_prog_test_run_xattr(&tattr);

> +       CHECK_ATTR(err != 0 || errno != 0, "bpf_prog_test_run",

> +                  "prog_name:%s (err %d errno %d retval %d)\n",

> +                  prog_name, err, errno, tattr.retval);

> +

> +        CHECK(tattr.retval != retval_expect, "retval",


same :)

> +             "progname:%s unexpected retval=%d expected=%d\n",

> +             prog_name, tattr.retval, retval_expect);

> +

> +       /* Extract MTU that BPF-prog got */

> +       mtu_result = skel->bss->global_bpf_mtu_tc;

> +       CHECK(mtu_result != mtu_expect, "MTU-compare-user",

> +             "failed (MTU user:%d bpf:%d)", mtu_expect, mtu_result);

> +}

> +

> +


[...]

> +

> +void test_check_mtu(void)

> +{

> +       struct test_check_mtu *skel;

> +       __u32 mtu_lo;

> +

> +       skel = test_check_mtu__open_and_load();

> +       if (CHECK(!skel, "open and load skel", "failed"))

> +               return; /* Exit if e.g. helper unknown to kernel */

> +

> +       if (test__start_subtest("bpf_check_mtu XDP-attach"))

> +               test_check_mtu_xdp_attach(skel->progs.xdp_use_helper_basic);

> +

> +       test_check_mtu__destroy(skel);


here it's not clear why you instantiate skeleton outside of
test_check_mtu_xdp_attach() subtest. Can you please move it in? It
will keep this failure local to that specific subtest, not the entire
test. And is just cleaner, of course.

> +

> +       mtu_lo = read_mtu_device_lo();

> +       if (CHECK(mtu_lo < 0, "reading MTU value", "failed (err:%d)", mtu_lo))


ASSERT_OK() could be used here

> +               return;

> +

> +       if (test__start_subtest("bpf_check_mtu XDP-run"))

> +               test_check_mtu_xdp(mtu_lo, 0);

> +

> +       if (test__start_subtest("bpf_check_mtu XDP-run ifindex-lookup"))

> +               test_check_mtu_xdp(mtu_lo, IFINDEX_LO);

> +

> +       if (test__start_subtest("bpf_check_mtu TC-run"))

> +               test_check_mtu_tc(mtu_lo, 0);

> +

> +       if (test__start_subtest("bpf_check_mtu TC-run ifindex-lookup"))

> +               test_check_mtu_tc(mtu_lo, IFINDEX_LO);

> +}


[...]

> +

> +       global_bpf_mtu_tc = mtu_len;

> +       return retval;

> +}

> +

> +SEC("classifier")


nice use of the same SEC()'tion BPF programs!


> +int tc_minus_delta(struct __sk_buff *ctx)

> +{

> +       int retval = BPF_OK; /* Expected retval on successful test */

> +       __u32 ifindex = GLOBAL_USER_IFINDEX;

> +       __u32 skb_len = ctx->len;

> +       __u32 mtu_len = 0;

> +       int delta;

> +

> +       /* Boarderline test case: Minus delta exceeding packet length allowed */

> +       delta = -((skb_len - ETH_HLEN) + 1);

> +

> +       /* Minus length (adjusted via delta) still pass MTU check, other helpers

> +        * are responsible for catching this, when doing actual size adjust

> +        */

> +       if (bpf_check_mtu(ctx, ifindex, &mtu_len, delta, 0))

> +               retval = BPF_DROP;

> +

> +       global_bpf_mtu_xdp = mtu_len;

> +       return retval;

> +}

>

>
Jesper Dangaard Brouer Dec. 22, 2020, 3:30 p.m. UTC | #3
On Fri, 18 Dec 2020 12:13:45 -0800
Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:

> On Thu, Dec 17, 2020 at 9:30 AM Jesper Dangaard Brouer

> <brouer@redhat.com> wrote:

> >

> > Adding selftest for BPF-helper bpf_check_mtu(). Making sure

> > it can be used from both XDP and TC.

> >

> > Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>

> > ---

> >  tools/testing/selftests/bpf/prog_tests/check_mtu.c |  204 ++++++++++++++++++++

> >  tools/testing/selftests/bpf/progs/test_check_mtu.c |  196 +++++++++++++++++++

> >  2 files changed, 400 insertions(+)

> >  create mode 100644 tools/testing/selftests/bpf/prog_tests/check_mtu.c

> >  create mode 100644 tools/testing/selftests/bpf/progs/test_check_mtu.c

> >

> > diff --git a/tools/testing/selftests/bpf/prog_tests/check_mtu.c b/tools/testing/selftests/bpf/prog_tests/check_mtu.c

> > new file mode 100644

> > index 000000000000..b5d0c3a9abe8

> > --- /dev/null

> > +++ b/tools/testing/selftests/bpf/prog_tests/check_mtu.c

> > @@ -0,0 +1,204 @@

> > +// SPDX-License-Identifier: GPL-2.0

> > +/* Copyright (c) 2020 Jesper Dangaard Brouer */

> > +

> > +#include <linux/if_link.h> /* before test_progs.h, avoid bpf_util.h redefines */

> > +

> > +#include <test_progs.h>

> > +#include "test_check_mtu.skel.h"

> > +#include <network_helpers.h>

> > +

> > +#include <stdlib.h>

> > +#include <inttypes.h>

> > +

> > +#define IFINDEX_LO 1

> > +

> > +static __u32 duration; /* Hint: needed for CHECK macro */

> > +

> > +static int read_mtu_device_lo(void)

> > +{

> > +       const char *filename = "/sys/devices/virtual/net/lo/mtu";


I will change this to: /sys/class/net/lo/mtu

> > +       char buf[11] = {};

> > +       int value;

> > +       int fd;

> > +

> > +       fd = open(filename, 0, O_RDONLY);

> > +       if (fd == -1)

> > +               return -1;

> > +

> > +       if (read(fd, buf, sizeof(buf)) == -1)  

> 

> close fd missing here?


ack, fixed.

> > +               return -2;

> > +       close(fd);

> > +

> > +       value = strtoimax(buf, NULL, 10);

> > +       if (errno == ERANGE)

> > +               return -3;

> > +

> > +       return value;

> > +}

> > +

> > +static void test_check_mtu_xdp_attach(struct bpf_program *prog)

> > +{

> > +       int err = 0;

> > +       int fd;

> > +

> > +       fd = bpf_program__fd(prog);

> > +       err = bpf_set_link_xdp_fd(IFINDEX_LO, fd, XDP_FLAGS_SKB_MODE);

> > +       if (CHECK(err, "XDP-attach", "failed"))

> > +               return;

> > +

> > +       bpf_set_link_xdp_fd(IFINDEX_LO, -1, 0);  

> 

> can you please use bpf_link-based bpf_program__attach_xdp() which will

> provide auto-cleanup in case of crash?


Sure, that will be good for me to learn.

> also check that it succeeded?

> 

> > +}

> > +

> > +static void test_check_mtu_run_xdp(struct test_check_mtu *skel,

> > +                                  struct bpf_program *prog,

> > +                                  __u32 mtu_expect)

> > +{

> > +       const char *prog_name = bpf_program__name(prog);

> > +       int retval_expect = XDP_PASS;

> > +       __u32 mtu_result = 0;

> > +       char buf[256];

> > +       int err;

> > +

> > +       struct bpf_prog_test_run_attr tattr = {

> > +               .repeat = 1,

> > +               .data_in = &pkt_v4,

> > +               .data_size_in = sizeof(pkt_v4),

> > +               .data_out = buf,

> > +               .data_size_out = sizeof(buf),

> > +               .prog_fd = bpf_program__fd(prog),

> > +       };  

> 

> nit: it's a variable declaration, so keep it all in one block. There

> is also opts-based variant, which might be good to use here instead.

> 

> > +

> > +       memset(buf, 0, sizeof(buf));  

> 

> char buf[256] = {}; would make this unnecessary


ok.

> 

> > +

> > +       err = bpf_prog_test_run_xattr(&tattr);

> > +       CHECK_ATTR(err != 0 || errno != 0, "bpf_prog_test_run",

> > +                  "prog_name:%s (err %d errno %d retval %d)\n",

> > +                  prog_name, err, errno, tattr.retval);

> > +

> > +        CHECK(tattr.retval != retval_expect, "retval",  

> 

> whitespaces are off?


Yes, I noticed with scripts/checkpatch.pl.  And there are a couple
more, that I've already fixed.


> > +             "progname:%s unexpected retval=%d expected=%d\n",

> > +             prog_name, tattr.retval, retval_expect);

> > +

> > +       /* Extract MTU that BPF-prog got */

> > +       mtu_result = skel->bss->global_bpf_mtu_xdp;

> > +       CHECK(mtu_result != mtu_expect, "MTU-compare-user",

> > +             "failed (MTU user:%d bpf:%d)", mtu_expect, mtu_result);  

> 

> There is nicer ASSERT_EQ() macro for such cases:

> 

> ASSERT_EQ(mtu_result, mtu_expect, "MTU-compare-user"); it will format

> sensible error message automatically


Nice simplification :-)

> 

> > +}

> > +  

> 

> [...]


[... same ...] 

> [...]

> 

> > +

> > +void test_check_mtu(void)

> > +{

> > +       struct test_check_mtu *skel;

> > +       __u32 mtu_lo;

> > +

> > +       skel = test_check_mtu__open_and_load();

> > +       if (CHECK(!skel, "open and load skel", "failed"))

> > +               return; /* Exit if e.g. helper unknown to kernel */

> > +

> > +       if (test__start_subtest("bpf_check_mtu XDP-attach"))

> > +               test_check_mtu_xdp_attach(skel->progs.xdp_use_helper_basic);

> > +

> > +       test_check_mtu__destroy(skel);  

> 

> here it's not clear why you instantiate skeleton outside of

> test_check_mtu_xdp_attach() subtest. Can you please move it in? It

> will keep this failure local to that specific subtest, not the entire

> test. And is just cleaner, of course.


Sure will "move it in".  The intent was to fail the entire test if this
failed, but it is more clean to "move it in".

> > +

> > +       mtu_lo = read_mtu_device_lo();

> > +       if (CHECK(mtu_lo < 0, "reading MTU value", "failed (err:%d)", mtu_lo))  

> 

> ASSERT_OK() could be used here

> 

> > +               return;

> > +

> > +       if (test__start_subtest("bpf_check_mtu XDP-run"))

> > +               test_check_mtu_xdp(mtu_lo, 0);

> > +

> > +       if (test__start_subtest("bpf_check_mtu XDP-run ifindex-lookup"))

> > +               test_check_mtu_xdp(mtu_lo, IFINDEX_LO);

> > +

> > +       if (test__start_subtest("bpf_check_mtu TC-run"))

> > +               test_check_mtu_tc(mtu_lo, 0);

> > +

> > +       if (test__start_subtest("bpf_check_mtu TC-run ifindex-lookup"))

> > +               test_check_mtu_tc(mtu_lo, IFINDEX_LO);

> > +}  

> 

> [...]

> 

> > +

> > +       global_bpf_mtu_tc = mtu_len;

> > +       return retval;

> > +}

> > +

> > +SEC("classifier")  

> 

> nice use of the same SEC()'tion BPF programs!

> 

> 

> > +int tc_minus_delta(struct __sk_buff *ctx)

> > +{

> > +       int retval = BPF_OK; /* Expected retval on successful test */

> > +       __u32 ifindex = GLOBAL_USER_IFINDEX;

> > +       __u32 skb_len = ctx->len;

> > +       __u32 mtu_len = 0;

> > +       int delta;

> > +

> > +       /* Boarderline test case: Minus delta exceeding packet length allowed */

> > +       delta = -((skb_len - ETH_HLEN) + 1);

> > +

> > +       /* Minus length (adjusted via delta) still pass MTU check, other helpers

> > +        * are responsible for catching this, when doing actual size adjust

> > +        */

> > +       if (bpf_check_mtu(ctx, ifindex, &mtu_len, delta, 0))

> > +               retval = BPF_DROP;

> > +

> > +       global_bpf_mtu_xdp = mtu_len;

> > +       return retval;

> > +}




-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer