Message ID | 20201009101356.129228-1-toke@redhat.com |
---|---|
State | New |
Headers | show |
Series | [bpf-next,v2] bpf_fib_lookup: optionally skip neighbour lookup | expand |
On 10/9/20 3:13 AM, Toke Høiland-Jørgensen wrote: > The bpf_fib_lookup() helper performs a neighbour lookup for the destination > IP and returns BPF_FIB_LKUP_NO_NEIGH if this fails, with the expectation > that the BPF program will pass the packet up the stack in this case. > However, with the addition of bpf_redirect_neigh() that can be used instead > to perform the neighbour lookup, at the cost of a bit of duplicated work. > > For that we still need the target ifindex, and since bpf_fib_lookup() > already has that at the time it performs the neighbour lookup, there is > really no reason why it can't just return it in any case. So let's just > always return the ifindex, and also add a flag that lets the caller turn > off the neighbour lookup entirely in bpf_fib_lookup(). seems really odd to do the fib lookup only to skip the neighbor lookup and defer to a second helper to do a second fib lookup and send out. The better back-to-back calls is to return the ifindex and gateway on successful fib lookup regardless of valid neighbor. If the call to bpf_redirect_neigh is needed, it can have a flag to skip the fib lookup and just redirect to the given nexthop address + ifindex. ie., bpf_redirect_neigh only does neighbor handling in this case. > > v2: > - Add flag (Daniel) > - Remove misleading code example from commit message (David) > > Cc: David Ahern <dsahern@gmail.com> > Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> > --- > include/uapi/linux/bpf.h | 10 ++++++---- > net/core/filter.c | 15 ++++++++++++--- > tools/include/uapi/linux/bpf.h | 10 ++++++---- > 3 files changed, 24 insertions(+), 11 deletions(-) > > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h > index d83561e8cd2c..9c7c10ce7ace 100644 > --- a/include/uapi/linux/bpf.h > +++ b/include/uapi/linux/bpf.h > @@ -4813,12 +4813,14 @@ struct bpf_raw_tracepoint_args { > __u64 args[0]; > }; > > -/* DIRECT: Skip the FIB rules and go to FIB table associated with device > - * OUTPUT: Do lookup from egress perspective; default is ingress > +/* DIRECT: Skip the FIB rules and go to FIB table associated with device > + * OUTPUT: Do lookup from egress perspective; default is ingress > + * SKIP_NEIGH: Skip neighbour lookup and return BPF_FIB_LKUP_RET_NO_NEIGH on success > */ > enum { > - BPF_FIB_LOOKUP_DIRECT = (1U << 0), > - BPF_FIB_LOOKUP_OUTPUT = (1U << 1), > + BPF_FIB_LOOKUP_DIRECT = (1U << 0), > + BPF_FIB_LOOKUP_OUTPUT = (1U << 1), > + BPF_FIB_LOOKUP_SKIP_NEIGH = (1U << 2), > }; > > enum { > diff --git a/net/core/filter.c b/net/core/filter.c > index 05df73780dd3..1038337bc06c 100644 > --- a/net/core/filter.c > +++ b/net/core/filter.c > @@ -5192,7 +5192,6 @@ static int bpf_fib_set_fwd_params(struct bpf_fib_lookup *params, > memcpy(params->smac, dev->dev_addr, ETH_ALEN); > params->h_vlan_TCI = 0; > params->h_vlan_proto = 0; > - params->ifindex = dev->ifindex; > > return 0; > } > @@ -5289,6 +5288,10 @@ static int bpf_ipv4_fib_lookup(struct net *net, struct bpf_fib_lookup *params, > dev = nhc->nhc_dev; > > params->rt_metric = res.fi->fib_priority; > + params->ifindex = dev->ifindex; > + > + if (flags & BPF_FIB_LOOKUP_SKIP_NEIGH) > + return BPF_FIB_LKUP_RET_NO_NEIGH; > > /* xdp and cls_bpf programs are run in RCU-bh so > * rcu_read_lock_bh is not needed here > @@ -5414,6 +5417,10 @@ static int bpf_ipv6_fib_lookup(struct net *net, struct bpf_fib_lookup *params, > > dev = res.nh->fib_nh_dev; > params->rt_metric = res.f6i->fib6_metric; > + params->ifindex = dev->ifindex; > + > + if (flags & BPF_FIB_LOOKUP_SKIP_NEIGH) > + return BPF_FIB_LKUP_RET_NO_NEIGH; > > /* xdp and cls_bpf programs are run in RCU-bh so rcu_read_lock_bh is > * not needed here. > @@ -5432,7 +5439,8 @@ BPF_CALL_4(bpf_xdp_fib_lookup, struct xdp_buff *, ctx, > if (plen < sizeof(*params)) > return -EINVAL; > > - if (flags & ~(BPF_FIB_LOOKUP_DIRECT | BPF_FIB_LOOKUP_OUTPUT)) > + if (flags & ~(BPF_FIB_LOOKUP_DIRECT | BPF_FIB_LOOKUP_OUTPUT | > + BPF_FIB_LOOKUP_SKIP_NEIGH)) > return -EINVAL; > > switch (params->family) { > @@ -5469,7 +5477,8 @@ BPF_CALL_4(bpf_skb_fib_lookup, struct sk_buff *, skb, > if (plen < sizeof(*params)) > return -EINVAL; > > - if (flags & ~(BPF_FIB_LOOKUP_DIRECT | BPF_FIB_LOOKUP_OUTPUT)) > + if (flags & ~(BPF_FIB_LOOKUP_DIRECT | BPF_FIB_LOOKUP_OUTPUT | > + BPF_FIB_LOOKUP_SKIP_NEIGH)) > return -EINVAL; > > switch (params->family) { > diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h > index d83561e8cd2c..9c7c10ce7ace 100644 > --- a/tools/include/uapi/linux/bpf.h > +++ b/tools/include/uapi/linux/bpf.h > @@ -4813,12 +4813,14 @@ struct bpf_raw_tracepoint_args { > __u64 args[0]; > }; > > -/* DIRECT: Skip the FIB rules and go to FIB table associated with device > - * OUTPUT: Do lookup from egress perspective; default is ingress > +/* DIRECT: Skip the FIB rules and go to FIB table associated with device > + * OUTPUT: Do lookup from egress perspective; default is ingress > + * SKIP_NEIGH: Skip neighbour lookup and return BPF_FIB_LKUP_RET_NO_NEIGH on success > */ > enum { > - BPF_FIB_LOOKUP_DIRECT = (1U << 0), > - BPF_FIB_LOOKUP_OUTPUT = (1U << 1), > + BPF_FIB_LOOKUP_DIRECT = (1U << 0), > + BPF_FIB_LOOKUP_OUTPUT = (1U << 1), > + BPF_FIB_LOOKUP_SKIP_NEIGH = (1U << 2), > }; > > enum { >
David Ahern <dsahern@gmail.com> writes: > On 10/9/20 3:13 AM, Toke Høiland-Jørgensen wrote: >> The bpf_fib_lookup() helper performs a neighbour lookup for the destination >> IP and returns BPF_FIB_LKUP_NO_NEIGH if this fails, with the expectation >> that the BPF program will pass the packet up the stack in this case. >> However, with the addition of bpf_redirect_neigh() that can be used instead >> to perform the neighbour lookup, at the cost of a bit of duplicated work. >> >> For that we still need the target ifindex, and since bpf_fib_lookup() >> already has that at the time it performs the neighbour lookup, there is >> really no reason why it can't just return it in any case. So let's just >> always return the ifindex, and also add a flag that lets the caller turn >> off the neighbour lookup entirely in bpf_fib_lookup(). > > seems really odd to do the fib lookup only to skip the neighbor lookup > and defer to a second helper to do a second fib lookup and send out. > > The better back-to-back calls is to return the ifindex and gateway on > successful fib lookup regardless of valid neighbor. If the call to > bpf_redirect_neigh is needed, it can have a flag to skip the fib lookup > and just redirect to the given nexthop address + ifindex. ie., > bpf_redirect_neigh only does neighbor handling in this case. Hmm, yeah, I guess it would make sense to cache and reuse the lookup - maybe stick it in bpf_redirect_info()? However, given the imminent opening of the merge window, I don't see this landing before then. So I'm going to respin this patch with just the original change to always return the ifindex, then we can revisit the flags/reuse of the fib lookup later. -Toke
On 10/9/20 11:42 AM, Toke Høiland-Jørgensen wrote: > David Ahern <dsahern@gmail.com> writes: > >> On 10/9/20 3:13 AM, Toke Høiland-Jørgensen wrote: >>> The bpf_fib_lookup() helper performs a neighbour lookup for the destination >>> IP and returns BPF_FIB_LKUP_NO_NEIGH if this fails, with the expectation >>> that the BPF program will pass the packet up the stack in this case. >>> However, with the addition of bpf_redirect_neigh() that can be used instead >>> to perform the neighbour lookup, at the cost of a bit of duplicated work. >>> >>> For that we still need the target ifindex, and since bpf_fib_lookup() >>> already has that at the time it performs the neighbour lookup, there is >>> really no reason why it can't just return it in any case. So let's just >>> always return the ifindex, and also add a flag that lets the caller turn >>> off the neighbour lookup entirely in bpf_fib_lookup(). >> >> seems really odd to do the fib lookup only to skip the neighbor lookup >> and defer to a second helper to do a second fib lookup and send out. >> >> The better back-to-back calls is to return the ifindex and gateway on >> successful fib lookup regardless of valid neighbor. If the call to >> bpf_redirect_neigh is needed, it can have a flag to skip the fib lookup >> and just redirect to the given nexthop address + ifindex. ie., >> bpf_redirect_neigh only does neighbor handling in this case. > > Hmm, yeah, I guess it would make sense to cache and reuse the lookup - > maybe stick it in bpf_redirect_info()? However, given the imminent That is not needed. > opening of the merge window, I don't see this landing before then. So > I'm going to respin this patch with just the original change to always > return the ifindex, then we can revisit the flags/reuse of the fib > lookup later. > What I am suggesting is a change in API to bpf_redirect_neigh which should be done now, before the merge window, before it comes a locked API. Right now, bpf_redirect_neigh does a lookup to get the nexthop. It should take the gateway as an input argument. If set, then the lookup is not done - only the neighbor redirect.
On 10/9/20 11:28 PM, David Ahern wrote: > On 10/9/20 11:42 AM, Toke Høiland-Jørgensen wrote: >> David Ahern <dsahern@gmail.com> writes: >>> On 10/9/20 3:13 AM, Toke Høiland-Jørgensen wrote: >>>> The bpf_fib_lookup() helper performs a neighbour lookup for the destination >>>> IP and returns BPF_FIB_LKUP_NO_NEIGH if this fails, with the expectation >>>> that the BPF program will pass the packet up the stack in this case. >>>> However, with the addition of bpf_redirect_neigh() that can be used instead >>>> to perform the neighbour lookup, at the cost of a bit of duplicated work. >>>> >>>> For that we still need the target ifindex, and since bpf_fib_lookup() >>>> already has that at the time it performs the neighbour lookup, there is >>>> really no reason why it can't just return it in any case. So let's just >>>> always return the ifindex, and also add a flag that lets the caller turn >>>> off the neighbour lookup entirely in bpf_fib_lookup(). >>> >>> seems really odd to do the fib lookup only to skip the neighbor lookup >>> and defer to a second helper to do a second fib lookup and send out. >>> >>> The better back-to-back calls is to return the ifindex and gateway on >>> successful fib lookup regardless of valid neighbor. If the call to >>> bpf_redirect_neigh is needed, it can have a flag to skip the fib lookup >>> and just redirect to the given nexthop address + ifindex. ie., >>> bpf_redirect_neigh only does neighbor handling in this case. >> >> Hmm, yeah, I guess it would make sense to cache and reuse the lookup - >> maybe stick it in bpf_redirect_info()? However, given the imminent > > That is not needed. > >> opening of the merge window, I don't see this landing before then. So >> I'm going to respin this patch with just the original change to always >> return the ifindex, then we can revisit the flags/reuse of the fib >> lookup later. > > What I am suggesting is a change in API to bpf_redirect_neigh which > should be done now, before the merge window, before it comes a locked > API. Right now, bpf_redirect_neigh does a lookup to get the nexthop. It > should take the gateway as an input argument. If set, then the lookup is > not done - only the neighbor redirect. Sounds like a reasonable extension, agree. API freeze is not merge win, but final v5.10 tag in this case as it always has been. In case it's not in time, we can simply just move flags to arg3 and add a reserved param as arg2 which must be zero (and thus indicate to perform the lookup as-is). Later we could extend to pass params similar as in fib_lookup helper for the gw.
Daniel Borkmann <daniel@iogearbox.net> writes: > On 10/9/20 11:28 PM, David Ahern wrote: >> On 10/9/20 11:42 AM, Toke Høiland-Jørgensen wrote: >>> David Ahern <dsahern@gmail.com> writes: >>>> On 10/9/20 3:13 AM, Toke Høiland-Jørgensen wrote: >>>>> The bpf_fib_lookup() helper performs a neighbour lookup for the destination >>>>> IP and returns BPF_FIB_LKUP_NO_NEIGH if this fails, with the expectation >>>>> that the BPF program will pass the packet up the stack in this case. >>>>> However, with the addition of bpf_redirect_neigh() that can be used instead >>>>> to perform the neighbour lookup, at the cost of a bit of duplicated work. >>>>> >>>>> For that we still need the target ifindex, and since bpf_fib_lookup() >>>>> already has that at the time it performs the neighbour lookup, there is >>>>> really no reason why it can't just return it in any case. So let's just >>>>> always return the ifindex, and also add a flag that lets the caller turn >>>>> off the neighbour lookup entirely in bpf_fib_lookup(). >>>> >>>> seems really odd to do the fib lookup only to skip the neighbor lookup >>>> and defer to a second helper to do a second fib lookup and send out. >>>> >>>> The better back-to-back calls is to return the ifindex and gateway on >>>> successful fib lookup regardless of valid neighbor. If the call to >>>> bpf_redirect_neigh is needed, it can have a flag to skip the fib lookup >>>> and just redirect to the given nexthop address + ifindex. ie., >>>> bpf_redirect_neigh only does neighbor handling in this case. >>> >>> Hmm, yeah, I guess it would make sense to cache and reuse the lookup - >>> maybe stick it in bpf_redirect_info()? However, given the imminent >> >> That is not needed. >> >>> opening of the merge window, I don't see this landing before then. So >>> I'm going to respin this patch with just the original change to always >>> return the ifindex, then we can revisit the flags/reuse of the fib >>> lookup later. >> >> What I am suggesting is a change in API to bpf_redirect_neigh which >> should be done now, before the merge window, before it comes a locked >> API. Right now, bpf_redirect_neigh does a lookup to get the nexthop. It >> should take the gateway as an input argument. If set, then the lookup is >> not done - only the neighbor redirect. > > Sounds like a reasonable extension, agree. API freeze is not merge win, but > final v5.10 tag in this case as it always has been. In case it's not in time, > we can simply just move flags to arg3 and add a reserved param as arg2 which > must be zero (and thus indicate to perform the lookup as-is). Later we could > extend to pass params similar as in fib_lookup helper for the gw. Right, I can take a look at this next week. Feel free to merge (v3 of) this patch now, that change will be needed in any case I think... -Toke
On 10/10/20 3:42 PM, Toke Høiland-Jørgensen wrote: > Daniel Borkmann <daniel@iogearbox.net> writes: >> On 10/9/20 11:28 PM, David Ahern wrote: >>> On 10/9/20 11:42 AM, Toke Høiland-Jørgensen wrote: >>>> David Ahern <dsahern@gmail.com> writes: >>>>> On 10/9/20 3:13 AM, Toke Høiland-Jørgensen wrote: >>>>>> The bpf_fib_lookup() helper performs a neighbour lookup for the destination >>>>>> IP and returns BPF_FIB_LKUP_NO_NEIGH if this fails, with the expectation >>>>>> that the BPF program will pass the packet up the stack in this case. >>>>>> However, with the addition of bpf_redirect_neigh() that can be used instead >>>>>> to perform the neighbour lookup, at the cost of a bit of duplicated work. >>>>>> >>>>>> For that we still need the target ifindex, and since bpf_fib_lookup() >>>>>> already has that at the time it performs the neighbour lookup, there is >>>>>> really no reason why it can't just return it in any case. So let's just >>>>>> always return the ifindex, and also add a flag that lets the caller turn >>>>>> off the neighbour lookup entirely in bpf_fib_lookup(). >>>>> >>>>> seems really odd to do the fib lookup only to skip the neighbor lookup >>>>> and defer to a second helper to do a second fib lookup and send out. >>>>> >>>>> The better back-to-back calls is to return the ifindex and gateway on >>>>> successful fib lookup regardless of valid neighbor. If the call to >>>>> bpf_redirect_neigh is needed, it can have a flag to skip the fib lookup >>>>> and just redirect to the given nexthop address + ifindex. ie., >>>>> bpf_redirect_neigh only does neighbor handling in this case. >>>> >>>> Hmm, yeah, I guess it would make sense to cache and reuse the lookup - >>>> maybe stick it in bpf_redirect_info()? However, given the imminent >>> >>> That is not needed. >>> >>>> opening of the merge window, I don't see this landing before then. So >>>> I'm going to respin this patch with just the original change to always >>>> return the ifindex, then we can revisit the flags/reuse of the fib >>>> lookup later. >>> >>> What I am suggesting is a change in API to bpf_redirect_neigh which >>> should be done now, before the merge window, before it comes a locked >>> API. Right now, bpf_redirect_neigh does a lookup to get the nexthop. It >>> should take the gateway as an input argument. If set, then the lookup is >>> not done - only the neighbor redirect. >> >> Sounds like a reasonable extension, agree. API freeze is not merge win, but >> final v5.10 tag in this case as it always has been. In case it's not in time, >> we can simply just move flags to arg3 and add a reserved param as arg2 which >> must be zero (and thus indicate to perform the lookup as-is). Later we could >> extend to pass params similar as in fib_lookup helper for the gw. > > Right, I can take a look at this next week. Feel free to merge (v3 of) > this patch now, that change will be needed in any case I think... Ok, sounds reasonable, done. Lets fix the remaining one as David suggested until rc1, at latest rc2 time frame. I'll be mostly offline next week during the day, but happy to help till that deadline as well. Thanks, Daniel
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index d83561e8cd2c..9c7c10ce7ace 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -4813,12 +4813,14 @@ struct bpf_raw_tracepoint_args { __u64 args[0]; }; -/* DIRECT: Skip the FIB rules and go to FIB table associated with device - * OUTPUT: Do lookup from egress perspective; default is ingress +/* DIRECT: Skip the FIB rules and go to FIB table associated with device + * OUTPUT: Do lookup from egress perspective; default is ingress + * SKIP_NEIGH: Skip neighbour lookup and return BPF_FIB_LKUP_RET_NO_NEIGH on success */ enum { - BPF_FIB_LOOKUP_DIRECT = (1U << 0), - BPF_FIB_LOOKUP_OUTPUT = (1U << 1), + BPF_FIB_LOOKUP_DIRECT = (1U << 0), + BPF_FIB_LOOKUP_OUTPUT = (1U << 1), + BPF_FIB_LOOKUP_SKIP_NEIGH = (1U << 2), }; enum { diff --git a/net/core/filter.c b/net/core/filter.c index 05df73780dd3..1038337bc06c 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -5192,7 +5192,6 @@ static int bpf_fib_set_fwd_params(struct bpf_fib_lookup *params, memcpy(params->smac, dev->dev_addr, ETH_ALEN); params->h_vlan_TCI = 0; params->h_vlan_proto = 0; - params->ifindex = dev->ifindex; return 0; } @@ -5289,6 +5288,10 @@ static int bpf_ipv4_fib_lookup(struct net *net, struct bpf_fib_lookup *params, dev = nhc->nhc_dev; params->rt_metric = res.fi->fib_priority; + params->ifindex = dev->ifindex; + + if (flags & BPF_FIB_LOOKUP_SKIP_NEIGH) + return BPF_FIB_LKUP_RET_NO_NEIGH; /* xdp and cls_bpf programs are run in RCU-bh so * rcu_read_lock_bh is not needed here @@ -5414,6 +5417,10 @@ static int bpf_ipv6_fib_lookup(struct net *net, struct bpf_fib_lookup *params, dev = res.nh->fib_nh_dev; params->rt_metric = res.f6i->fib6_metric; + params->ifindex = dev->ifindex; + + if (flags & BPF_FIB_LOOKUP_SKIP_NEIGH) + return BPF_FIB_LKUP_RET_NO_NEIGH; /* xdp and cls_bpf programs are run in RCU-bh so rcu_read_lock_bh is * not needed here. @@ -5432,7 +5439,8 @@ BPF_CALL_4(bpf_xdp_fib_lookup, struct xdp_buff *, ctx, if (plen < sizeof(*params)) return -EINVAL; - if (flags & ~(BPF_FIB_LOOKUP_DIRECT | BPF_FIB_LOOKUP_OUTPUT)) + if (flags & ~(BPF_FIB_LOOKUP_DIRECT | BPF_FIB_LOOKUP_OUTPUT | + BPF_FIB_LOOKUP_SKIP_NEIGH)) return -EINVAL; switch (params->family) { @@ -5469,7 +5477,8 @@ BPF_CALL_4(bpf_skb_fib_lookup, struct sk_buff *, skb, if (plen < sizeof(*params)) return -EINVAL; - if (flags & ~(BPF_FIB_LOOKUP_DIRECT | BPF_FIB_LOOKUP_OUTPUT)) + if (flags & ~(BPF_FIB_LOOKUP_DIRECT | BPF_FIB_LOOKUP_OUTPUT | + BPF_FIB_LOOKUP_SKIP_NEIGH)) return -EINVAL; switch (params->family) { diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h index d83561e8cd2c..9c7c10ce7ace 100644 --- a/tools/include/uapi/linux/bpf.h +++ b/tools/include/uapi/linux/bpf.h @@ -4813,12 +4813,14 @@ struct bpf_raw_tracepoint_args { __u64 args[0]; }; -/* DIRECT: Skip the FIB rules and go to FIB table associated with device - * OUTPUT: Do lookup from egress perspective; default is ingress +/* DIRECT: Skip the FIB rules and go to FIB table associated with device + * OUTPUT: Do lookup from egress perspective; default is ingress + * SKIP_NEIGH: Skip neighbour lookup and return BPF_FIB_LKUP_RET_NO_NEIGH on success */ enum { - BPF_FIB_LOOKUP_DIRECT = (1U << 0), - BPF_FIB_LOOKUP_OUTPUT = (1U << 1), + BPF_FIB_LOOKUP_DIRECT = (1U << 0), + BPF_FIB_LOOKUP_OUTPUT = (1U << 1), + BPF_FIB_LOOKUP_SKIP_NEIGH = (1U << 2), }; enum {
The bpf_fib_lookup() helper performs a neighbour lookup for the destination IP and returns BPF_FIB_LKUP_NO_NEIGH if this fails, with the expectation that the BPF program will pass the packet up the stack in this case. However, with the addition of bpf_redirect_neigh() that can be used instead to perform the neighbour lookup, at the cost of a bit of duplicated work. For that we still need the target ifindex, and since bpf_fib_lookup() already has that at the time it performs the neighbour lookup, there is really no reason why it can't just return it in any case. So let's just always return the ifindex, and also add a flag that lets the caller turn off the neighbour lookup entirely in bpf_fib_lookup(). v2: - Add flag (Daniel) - Remove misleading code example from commit message (David) Cc: David Ahern <dsahern@gmail.com> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> --- include/uapi/linux/bpf.h | 10 ++++++---- net/core/filter.c | 15 ++++++++++++--- tools/include/uapi/linux/bpf.h | 10 ++++++---- 3 files changed, 24 insertions(+), 11 deletions(-)