
[v2,bpf-next,0/8] bpf: Allow bpf tcp iter to do bpf_(get|set)sockopt

Message ID 20210701200535.1033513-1-kafai@fb.com

Message

Martin KaFai Lau July 1, 2021, 8:05 p.m. UTC
This set allows the bpf tcp iter to call bpf_(get|set)sockopt().

With bpf-tcp-cc, new algo rollouts happen more often.  Instead of
restarting the applications to pick up the new tcp-cc, this set
allows the bpf tcp iter to call bpf_(get|set)sockopt(TCP_CONGESTION).
It is not limited to TCP_CONGESTION; the bpf tcp iter can call
bpf_(get|set)sockopt() with other options as well.  The bpf tcp iter can
read all the fields of a tcp_sock, so there is a lot of flexibility in
selecting which sk to setsockopt() on, e.g. it can test for TCP_LISTEN
only and leave the established connections untouched, or check the
addr/port, or check the current tcp-cc name, etc.
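
The selection logic described above can be modeled in plain userspace C. This is a minimal sketch under stated assumptions, not the selftest from patch 8: a real iter program is built for the BPF target and calls bpf_getsockopt()/bpf_setsockopt() on the socket, whereas here `struct sk_view`, `should_switch_cc()` and `switch_cc_all()` are hypothetical names, and a string copy stands in for the helper calls.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Same values as the kernel's TCP_ESTABLISHED and TCP_LISTEN. */
enum { SK_ESTABLISHED = 1, SK_LISTEN = 10 };

/* Same size as the kernel's TCP_CA_NAME_MAX (name plus NUL). */
#define CC_NAME_MAX 16

/* Hypothetical stand-in for the tcp_sock fields an iter prog reads. */
struct sk_view {
	int state;            /* TCP state */
	uint16_t sport;       /* local port, host byte order */
	char cc[CC_NAME_MAX]; /* current congestion control name */
};

/* Policy: only listeners on `port` not already using `target`. */
static bool should_switch_cc(const struct sk_view *sk, uint16_t port,
			     const char *target)
{
	if (sk->state != SK_LISTEN)
		return false; /* leave established connections untouched */
	if (sk->sport != port)
		return false;
	return strncmp(sk->cc, target, CC_NAME_MAX) != 0;
}

/* Models the per-sk body of the iter prog: reading sk->cc plays the
 * role of bpf_getsockopt(TCP_CONGESTION) and the copy below plays the
 * role of bpf_setsockopt(TCP_CONGESTION). Returns how many sockets
 * were switched. */
static size_t switch_cc_all(struct sk_view *socks, size_t n, uint16_t port,
			    const char *target)
{
	size_t switched = 0;

	for (size_t i = 0; i < n; i++) {
		if (!should_switch_cc(&socks[i], port, target))
			continue;
		snprintf(socks[i].cc, sizeof(socks[i].cc), "%s", target);
		switched++;
	}
	return switched;
}
```

For example, with a listener and an established connection on the same port, only the listener is switched, and a second pass switches nothing because the listener already runs the target algorithm.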

Patches 1-4 are cleanup and prep work in the tcp and bpf seq_file code.

Patch 5 makes the tcp seq_file iterate over the port+addr lhash2
instead of the port-only listening_hash.
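
To illustrate what port-only versus port+addr keying means, the toy below hashes listeners that share a port but are bound to different local addresses. It is a sketch only: `port_only_bucket()` and `port_addr_bucket()` are hypothetical and do not reproduce the kernel's hash functions.

```c
#include <stddef.h>
#include <stdint.h>

#define NBUCKETS 8u

/* Toy hashes for illustration; the kernel's real hash functions differ. */
static unsigned int port_only_bucket(uint16_t port)
{
	return port % NBUCKETS;
}

static unsigned int port_addr_bucket(uint32_t addr, uint16_t port)
{
	uint32_t h = (addr * 2654435761u) ^ port; /* multiplicative mix */

	return h % NBUCKETS;
}

/* Number of distinct buckets in bkt[0..n-1] (each value < NBUCKETS). */
static unsigned int buckets_used(const unsigned int *bkt, size_t n)
{
	unsigned int mask = 0, cnt = 0;

	for (size_t i = 0; i < n; i++)
		mask |= 1u << bkt[i];
	for (; mask; mask &= mask - 1) /* clear lowest set bit */
		cnt++;
	return cnt;
}
```

With these toy hashes, four listeners on port 443 bound to 10.0.0.1 through 10.0.0.4 all share one bucket under the port-only key, while the port+addr key places them in four different buckets.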

Patch 6 makes the bpf tcp iter process sockets in batches.  Batching
lets the iter release the bucket lock before calling lock_sock(), which
may sleep.  lock_sock() is needed for setsockopt.
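
A userspace sketch of the two-phase pattern, assuming a single bucket and using flags instead of real locks: phase 1 takes a reference on every socket while the modeled bucket lock is held, growing the batch array outside the lock and retrying if it is too small; phase 2 runs with no bucket lock held, so the modeled sleeping socket lock may be taken. All `toy_*` names are hypothetical, and the 3/2 growth factor is an assumption of the sketch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy model only; none of these names are kernel APIs. */
static bool bucket_locked;   /* models holding the bucket spinlock */
static int sleep_violations; /* sleeping-lock attempts made under it */

struct toy_sock {
	bool owned; /* models lock_sock() ownership */
	int refcnt; /* models sock_hold()/sock_put() */
	int opt;    /* the option the iter wants to change */
};

struct toy_bucket {
	struct toy_sock **socks;
	size_t n;
};

struct toy_batch {
	struct toy_sock **sk;
	size_t cnt, sz;
};

/* lock_sock() may sleep, so calling it under a spinlock is a bug. */
static void toy_lock_sock(struct toy_sock *sk)
{
	if (bucket_locked)
		sleep_violations++;
	sk->owned = true;
}

static void toy_release_sock(struct toy_sock *sk)
{
	sk->owned = false;
}

/* Phase 1: under the bucket lock, take a reference on each socket and
 * store it in the batch. If the batch is too small, drop the lock,
 * grow the batch and retry, since the bucket may change meanwhile.
 * Returns 0 on success, -1 if the batch cannot be grown. */
static int toy_batch_bucket(const struct toy_bucket *b, struct toy_batch *batch)
{
	for (;;) {
		size_t need;
		struct toy_sock **p;

		bucket_locked = true;
		if (b->n <= batch->sz) {
			for (size_t i = 0; i < b->n; i++) {
				b->socks[i]->refcnt++;
				batch->sk[i] = b->socks[i];
			}
			batch->cnt = b->n;
			bucket_locked = false;
			return 0;
		}
		need = b->n * 3 / 2;
		bucket_locked = false;

		p = realloc(batch->sk, need * sizeof(*p));
		if (!p)
			return -1;
		batch->sk = p;
		batch->sz = need;
	}
}

/* Phase 2: no bucket lock is held, so taking each socket's sleeping
 * lock is fine. Change the option, then drop the batch's reference. */
static void toy_process_batch(struct toy_batch *batch, int new_opt)
{
	for (size_t i = 0; i < batch->cnt; i++) {
		struct toy_sock *sk = batch->sk[i];

		toy_lock_sock(sk);    /* lock_sock() */
		sk->opt = new_opt;    /* setsockopt() */
		toy_release_sock(sk); /* release_sock() */
		sk->refcnt--;         /* sock_put() */
	}
	batch->cnt = 0;
}
```

The design choice the sketch highlights is that the references taken in phase 1 keep each socket alive after the bucket lock is dropped, which is what makes the sleeping lock in phase 2 safe to take.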

Patch 7 allows the bpf tcp iter to call bpf_(get|set)sockopt.

v2:
- Use __GFP_NOWARN in patch 6
- Add bpf_getsockopt() in patch 7 to give a symmetrical user experience.
  The selftest in patch 8 is changed to also cover bpf_getsockopt().
- Remove the CAP_NET_ADMIN check in patch 7. Tracing bpf progs already
  require CAP_SYS_ADMIN or CAP_PERFMON.
- Move some macro definitions to bpf_tracing_net.h in patch 8

Martin KaFai Lau (8):
  tcp: seq_file: Avoid skipping sk during tcp_seek_last_pos
  tcp: seq_file: Refactor net and family matching
  bpf: tcp: seq_file: Remove bpf_seq_afinfo from tcp_iter_state
  tcp: seq_file: Add listening_get_first()
  tcp: seq_file: Replace listening_hash with lhash2
  bpf: tcp: bpf iter batching and lock_sock
  bpf: tcp: Support bpf_(get|set)sockopt in bpf tcp iter
  bpf: selftest: Test batching and bpf_(get|set)sockopt in bpf tcp iter

 include/linux/bpf.h                           |   8 +
 include/net/inet_hashtables.h                 |   6 +
 include/net/tcp.h                             |   1 -
 kernel/bpf/bpf_iter.c                         |  22 +
 kernel/trace/bpf_trace.c                      |   7 +-
 net/core/filter.c                             |  34 ++
 net/ipv4/tcp_ipv4.c                           | 410 ++++++++++++++----
 tools/testing/selftests/bpf/network_helpers.c |  85 +++-
 tools/testing/selftests/bpf/network_helpers.h |   4 +
 .../bpf/prog_tests/bpf_iter_setsockopt.c      | 226 ++++++++++
 .../selftests/bpf/progs/bpf_iter_setsockopt.c |  72 +++
 .../selftests/bpf/progs/bpf_tracing_net.h     |   6 +
 12 files changed, 784 insertions(+), 97 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/bpf_iter_setsockopt.c
 create mode 100644 tools/testing/selftests/bpf/progs/bpf_iter_setsockopt.c

Comments

Martin KaFai Lau July 6, 2021, 3:44 p.m. UTC | #1
On Fri, Jul 02, 2021 at 10:50:43AM +0000, David Laight wrote:
> From: Martin KaFai Lau
> > Sent: 01 July 2021 21:06
> >
> > This set is to allow bpf tcp iter to call bpf_(get|set)sockopt.
>
> How does that work at all?
>
> IIRC only setsockopt() was converted so that it is callable
> with a kernel buffer.
> The corresponding change wasn't done to getsockopt().

It calls _bpf_getsockopt(), which does not depend on sys_getsockopt().

Alexei Starovoitov July 15, 2021, 1:29 a.m. UTC | #2
On Thu, Jul 1, 2021 at 1:05 PM Martin KaFai Lau <kafai@fb.com> wrote:
>
> This set is to allow bpf tcp iter to call bpf_(get|set)sockopt.
>
> With bpf-tcp-cc, new algo rollout happens more often.  Instead of
> restarting the applications to pick up the new tcp-cc, this set
> allows the bpf tcp iter to call bpf_(get|set)sockopt(TCP_CONGESTION).
> It is not limited to TCP_CONGESTION, the bpf tcp iter can call
> bpf_(get|set)sockopt() with other options.  The bpf tcp iter can read
> into all the fields of a tcp_sock, so there is a lot of flexibility
> to select the desired sk to do setsockopt(), e.g. it can test for
> TCP_LISTEN only and leave the established connections untouched,
> or check the addr/port, or check the current tcp-cc name, ...etc.
>
> Patch 1-4 are some cleanup and prep work in the tcp and bpf seq_file.
>
> Patch 5 is to have the tcp seq_file iterate on the
> port+addr lhash2 instead of the port only listening_hash.
...
>  include/linux/bpf.h                           |   8 +
>  include/net/inet_hashtables.h                 |   6 +
>  include/net/tcp.h                             |   1 -
>  kernel/bpf/bpf_iter.c                         |  22 +
>  kernel/trace/bpf_trace.c                      |   7 +-
>  net/core/filter.c                             |  34 ++
>  net/ipv4/tcp_ipv4.c                           | 410 ++++++++++++++----

Eric,

Could you please review this set where it touches inet bits?
I've looked a few times and it all looks fine to me, but I'm no expert
in those parts.

Alexei Starovoitov July 20, 2021, 6:05 p.m. UTC | #3
On Wed, Jul 14, 2021 at 6:29 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Thu, Jul 1, 2021 at 1:05 PM Martin KaFai Lau <kafai@fb.com> wrote:
> >
> > This set is to allow bpf tcp iter to call bpf_(get|set)sockopt.
> >
> > With bpf-tcp-cc, new algo rollout happens more often.  Instead of
> > restarting the applications to pick up the new tcp-cc, this set
> > allows the bpf tcp iter to call bpf_(get|set)sockopt(TCP_CONGESTION).
> > It is not limited to TCP_CONGESTION, the bpf tcp iter can call
> > bpf_(get|set)sockopt() with other options.  The bpf tcp iter can read
> > into all the fields of a tcp_sock, so there is a lot of flexibility
> > to select the desired sk to do setsockopt(), e.g. it can test for
> > TCP_LISTEN only and leave the established connections untouched,
> > or check the addr/port, or check the current tcp-cc name, ...etc.
> >
> > Patch 1-4 are some cleanup and prep work in the tcp and bpf seq_file.
> >
> > Patch 5 is to have the tcp seq_file iterate on the
> > port+addr lhash2 instead of the port only listening_hash.
> ...
> >  include/linux/bpf.h                           |   8 +
> >  include/net/inet_hashtables.h                 |   6 +
> >  include/net/tcp.h                             |   1 -
> >  kernel/bpf/bpf_iter.c                         |  22 +
> >  kernel/trace/bpf_trace.c                      |   7 +-
> >  net/core/filter.c                             |  34 ++
> >  net/ipv4/tcp_ipv4.c                           | 410 ++++++++++++++----
>
> Eric,
>
> Could you please review this set where it touches inet bits?
> I've looked a few times and it all looks fine to me, but I'm no expert
> in those parts.

Eric,

ping!
If you're on vacation or something I'm inclined to land the patches
and let Martin address your review feedback in follow up patches.

Thanks

Eric Dumazet July 20, 2021, 6:42 p.m. UTC | #4
Hi there.

I was indeed on vacation, but I am back, and done with my netdev presentation :)

I will take a look, thanks!

On Tue, Jul 20, 2021 at 8:05 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Wed, Jul 14, 2021 at 6:29 PM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> >
> > On Thu, Jul 1, 2021 at 1:05 PM Martin KaFai Lau <kafai@fb.com> wrote:
> > >
> > > This set is to allow bpf tcp iter to call bpf_(get|set)sockopt.
> > >
> > > With bpf-tcp-cc, new algo rollout happens more often.  Instead of
> > > restarting the applications to pick up the new tcp-cc, this set
> > > allows the bpf tcp iter to call bpf_(get|set)sockopt(TCP_CONGESTION).
> > > It is not limited to TCP_CONGESTION, the bpf tcp iter can call
> > > bpf_(get|set)sockopt() with other options.  The bpf tcp iter can read
> > > into all the fields of a tcp_sock, so there is a lot of flexibility
> > > to select the desired sk to do setsockopt(), e.g. it can test for
> > > TCP_LISTEN only and leave the established connections untouched,
> > > or check the addr/port, or check the current tcp-cc name, ...etc.
> > >
> > > Patch 1-4 are some cleanup and prep work in the tcp and bpf seq_file.
> > >
> > > Patch 5 is to have the tcp seq_file iterate on the
> > > port+addr lhash2 instead of the port only listening_hash.
> > ...
> > >  include/linux/bpf.h                           |   8 +
> > >  include/net/inet_hashtables.h                 |   6 +
> > >  include/net/tcp.h                             |   1 -
> > >  kernel/bpf/bpf_iter.c                         |  22 +
> > >  kernel/trace/bpf_trace.c                      |   7 +-
> > >  net/core/filter.c                             |  34 ++
> > >  net/ipv4/tcp_ipv4.c                           | 410 ++++++++++++++----
> >
> > Eric,
> >
> > Could you please review this set where it touches inet bits?
> > I've looked a few times and it all looks fine to me, but I'm no expert
> > in those parts.
>
> Eric,
>
> ping!
> If you're on vacation or something I'm inclined to land the patches
> and let Martin address your review feedback in follow up patches.
>
> Thanks

Eric Dumazet July 22, 2021, 1:25 p.m. UTC | #5
On 7/1/21 10:05 PM, Martin KaFai Lau wrote:
> This set is to allow bpf tcp iter to call bpf_(get|set)sockopt.
>
> With bpf-tcp-cc, new algo rollout happens more often.  Instead of
> restarting the applications to pick up the new tcp-cc, this set
> allows the bpf tcp iter to call bpf_(get|set)sockopt(TCP_CONGESTION).
> It is not limited to TCP_CONGESTION, the bpf tcp iter can call
> bpf_(get|set)sockopt() with other options.  The bpf tcp iter can read
> into all the fields of a tcp_sock, so there is a lot of flexibility
> to select the desired sk to do setsockopt(), e.g. it can test for
> TCP_LISTEN only and leave the established connections untouched,
> or check the addr/port, or check the current tcp-cc name, ...etc.
>
> Patch 1-4 are some cleanup and prep work in the tcp and bpf seq_file.
>
> Patch 5 is to have the tcp seq_file iterate on the
> port+addr lhash2 instead of the port only listening_hash.
>
> Patch 6 is to have the bpf tcp iter doing batching which
> then allows lock_sock.  lock_sock is needed for setsockopt.
>
> Patch 7 allows the bpf tcp iter to call bpf_(get|set)sockopt.
>
> v2:
> - Use __GFP_NOWARN in patch 6
> - Add bpf_getsockopt() in patch 7 to give a symmetrical user experience.
>   selftest in patch 8 is changed to also cover bpf_getsockopt().
> - Remove CAP_NET_ADMIN check in patch 7. Tracing bpf prog has already
>   required CAP_SYS_ADMIN or CAP_PERFMON.
> - Move some def macros to bpf_tracing_net.h in patch 8
>
> Martin KaFai Lau (8):
>   tcp: seq_file: Avoid skipping sk during tcp_seek_last_pos
>   tcp: seq_file: Refactor net and family matching
>   bpf: tcp: seq_file: Remove bpf_seq_afinfo from tcp_iter_state
>   tcp: seq_file: Add listening_get_first()
>   tcp: seq_file: Replace listening_hash with lhash2
>   bpf: tcp: bpf iter batching and lock_sock
>   bpf: tcp: Support bpf_(get|set)sockopt in bpf tcp iter
>   bpf: selftest: Test batching and bpf_(get|set)sockopt in bpf tcp iter

For the whole series:

Reviewed-by: Eric Dumazet <edumazet@google.com>

Sorry for the delay.

BTW, it seems weird for new BPF features to use /proc/net "legacy"
infrastructure and update it.

Kuniyuki Iwashima July 22, 2021, 2:53 p.m. UTC | #6
From:   Martin KaFai Lau <kafai@fb.com>
Date:   Thu, 1 Jul 2021 13:05:35 -0700
> This set is to allow bpf tcp iter to call bpf_(get|set)sockopt.
>
> With bpf-tcp-cc, new algo rollout happens more often.  Instead of
> restarting the applications to pick up the new tcp-cc, this set
> allows the bpf tcp iter to call bpf_(get|set)sockopt(TCP_CONGESTION).
> It is not limited to TCP_CONGESTION, the bpf tcp iter can call
> bpf_(get|set)sockopt() with other options.  The bpf tcp iter can read
> into all the fields of a tcp_sock, so there is a lot of flexibility
> to select the desired sk to do setsockopt(), e.g. it can test for
> TCP_LISTEN only and leave the established connections untouched,
> or check the addr/port, or check the current tcp-cc name, ...etc.
>
> Patch 1-4 are some cleanup and prep work in the tcp and bpf seq_file.
>
> Patch 5 is to have the tcp seq_file iterate on the
> port+addr lhash2 instead of the port only listening_hash.
>
> Patch 6 is to have the bpf tcp iter doing batching which
> then allows lock_sock.  lock_sock is needed for setsockopt.
>
> Patch 7 allows the bpf tcp iter to call bpf_(get|set)sockopt.


I have a comment on the first patch, but the series looks good to me.

Acked-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>

Martin KaFai Lau July 22, 2021, 9:01 p.m. UTC | #7
On Thu, Jul 22, 2021 at 03:25:39PM +0200, Eric Dumazet wrote:
> On 7/1/21 10:05 PM, Martin KaFai Lau wrote:
> > This set is to allow bpf tcp iter to call bpf_(get|set)sockopt.
> >
> > With bpf-tcp-cc, new algo rollout happens more often.  Instead of
> > restarting the applications to pick up the new tcp-cc, this set
> > allows the bpf tcp iter to call bpf_(get|set)sockopt(TCP_CONGESTION).
> > It is not limited to TCP_CONGESTION, the bpf tcp iter can call
> > bpf_(get|set)sockopt() with other options.  The bpf tcp iter can read
> > into all the fields of a tcp_sock, so there is a lot of flexibility
> > to select the desired sk to do setsockopt(), e.g. it can test for
> > TCP_LISTEN only and leave the established connections untouched,
> > or check the addr/port, or check the current tcp-cc name, ...etc.
> >
> > Patch 1-4 are some cleanup and prep work in the tcp and bpf seq_file.
> >
> > Patch 5 is to have the tcp seq_file iterate on the
> > port+addr lhash2 instead of the port only listening_hash.
> >
> > Patch 6 is to have the bpf tcp iter doing batching which
> > then allows lock_sock.  lock_sock is needed for setsockopt.
> >
> > Patch 7 allows the bpf tcp iter to call bpf_(get|set)sockopt.
> >
> > v2:
> > - Use __GFP_NOWARN in patch 6
> > - Add bpf_getsockopt() in patch 7 to give a symmetrical user experience.
> >   selftest in patch 8 is changed to also cover bpf_getsockopt().
> > - Remove CAP_NET_ADMIN check in patch 7. Tracing bpf prog has already
> >   required CAP_SYS_ADMIN or CAP_PERFMON.
> > - Move some def macros to bpf_tracing_net.h in patch 8
> >
> > Martin KaFai Lau (8):
> >   tcp: seq_file: Avoid skipping sk during tcp_seek_last_pos
> >   tcp: seq_file: Refactor net and family matching
> >   bpf: tcp: seq_file: Remove bpf_seq_afinfo from tcp_iter_state
> >   tcp: seq_file: Add listening_get_first()
> >   tcp: seq_file: Replace listening_hash with lhash2
> >   bpf: tcp: bpf iter batching and lock_sock
> >   bpf: tcp: Support bpf_(get|set)sockopt in bpf tcp iter
> >   bpf: selftest: Test batching and bpf_(get|set)sockopt in bpf tcp iter
>
> For the whole series :
>
> Reviewed-by: Eric Dumazet <edumazet@google.com>
>
> Sorry for the delay.
>
> BTW, it seems weird for new BPF features to use /proc/net "legacy"
> infrastructure and update it.

bpf iter uses seq_file, so the initial bpf_iter_tcp reuses most
of the pieces from /proc/net/tcp.

This set refactored a few things such that the bpf_iter_tcp only
shares the legacy tcp_seek_last_pos(), so its dependency on
/proc/net/tcp should shrink going forward.
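
The position reuse mentioned above can be pictured with a toy resume-by-(bucket, offset) reader. This is a hypothetical sketch, not tcp_seek_last_pos(): `toy_read()`, `struct toy_table` and `struct toy_pos` are illustrative names, and each call emits at most a fixed number of records, as one read() of a seq_file might, then saves where it stopped. Because the offset is only meaningful within its bucket, the sketch restarts at offset 0 whenever it moves on to the next bucket.

```c
#include <stddef.h>

/* Toy table: bucket i holds lens[i] ints standing in for sockets. */
struct toy_table {
	const int *const *buckets;
	const size_t *lens;
	size_t nbuckets;
};

/* Saved resume point, similar in spirit to the bucket/offset pair in
 * tcp_iter_state. The offset is only meaningful within that bucket. */
struct toy_pos {
	size_t bucket;
	size_t offset;
};

/* Emit up to max entries starting at *pos and advance *pos, like one
 * read() whose seq_file buffer fills after max records. If the saved
 * bucket now has fewer than offset entries, move on to the next bucket
 * at offset 0 instead of carrying the leftover offset into it. */
static size_t toy_read(const struct toy_table *t, struct toy_pos *pos,
		       int *out, size_t max)
{
	size_t n = 0;

	while (n < max && pos->bucket < t->nbuckets) {
		if (pos->offset < t->lens[pos->bucket]) {
			out[n++] = t->buckets[pos->bucket][pos->offset++];
		} else {
			pos->bucket++;
			pos->offset = 0;
		}
	}
	return n;
}
```

For example, with buckets {1, 2}, {} and {3, 4, 5} and two records per read, successive calls return 1 2, then 3 4, then 5, then nothing; a stale position past the end of bucket 0 resumes at the head of the next non-empty bucket rather than skipping into it.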

A similar modification could also be done to bpf_iter_udp in the future.

Thanks for the review!