[v5,bpf-next,0/9] introduce support for XDP programs in CPUMAP

Message ID cover.1593521029.git.lorenzo@kernel.org

Message

Lorenzo Bianconi June 30, 2020, 12:49 p.m. UTC
Similar to what David Ahern proposed in [1] for DEVMAPs, introduce the
capability to attach and run an XDP program on CPUMAP entries.
The idea behind this feature is to add the possibility to define on which CPU
the eBPF program runs if the underlying hw does not support RSS.
I respun patch 1/6 from a previous series sent by David [2].
The functionality has been tested on Marvell Espressobin, i40e and mlx5.
Detailed test results can be found here:
https://github.com/xdp-project/xdp-project/blob/master/areas/cpumap/cpumap04-map-xdp-prog.org
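
As a rough illustration of the intended usage, the sketch below shows a
user-space helper that installs an XDP program on a CPUMAP entry. It assumes
the bpf_cpumap_val layout and bpf_prog.fd member introduced by this series
(see patch "cpumap: formalize map value as a named struct"); the helper name
is made up for illustration and is not code from the series:

#include <bpf/bpf.h>
#include <linux/bpf.h>

/* Install an XDP program (prog_fd) on the CPUMAP entry for @cpu.
 * Frames redirected to this entry are then processed by that program
 * on the remote CPU's kthread.
 */
static int attach_prog_to_cpumap_entry(int cpumap_fd, __u32 cpu,
				       int prog_fd, __u32 qsize)
{
	struct bpf_cpumap_val val = {
		.qsize = qsize,		/* per-CPU ptr_ring size */
		.bpf_prog.fd = prog_fd,	/* XDP prog run on the remote CPU */
	};

	return bpf_map_update_elem(cpumap_fd, &cpu, &val, 0);
}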

Changes since v4:
- move xdp_clear_return_frame_no_direct inside rcu section
- update David Ahern's email address

Changes since v3:
- fix typo in commit message
- fix access to ctx->ingress_ifindex in cpumap bpf selftest

Changes since v2:
- improved comments
- fix return value in xdp_convert_buff_to_frame
- added patch 1/9: "cpumap: use non-locked version __ptr_ring_consume_batched"
- do not run kmem_cache_alloc_bulk if all frames have been consumed by the XDP
  program attached to the CPUMAP entry
- removed bpf_trace_printk in kselftest

Changes since v1:
- added performance test results
- added kselftest support
- fixed memory accounting with page_pool
- extended xdp_redirect_cpu_user.c to load an external program to perform
  redirect
- reported ifindex to attached eBPF program
- moved bpf_cpumap_val definition to include/uapi/linux/bpf.h

[1] https://patchwork.ozlabs.org/project/netdev/cover/20200529220716.75383-1-dsahern@kernel.org/
[2] https://patchwork.ozlabs.org/project/netdev/patch/20200513014607.40418-2-dsahern@kernel.org/

David Ahern (1):
  net: refactor xdp_convert_buff_to_frame

Jesper Dangaard Brouer (1):
  cpumap: use non-locked version __ptr_ring_consume_batched

Lorenzo Bianconi (7):
  samples/bpf: xdp_redirect_cpu_user: do not update bpf maps in option
    loop
  cpumap: formalize map value as a named struct
  bpf: cpumap: add the possibility to attach an eBPF program to cpumap
  bpf: cpumap: implement XDP_REDIRECT for eBPF programs attached to map
    entries
  libbpf: add SEC name for xdp programs attached to CPUMAP
  samples/bpf: xdp_redirect_cpu: load a eBPF program on cpumap
  selftest: add tests for XDP programs in CPUMAP entries

 include/linux/bpf.h                           |   6 +
 include/net/xdp.h                             |  41 ++--
 include/trace/events/xdp.h                    |  16 +-
 include/uapi/linux/bpf.h                      |  14 ++
 kernel/bpf/cpumap.c                           | 161 +++++++++++---
 net/core/dev.c                                |   9 +
 samples/bpf/xdp_redirect_cpu_kern.c           |  25 ++-
 samples/bpf/xdp_redirect_cpu_user.c           | 209 ++++++++++++++++--
 tools/include/uapi/linux/bpf.h                |  14 ++
 tools/lib/bpf/libbpf.c                        |   2 +
 .../bpf/prog_tests/xdp_cpumap_attach.c        |  70 ++++++
 .../bpf/progs/test_xdp_with_cpumap_helpers.c  |  36 +++
 12 files changed, 531 insertions(+), 72 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/xdp_cpumap_attach.c
 create mode 100644 tools/testing/selftests/bpf/progs/test_xdp_with_cpumap_helpers.c

Comments

Lorenzo Bianconi July 3, 2020, 3:46 p.m. UTC | #1
> On 6/30/20 2:49 PM, Lorenzo Bianconi wrote:
> [...]

[...]

> > +	prog = READ_ONCE(rcpu->prog);
>
> What purpose does the READ_ONCE() have here, also given you don't use it in above check?
> Since upon map update you realloc, repopulate and then xchg() the rcpu entry itself, there
> is never the case where you xchg() or WRITE_ONCE() the rcpu->prog, so what does READ_ONCE()
> serve in this context? Imho, it should probably just be deleted and plain rcpu->prog used
> to avoid confusion.

Hi Daniel,

ack, I will fix it in v6

>
> > +	for (i = 0; i < n; i++) {
> > +		struct xdp_frame *xdpf = frames[i];
> > +		u32 act;
> > +		int err;
> > +
> > +		rxq.dev = xdpf->dev_rx;
> > +		rxq.mem = xdpf->mem;
> > +		/* TODO: report queue_index to xdp_rxq_info */
> > +
> > +		xdp_convert_frame_to_buff(xdpf, &xdp);
> > +
> > +		act = bpf_prog_run_xdp(prog, &xdp);
> > +		switch (act) {
> > +		case XDP_PASS:
> > +			err = xdp_update_frame_from_buff(&xdp, xdpf);
> > +			if (err < 0) {
> > +				xdp_return_frame(xdpf);
> > +				stats->drop++;
> > +			} else {
> > +				frames[nframes++] = xdpf;
> > +				stats->pass++;
> > +			}
> > +			break;
> > +		default:
> > +			bpf_warn_invalid_xdp_action(act);
> > +			/* fallthrough */
> > +		case XDP_DROP:
> > +			xdp_return_frame(xdpf);
> > +			stats->drop++;
> > +			break;
> > +		}
> > +	}
> > +
> > +	xdp_clear_return_frame_no_direct();
> > +
> > +	rcu_read_unlock();
> > +
> > +	return nframes;
> > +}
> [...]
> > +bool cpu_map_prog_allowed(struct bpf_map *map)
> > +{
> > +	return map->map_type == BPF_MAP_TYPE_CPUMAP &&
> > +	       map->value_size != offsetofend(struct bpf_cpumap_val, qsize);
> > +}
> > +
> > +static int __cpu_map_load_bpf_program(struct bpf_cpu_map_entry *rcpu, int fd)
> > +{
> > +	struct bpf_prog *prog;
> > +
> > +	prog = bpf_prog_get_type_dev(fd, BPF_PROG_TYPE_XDP, false);
>
> Nit: why the _dev variant; just use bpf_prog_get_type()?

ack, I will fix in v6

Regards,
Lorenzo

>
> > +	if (IS_ERR(prog))
> > +		return PTR_ERR(prog);
> > +
> > +	if (prog->expected_attach_type != BPF_XDP_CPUMAP) {
> > +		bpf_prog_put(prog);
> > +		return -EINVAL;
> > +	}
> > +
> > +	rcpu->value.bpf_prog.id = prog->aux->id;
> > +	rcpu->prog = prog;
> > +
> > +	return 0;
> > +}
> > +
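
A minimal sketch of the two changes agreed above for v6 (a plain load of
rcpu->prog and the non-_dev getter); helper names follow kernel/bpf/cpumap.c
and include/linux/bpf.h, but this is an assumption about the follow-up, not
the final patch:

	/* The whole rcpu entry is swapped with xchg() on map update, so
	 * rcpu->prog is never rewritten concurrently; a plain load is enough.
	 */
	prog = rcpu->prog;

	/* No device-bound program is expected here, so the plain getter
	 * is sufficient.
	 */
	prog = bpf_prog_get_type(fd, BPF_PROG_TYPE_XDP);
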
Lorenzo Bianconi July 3, 2020, 8:48 p.m. UTC | #2
> On 6/30/20 2:49 PM, Lorenzo Bianconi wrote:
> [...]

[...]

> >   	old_rcpu = xchg(&cmap->cpu_map[key_cpu], rcpu);
> >   	if (old_rcpu) {
> > +		if (old_rcpu->prog)
> > +			bpf_prog_put(old_rcpu->prog);
> >   		call_rcu(&old_rcpu->rcu, __cpu_map_entry_free);
> >   		INIT_WORK(&old_rcpu->kthread_stop_wq, cpu_map_kthread_stop);
> >   		schedule_work(&old_rcpu->kthread_stop_wq);
>
> Hm, not quite sure I follow the logic here. Why is the bpf_prog_put() not placed inside
> __cpu_map_entry_free(), for example? Wouldn't this at least leave a potential small race
> window of UAF given the rest is still live? If we already piggy-back from RCU side on
> rcpu entry, why not having it in __cpu_map_entry_free()?

ack right, thanks for spotting this issue. I guess we can even move
"bpf_prog_put(rcpu->prog)" in put_cpu_map_entry() so the last consumer
of bpf_cpu_map_entry will free the attached program. Agree?

Regards,
Lorenzo

> Thanks,
> Daniel
Daniel Borkmann July 3, 2020, 10:24 p.m. UTC | #3
On 7/3/20 10:48 PM, Lorenzo Bianconi wrote:
>> On 6/30/20 2:49 PM, Lorenzo Bianconi wrote:
>> [...]
>
> [...]
>
>>>    	old_rcpu = xchg(&cmap->cpu_map[key_cpu], rcpu);
>>>    	if (old_rcpu) {
>>> +		if (old_rcpu->prog)
>>> +			bpf_prog_put(old_rcpu->prog);
>>>    		call_rcu(&old_rcpu->rcu, __cpu_map_entry_free);
>>>    		INIT_WORK(&old_rcpu->kthread_stop_wq, cpu_map_kthread_stop);
>>>    		schedule_work(&old_rcpu->kthread_stop_wq);
>>
>> Hm, not quite sure I follow the logic here. Why is the bpf_prog_put() not placed inside
>> __cpu_map_entry_free(), for example? Wouldn't this at least leave a potential small race
>> window of UAF given the rest is still live? If we already piggy-back from RCU side on
>> rcpu entry, why not having it in __cpu_map_entry_free()?
>
> ack right, thanks for spotting this issue. I guess we can even move
> "bpf_prog_put(rcpu->prog)" in put_cpu_map_entry() so the last consumer
> of bpf_cpu_map_entry will free the attached program. Agree?

Yes, that sounds reasonable to me.

Thanks,
Daniel
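
For reference, a minimal sketch of the fix agreed above, with bpf_prog_put()
moved into the refcount-release path so the last consumer of the entry frees
the attached program. The helper and field names (put_cpu_map_entry,
rcpu->refcnt, __cpu_map_ring_cleanup) follow kernel/bpf/cpumap.c but are
assumptions about how v6 lands, not the final patch:

static void put_cpu_map_entry(struct bpf_cpu_map_entry *rcpu)
{
	if (atomic_dec_and_test(&rcpu->refcnt)) {
		/* Release the attached XDP program only when the last
		 * reference is dropped, so the kthread can never run a
		 * freed program.
		 */
		if (rcpu->prog)
			bpf_prog_put(rcpu->prog);
		/* The queue should be empty at this point */
		__cpu_map_ring_cleanup(rcpu->queue);
		ptr_ring_cleanup(rcpu->queue, NULL);
		kfree(rcpu->queue);
		kfree(rcpu);
	}
}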