[RFC,net-next,0/6] multi release pacing for UDP GSO

Message ID 20200609140934.110785-1-willemdebruijn.kernel@gmail.com

Willem de Bruijn June 9, 2020, 2:09 p.m. UTC
From: Willem de Bruijn <willemb@google.com>

UDP segmentation offload with UDP_SEGMENT can significantly reduce the
transmission cycle cost per byte for protocols like QUIC.

Pacing offload with SO_TXTIME can improve accuracy and cycle cost of
pacing for such userspace protocols further.
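
For readers unfamiliar with the two interfaces, here is a minimal
userspace sketch (not part of this series) of how a sender could attach
both a UDP_SEGMENT size and an SCM_TXTIME release time to one sendmsg()
call. The helper name fill_paced_gso_cmsg and the PACED_GSO_CTRL_LEN
macro are made up for illustration, and the numeric fallbacks are the
asm-generic values, assumed only for older libc headers. The socket
would also need SO_TXTIME enabled beforehand via setsockopt() with a
struct sock_txtime, which is omitted here.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/udp.h>

/* Fallbacks for older libc headers (asm-generic values, an assumption). */
#ifndef SOL_UDP
#define SOL_UDP 17
#endif
#ifndef UDP_SEGMENT
#define UDP_SEGMENT 103
#endif
#ifndef SO_TXTIME
#define SO_TXTIME 61
#endif
#ifndef SCM_TXTIME
#define SCM_TXTIME SO_TXTIME
#endif

/* Room for one uint16_t (gso size) and one uint64_t (txtime) cmsg. */
#define PACED_GSO_CTRL_LEN \
	(CMSG_SPACE(sizeof(uint16_t)) + CMSG_SPACE(sizeof(uint64_t)))

/* Hypothetical helper: fill msg's control data with a UDP_SEGMENT and an
 * SCM_TXTIME cmsg. ctrl must be suitably aligned for struct cmsghdr. */
static size_t fill_paced_gso_cmsg(struct msghdr *msg, void *ctrl,
				  size_t ctrl_len, uint16_t gso_size,
				  uint64_t txtime_ns)
{
	struct cmsghdr *cm;

	assert(ctrl_len >= PACED_GSO_CTRL_LEN);
	memset(ctrl, 0, ctrl_len);	/* keeps CMSG_NXTHDR well defined */
	msg->msg_control = ctrl;
	msg->msg_controllen = PACED_GSO_CTRL_LEN;

	/* Segment size: the stack splits the payload into gso_size chunks. */
	cm = CMSG_FIRSTHDR(msg);
	cm->cmsg_level = SOL_UDP;
	cm->cmsg_type = UDP_SEGMENT;
	cm->cmsg_len = CMSG_LEN(sizeof(gso_size));
	memcpy(CMSG_DATA(cm), &gso_size, sizeof(gso_size));

	/* One release time for the whole datagram, in nsec. */
	cm = CMSG_NXTHDR(msg, cm);
	cm->cmsg_level = SOL_SOCKET;
	cm->cmsg_type = SCM_TXTIME;
	cm->cmsg_len = CMSG_LEN(sizeof(txtime_ns));
	memcpy(CMSG_DATA(cm), &txtime_ns, sizeof(txtime_ns));

	return msg->msg_controllen;
}
```

Note that there is only a single SCM_TXTIME value per call, which is the
limitation discussed next.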

But the maximum GSO size built is limited by the pacing rate. A msec
pacing interval, for many Internet clients, results in at most a few
segments per datagram.
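
To make the arithmetic concrete, a small sketch, assuming a pacing rate
given in bits per second and a fixed segment size. The function name
segs_per_interval and the example rates are illustrative only and are
not taken from the patches.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: number of whole mss-sized segments that one
 * pacing interval can carry at the given rate. */
static uint32_t segs_per_interval(uint64_t rate_bps, uint32_t interval_us,
				  uint32_t mss)
{
	uint64_t bytes;

	assert(mss > 0);
	/* bytes that may leave the host during one interval */
	bytes = rate_bps / 8 * interval_us / 1000000;
	return (uint32_t)(bytes / mss);
}
```

With 1200 byte segments and a 1 msec interval, a 10 Mbit/s flow gets
1 segment per interval and a 50 Mbit/s flow gets 5, so the datagram
handed to UDP GSO stays far below what a 64KB GSO packet could hold.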

The pros and cons were captured in a recent CloudFlare article,
specifically mentioning

  "But it does not yet support specifying different times for each
  packet when GSO is used, as there is no way to define multiple
  timestamps for packets that need to be segmented (each segmented
  packet essentially ends up being sent at the same time anyway)."

  https://blog.cloudflare.com/accelerating-udp-packet-transmission-for-quic/

We have been evaluating such a mechanism for multiple release times
per UDP GSO packet. Since it sounds like it may be of interest to
others, too, and it may be a while before we have all the data I'd
like, and since the list is quieter now that the merge window is
open, I am sharing a WIP version.

The basic approach is to specify

1. initial early release time (in nsec)
2. interval between subsequent release times (in msec)
3. number of segments to release at each release time
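
As a rough illustration of how these three values could map a segment
index to a release time (a sketch under my reading of the list above,
not the code in patch 1; seg_release_ns and its parameter names are
hypothetical):

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_MSEC 1000000ULL

/* Hypothetical helper: release time of segment idx (0-based) when
 * segs_per_release segments leave at first_ns and at every
 * interval_ms after that. */
static uint64_t seg_release_ns(uint64_t first_ns, uint32_t interval_ms,
			       uint32_t segs_per_release, uint32_t idx)
{
	assert(segs_per_release > 0);
	return first_ns +
	       (uint64_t)(idx / segs_per_release) * interval_ms * NSEC_PER_MSEC;
}
```

For example, with a 2 msec interval and 3 segments per release,
segments 0..2 share the initial time, segments 3..5 are released
2 msec later, and so on.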

One implementation concern is where to store the additional two fields
in the skb. Given that msec granularity is the Internet pacing speed,
for now repurpose the two lowest 4-bit nibbles in skb->tstamp to hold the
interval and segment count. I'm aware that this does not win a prize
for elegance.
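
A userspace sketch of such an encoding follows. The bit order (interval
in bits 7..4, segment count in bits 3..0) and the mr_* names are
assumptions made for illustration; this cover letter does not specify
the layout the patches use, and skb->tstamp itself is a ktime_t rather
than the uint64_t used here.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: bits 7..4 = interval in msec, bits 3..0 = segments
 * per release, remaining bits = release time in nsec. */
static uint64_t mr_pack(uint64_t tstamp_ns, unsigned int interval_ms,
			unsigned int segs)
{
	assert(interval_ms <= 0xF && segs <= 0xF);
	return (tstamp_ns & ~0xFFULL) | ((uint64_t)interval_ms << 4) | segs;
}

static unsigned int mr_interval_ms(uint64_t t)
{
	return (unsigned int)((t >> 4) & 0xF);
}

static unsigned int mr_segs(uint64_t t)
{
	return (unsigned int)(t & 0xF);
}

/* Release time with the two borrowed nibbles cleared. */
static uint64_t mr_base_ns(uint64_t t)
{
	return t & ~0xFFULL;
}
```

Clearing the low 8 bits shifts the release time by at most 255 nsec,
which is negligible at msec pacing granularity, but it caps both fields
at 15.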

Patch 1 adds the socket option and basic segmentation function to
  adjust the skb->tstamp of the individual segments.

Patch 2 extends this with support for building GSO segs. Build one GSO
   segment per interval if the hardware can offload (USO) and thus
   we are segmenting only to maintain pacing rate.

Patch 3 wires the segmentation up to the FQ qdisc on enqueue, so that
   segments will be scheduled for delivery at their adjusted time.

Patches 4..6 extend existing tests to experiment with the feature:

Patch 4 allows testing so_txtime across hardware (for USO)
Patch 5 extends the so_txtime test with support for gso and mr-pacing
Patch 6 extends the udpgso bench to support pacing and mr-pacing
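
To illustrate the idea behind patch 2 (one GSO segment per interval
when the device can do USO), here is a simplified sketch that works on
plain byte counts. The real code operates on skbs through the GSO
machinery; struct mr_chunk and mr_split are hypothetical names used
only for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct mr_chunk {
	uint32_t len;		/* payload bytes in this GSO chunk */
	uint64_t release_ns;	/* release time assigned to the chunk */
};

/* Hypothetical helper: split payload_len bytes into chunks of at most
 * segs_per_release * mss bytes, one chunk per pacing interval.
 * Returns the number of chunks written to out (at most max_out). */
static size_t mr_split(uint32_t payload_len, uint32_t mss,
		       uint32_t segs_per_release, uint64_t first_ns,
		       uint32_t interval_ms, struct mr_chunk *out,
		       size_t max_out)
{
	uint32_t chunk_max;
	size_t n = 0;

	assert(mss > 0 && segs_per_release > 0);
	chunk_max = mss * segs_per_release;

	while (payload_len > 0 && n < max_out) {
		uint32_t len = payload_len < chunk_max ? payload_len : chunk_max;

		out[n].len = len;
		out[n].release_ns = first_ns +
				    (uint64_t)n * interval_ms * 1000000ULL;
		payload_len -= len;
		n++;
	}
	return n;
}
```

A 10000 byte payload with 1200 byte segments, 3 segments per release
and a 2 msec interval yields chunks of 3600, 3600 and 2800 bytes. Each
chunk could still be handed to the device as a single GSO packet, where
full software segmentation would produce 9 separate packets.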

Some known limitations:

- the aforementioned storage in skb->tstamp.

- exposing this constraint through the SO_TXTIME interface.
  It would be cleaner to add new fields to the cmsg, at nsec resolution.

- the fq_enqueue path adds a branch to the hot path.
  A static branch would avoid that.

- a few UDP-specific assumptions in a net/core datapath,
  notably the hw_features. This can be derived from gso_type.

Willem de Bruijn (6):
  net: multiple release time SO_TXTIME
  net: build gso segs in multi release time SO_TXTIME
  net_sched: sch_fq: multiple release time support
  selftests/net: so_txtime: support txonly/rxonly modes
  selftests/net: so_txtime: add gso and multi release pacing
  selftests/net: upgso bench: add pacing with SO_TXTIME

 include/linux/netdevice.h                     |   1 +
 include/net/sock.h                            |   3 +-
 include/uapi/linux/net_tstamp.h               |   3 +-
 net/core/dev.c                                |  71 +++++++++
 net/core/sock.c                               |   4 +
 net/sched/sch_fq.c                            |  33 ++++-
 tools/testing/selftests/net/so_txtime.c       | 136 ++++++++++++++----
 tools/testing/selftests/net/so_txtime.sh      |   7 +
 .../testing/selftests/net/so_txtime_multi.sh  |  68 +++++++++
 .../selftests/net/udpgso_bench_multi.sh       |  65 +++++++++
 tools/testing/selftests/net/udpgso_bench_tx.c |  72 +++++++++-
 11 files changed, 431 insertions(+), 32 deletions(-)
 create mode 100755 tools/testing/selftests/net/so_txtime_multi.sh
 create mode 100755 tools/testing/selftests/net/udpgso_bench_multi.sh