[net-next,v9,00/10] net: openvswitch: Add sample multicasting.

Message ID 20240704085710.353845-1-amorenoz@redhat.com

Message

Adrian Moreno July 4, 2024, 8:56 a.m. UTC
** Background **
Currently, OVS supports several packet sampling mechanisms (sFlow,
per-bridge IPFIX, per-flow IPFIX). These end up being translated into a
userspace action that needs to be handled by ovs-vswitchd's handler
threads only to be forwarded to some third party application that
will somehow process the sample and provide observability on the
datapath.

A particularly interesting use-case is controller-driven
per-flow IPFIX sampling where the OpenFlow controller can add metadata
to samples (via two 32-bit integers) and this metadata is then available
to the sample-collecting system for correlation.

** Problem **
Sampled traffic shares netlink sockets and handler thread time with
upcalls. Apart from being a performance bottleneck in the sample
extraction itself, this can severely compromise the datapath, making
this solution unfit for highly loaded production systems.

Users are left with few options other than guessing what sampling
rate will be OK for their traffic pattern and system load and dealing
with the lost accuracy.

Looking at available infrastructure, an obvious candidate would be
psample. However, its current state does not help with the
use-case at stake because sampled packets do not contain user-defined
metadata.

** Proposal **
This series is an attempt to fix this situation by extending the
existing psample infrastructure to carry a variable length
user-defined cookie.

The main existing user of psample is tc's act_sample. It is also
extended to forward the action's cookie to psample.

Finally, a new OVS action (OVS_ACTION_ATTR_PSAMPLE) is created.
It accepts a group and an optional cookie and uses psample to
multicast the packet and the metadata.
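
As a rough illustration of the action's arguments (this is not code from
the series), the group and the optional cookie could be packed as nested
netlink attributes along the lines below. The attribute numbers, the
16-byte cookie limit and the helper names are assumptions made for the
sketch; include/uapi/linux/openvswitch.h is the authoritative source.

```python
import struct

# Assumed values for illustration only; the authoritative definitions
# live in include/uapi/linux/openvswitch.h.
OVS_PSAMPLE_ATTR_GROUP = 1
OVS_PSAMPLE_ATTR_COOKIE = 2
COOKIE_MAX_SIZE = 16  # assumed upper bound on the cookie length

NLA_HDRLEN = 4   # u16 length + u16 type
NLA_ALIGNTO = 4  # attributes are padded to a 4-byte boundary


def nla_put(attr_type, payload):
    """Encode one netlink attribute: u16 len, u16 type, payload, padding."""
    length = NLA_HDRLEN + len(payload)  # length excludes the padding
    padding = -length % NLA_ALIGNTO
    return struct.pack("=HH", length, attr_type) + payload + b"\x00" * padding


def encode_psample_args(group, cookie=None):
    """Hypothetical helper: build the nested attributes of a psample action."""
    attrs = nla_put(OVS_PSAMPLE_ATTR_GROUP, struct.pack("=I", group))
    if cookie is not None:
        if len(cookie) > COOKIE_MAX_SIZE:
            raise ValueError("cookie too long")
        attrs += nla_put(OVS_PSAMPLE_ATTR_COOKIE, cookie)
    return attrs
```

The cookie is treated as opaque bytes here, so a collector listening on
the psample multicast group would have to agree with the controller on
its layout.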

--
v8 -> v9:
- Rebased.

v7 -> v8:
- Rebased
- Redirect flow insertion to /dev/null to avoid spat in test.
- Removed inline keyword in stub execute_psample_action function.

v6 -> v7:
- Rebased
- Fixed typo in comment.

v5 -> v6:
- Renamed emit_sample -> psample
- Addressed unused variable and conditional compilation of function.

v4 -> v5:
- Rebased.
- Removed leftover enum value and wrapped some long lines in selftests.

v3 -> v4:
- Rebased.
- Addressed Jakub's comment on private and unused nla attributes.

v2 -> v3:
- Addressed comments from Simon, Aaron and Ilya.
- Dropped probability propagation in nested sample actions.
- Dropped patch v2's 7/9 in favor of a userspace implementation and
  consume skb if emit_sample is the last action, same as we do with
  userspace.
- Split ovs-dpctl.py features in independent patches.

v1 -> v2:
- Create a new action ("emit_sample") rather than reuse existing
  "sample" one.
- Add probability semantics to psample's sampling rate.
- Store sampling probability in skb's cb area and use it in emit_sample.
- Test combining "emit_sample" with "trunc"
- Drop group_id filtering and tracepoint in psample.

rfc_v2 -> v1:
- Accommodate Ilya's comments.
- Split OVS's attribute in two attributes and simplify internal
  handling of psample arguments.
- Extend psample and tc with a user-defined cookie.
- Add a tracepoint to psample to facilitate troubleshooting.

rfc_v1 -> rfc_v2:
- Use psample instead of a new OVS-only multicast group.
- Extend psample and tc with a user-defined cookie.

Adrian Moreno (10):
  net: psample: add user cookie
  net: sched: act_sample: add action cookie to sample
  net: psample: skip packet copy if no listeners
  net: psample: allow using rate as probability
  net: openvswitch: add psample action
  net: openvswitch: store sampling probability in cb.
  selftests: openvswitch: add psample action
  selftests: openvswitch: add userspace parsing
  selftests: openvswitch: parse trunc action
  selftests: openvswitch: add psample test

 Documentation/netlink/specs/ovs_flow.yaml     |  17 ++
 include/net/psample.h                         |   5 +-
 include/uapi/linux/openvswitch.h              |  31 +-
 include/uapi/linux/psample.h                  |  11 +-
 net/openvswitch/Kconfig                       |   1 +
 net/openvswitch/actions.c                     |  66 ++++-
 net/openvswitch/datapath.h                    |   3 +
 net/openvswitch/flow_netlink.c                |  32 ++-
 net/openvswitch/vport.c                       |   1 +
 net/psample/psample.c                         |  16 +-
 net/sched/act_sample.c                        |  12 +
 .../selftests/net/openvswitch/openvswitch.sh  | 115 +++++++-
 .../selftests/net/openvswitch/ovs-dpctl.py    | 272 +++++++++++++++++-
 13 files changed, 566 insertions(+), 16 deletions(-)

Comments

Adrian Moreno July 5, 2024, 9:49 a.m. UTC | #1
On Thu, Jul 04, 2024 at 10:56:51AM GMT, Adrian Moreno wrote:
> [...]

Hi,

Simon Horman has spotted that openvswitch.sh tests are failing in the
debug executor:

https://netdev.bots.linux.dev/contest.html?test=openvswitch-sh

Two tests are failing: psample and upcall_interfaces. Both have a
known source of instability (they use "sleep") that makes them
especially unreliable on slow systems.

Aaron and I already discussed this and I'm working on a patch to make
both tests more robust by adding a wait-and-retry mechanism.
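
The general wait-and-retry idea can be sketched as follows. This is a
hypothetical Python helper for illustration only, not the patch
mentioned above (openvswitch.sh itself is a shell script); the name
wait_for and its default timings are assumptions.

```python
import time


def wait_for(condition, timeout=5.0, interval=0.1):
    """Poll condition() until it returns True or `timeout` seconds elapse.

    Returns True as soon as the condition holds, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Compared with a fixed "sleep", the test proceeds as soon as the expected
state is observed and only pays the full timeout on failure, which is
what helps on slow executors.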

I hope this series can be considered regardless of these flaky tests.

Thanks.
Adrián
Adrian Moreno July 5, 2024, 1:58 p.m. UTC | #2
On Fri, Jul 05, 2024 at 11:49:28AM GMT, Adrián Moreno wrote:
> [...]

Adding more context to explain our situation.

This series has a counterpart in OVS [1]. The state of this other series
is still RFC just because the kernel bits have not yet been merged.

OVS 3.4 "softfreeze" was declared last Monday, which excludes from the
release any series that is still in RFC state.
Given the kernel parts seemed very close to being merged, an exception
was granted to the series so we can consider it for inclusion [2].

I hate to put any pressure on already busy maintainers but I would also
dislike missing this OVS release by just one or two days and having
to wait 6 months (OVS release cadence) for it to be available.

Again, I don't want to put pressure on maintainers. If it's not
possible, that's it. I just wanted to voice our timeline constraints.

Thanks for your understanding.
Adrián

[1] https://patchwork.ozlabs.org/project/openvswitch/cover/20240704085710.353845-1-amorenoz@redhat.com/
[2] https://mail.openvswitch.org/pipermail/ovs-dev/2024-July/415261.html
patchwork-bot+netdevbpf@kernel.org July 6, 2024, 1:10 a.m. UTC | #3
Hello:

This series was applied to netdev/net-next.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Thu,  4 Jul 2024 10:56:51 +0200 you wrote:
> ** Background **
> Currently, OVS supports several packet sampling mechanisms (sFlow,
> per-bridge IPFIX, per-flow IPFIX). These end up being translated into a
> userspace action that needs to be handled by ovs-vswitchd's handler
> threads only to be forwarded to some third party application that
> will somehow process the sample and provide observability on the
> datapath.
> 
> [...]

Here is the summary with links:
  - [net-next,v9,01/10] net: psample: add user cookie
    https://git.kernel.org/netdev/net-next/c/093b0f366567
  - [net-next,v9,02/10] net: sched: act_sample: add action cookie to sample
    https://git.kernel.org/netdev/net-next/c/03448444ae5c
  - [net-next,v9,03/10] net: psample: skip packet copy if no listeners
    https://git.kernel.org/netdev/net-next/c/c35d86a23029
  - [net-next,v9,04/10] net: psample: allow using rate as probability
    https://git.kernel.org/netdev/net-next/c/7b1b2b60c63f
  - [net-next,v9,05/10] net: openvswitch: add psample action
    https://git.kernel.org/netdev/net-next/c/aae0b82b46cb
  - [net-next,v9,06/10] net: openvswitch: store sampling probability in cb.
    https://git.kernel.org/netdev/net-next/c/71763d8a8203
  - [net-next,v9,07/10] selftests: openvswitch: add psample action
    https://git.kernel.org/netdev/net-next/c/60ccf62d3ceb
  - [net-next,v9,08/10] selftests: openvswitch: add userspace parsing
    https://git.kernel.org/netdev/net-next/c/c7815abbea45
  - [net-next,v9,09/10] selftests: openvswitch: parse trunc action
    https://git.kernel.org/netdev/net-next/c/b192bf12dbb0
  - [net-next,v9,10/10] selftests: openvswitch: add psample test
    https://git.kernel.org/netdev/net-next/c/30d772a03582

You are awesome, thank you!