[RFC,net-next,0/6] Allow excluding sw flow key from upcalls

Message ID 20221122140307.705112-1-aconole@redhat.com

Message

Aaron Conole Nov. 22, 2022, 2:03 p.m. UTC
Userspace applications can choose to completely ignore the kernel provided
flow key and instead regenerate a fresh key for processing in userspace.
Currently, userspace ovs-vswitchd does this in some instances (for example,
MISS upcall command).  This means that the kernel spends time copying and
sending the flow key to userspace without any benefit to the system.

Introduce a way for userspace to tell the kernel not to send the flow key.
This saves time and reduces memory pressure in both kernel and userspace.

This patch set is quite a bit larger because it introduces the ability to
decode a sw flow key into a compatible datapath-string.  We use this as a
method of implementing a test to show that the feature is working by
decoding and dumping the flow (to make sure we capture the correct packet).

Aaron Conole (6):
  openvswitch: exclude kernel flow key from upcalls
  selftests: openvswitch: add interface support
  selftests: openvswitch: add flow dump support
  selftests: openvswitch: adjust datapath NL message
  selftests: openvswitch: add upcall support
  selftests: openvswitch: add exclude support for packet commands

 include/uapi/linux/openvswitch.h              |    6 +
 net/openvswitch/datapath.c                    |   26 +-
 net/openvswitch/datapath.h                    |    2 +
 .../selftests/net/openvswitch/openvswitch.sh  |  101 +-
 .../selftests/net/openvswitch/ovs-dpctl.py    | 1069 ++++++++++++++++-
 5 files changed, 1183 insertions(+), 21 deletions(-)

Comments

Ilya Maximets Nov. 23, 2022, 9:22 p.m. UTC | #1
On 11/22/22 15:03, Aaron Conole wrote:
> When processing upcall commands, two groups of data are available to
> userspace for processing: the actual packet data and the kernel
> sw flow key data.  The inclusion of the flow key allows userspace to
> avoid running through the dissection again.
> 
> However, the userspace can choose to ignore the flow key data, as is
> the case in some ovs-vswitchd upcall processing.  For these messages,
> having the flow key data merely adds additional data to the upcall
> pipeline without any actual gain.  Userspace simply throws the data
> away anyway.

Hi, Aaron.  While it's true that OVS in userspace is re-parsing the
packet from scratch and using the newly parsed key for the OpenFlow
translation, the kernel-provided key is still used in a few important
places, mainly for compatibility checking.  The use is described
in more detail here:
  https://docs.kernel.org/networking/openvswitch.html#flow-key-compatibility

We need to compare the key generated in userspace with the key
generated by the kernel to know if it's safe to install the new flow
to the kernel, i.e. if the kernel and OVS userspace are parsing the
packet in the same way.

On the other hand, OVS today doesn't check the data, it only checks
which fields are present.  So, if we can generate and pass the bitmap
of fields present in the key or something similar without sending the
full key, that might still save some CPU cycles and memory in the
socket buffer while preserving the ability to check for forward and
backward compatibility.  What do you think?
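
A minimal sketch of the presence-bitmap idea, written in Python to match the
selftest tooling in this series.  The KeyAttr numbering and the helpers
presence_bitmap() and missing_in_kernel() are assumptions made up for
illustration; they are not the uapi enum values and not code from the kernel
or from ovs-vswitchd.

```python
from enum import IntEnum


class KeyAttr(IntEnum):
    # Illustrative numbering only; not claimed to match the uapi header.
    ETHERNET = 0
    VLAN = 1
    ETHERTYPE = 2
    IPV4 = 3
    IPV6 = 4
    TCP = 5
    UDP = 6


def presence_bitmap(attrs):
    """Fold the set of attributes present in a flow key into one integer."""
    bitmap = 0
    for attr in attrs:
        bitmap |= 1 << attr
    return bitmap


def missing_in_kernel(kernel_bitmap, user_attrs):
    """Return attributes userspace parsed that the kernel key did not carry."""
    only_user = presence_bitmap(user_attrs) & ~kernel_bitmap
    return [attr for attr in KeyAttr if only_user & (1 << attr)]


# Hypothetical upcall: the kernel reports Ethernet + IPv6 but no TCP,
# while userspace also dissected a TCP header.
kernel = presence_bitmap([KeyAttr.ETHERNET, KeyAttr.ETHERTYPE, KeyAttr.IPV6])
user = [KeyAttr.ETHERNET, KeyAttr.ETHERTYPE, KeyAttr.IPV6, KeyAttr.TCP]
print(missing_in_kernel(kernel, user))
```

In this sketch a non-empty result would mean the two parsers disagree about
the packet, which is the situation the compatibility check is meant to catch
without needing the field values themselves.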


The rest of the patch set seems useful even without patch #1 though.

Nit: This patch #1 should probably be merged with the patch #6 and be
at the end of a patch set, so the selftest and the main code are updated
at the same time.

Best regards, Ilya Maximets.
Adrian Moreno Nov. 25, 2022, 3:29 p.m. UTC | #2
On 11/23/22 22:22, Ilya Maximets wrote:
> On 11/22/22 15:03, Aaron Conole wrote:
>> When processing upcall commands, two groups of data are available to
>> userspace for processing: the actual packet data and the kernel
>> sw flow key data.  The inclusion of the flow key allows userspace to
>> avoid running through the dissection again.
>>
>> However, the userspace can choose to ignore the flow key data, as is
>> the case in some ovs-vswitchd upcall processing.  For these messages,
>> having the flow key data merely adds additional data to the upcall
>> pipeline without any actual gain.  Userspace simply throws the data
>> away anyway.
> 
> Hi, Aaron.  While it's true that OVS in userspace is re-parsing the
> packet from scratch and using the newly parsed key for the OpenFlow
> translation, the kernel-provided key is still used in a few important
> places, mainly for compatibility checking.  The use is described
> in more detail here:
>    https://docs.kernel.org/networking/openvswitch.html#flow-key-compatibility
> 
> We need to compare the key generated in userspace with the key
> generated by the kernel to know if it's safe to install the new flow
> to the kernel, i.e. if the kernel and OVS userspace are parsing the
> packet in the same way.
> 

Hi Ilya,

Do we need to do that for every packet?
Could we send a bitmask of supported fields to userspace at feature negotiation 
and let OVS slowpath flows that it knows the kernel won't be able to handle 
properly?


> On the other hand, OVS today doesn't check the data, it only checks
> which fields are present.  So, if we can generate and pass the bitmap
> of fields present in the key or something similar without sending the
> full key, that might still save some CPU cycles and memory in the
> socket buffer while preserving the ability to check for forward and
> backward compatibility.  What do you think?
> 
> 
> The rest of the patch set seems useful even without patch #1 though.
> 
> Nit: This patch #1 should probably be merged with the patch #6 and be
> at the end of a patch set, so the selftest and the main code are updated
> at the same time.
> 
> Best regards, Ilya Maximets.

Thanks
Ilya Maximets Nov. 25, 2022, 3:51 p.m. UTC | #3
On 11/25/22 16:29, Adrian Moreno wrote:
> 
> 
> On 11/23/22 22:22, Ilya Maximets wrote:
>> On 11/22/22 15:03, Aaron Conole wrote:
>>> When processing upcall commands, two groups of data are available to
>>> userspace for processing: the actual packet data and the kernel
>>> sw flow key data.  The inclusion of the flow key allows userspace to
>>> avoid running through the dissection again.
>>>
>>> However, the userspace can choose to ignore the flow key data, as is
>>> the case in some ovs-vswitchd upcall processing.  For these messages,
>>> having the flow key data merely adds additional data to the upcall
>>> pipeline without any actual gain.  Userspace simply throws the data
>>> away anyway.
>>
>> Hi, Aaron.  While it's true that OVS in userspace is re-parsing the
>> packet from scratch and using the newly parsed key for the OpenFlow
>> translation, the kernel-provided key is still used in a few important
>> places, mainly for compatibility checking.  The use is described
>> in more detail here:
>>    https://docs.kernel.org/networking/openvswitch.html#flow-key-compatibility
>>
>> We need to compare the key generated in userspace with the key
>> generated by the kernel to know if it's safe to install the new flow
>> to the kernel, i.e. if the kernel and OVS userspace are parsing the
>> packet in the same way.
>>
> 
> Hi Ilya,
> 
> Do we need to do that for every packet?
> Could we send a bitmask of supported fields to userspace at feature
> negotiation and let OVS slowpath flows that it knows the kernel won't
> be able to handle properly?

It's not that simple, because the supported fields in a packet depend
on previous fields in that same packet.  For example, parsing the TCP
header is generally supported, but it won't be parsed for IPv6
fragments (even the first one), the number of VLAN headers will affect
the parsing as we do not parse deeper than 2 VLAN headers, etc.
So, I'm afraid we need per-packet information, unless we
can somehow probe all the possible valid combinations of packet
headers.
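
A toy model of that dependency, in Python.  It only encodes the two rules
named above (no L4 for IPv6 fragments, nothing deeper than 2 VLAN headers).
The function parsed_fields(), its parameters, and the choice to stop parsing
entirely once the VLAN limit is exceeded are assumptions for illustration,
not the kernel's flow extraction code.

```python
MAX_VLAN_DEPTH = 2  # from the discussion: no parsing deeper than 2 VLAN headers


def parsed_fields(vlan_tags, l3=None, fragment=False, l4=None):
    """Return the fields a parser following the two rules would report."""
    fields = ["eth"]
    fields += ["vlan"] * min(vlan_tags, MAX_VLAN_DEPTH)
    if vlan_tags > MAX_VLAN_DEPTH:
        # Assumption: headers behind the extra tag are not examined at all.
        return fields
    if l3:
        fields.append(l3)
    if l3 == "ipv6" and fragment:
        # L4 is skipped for IPv6 fragments, even for the first one.
        return fields
    if l4:
        fields.append(l4)
    return fields


# The same "TCP over IPv6" traffic yields different key contents
# depending on what precedes the TCP header in each packet.
print(parsed_fields(0, "ipv6", fragment=False, l4="tcp"))
print(parsed_fields(0, "ipv6", fragment=True, l4="tcp"))
print(parsed_fields(3, "ipv6", fragment=False, l4="tcp"))
```

The point of the sketch is that "TCP is supported" cannot be answered once per
datapath: whether TCP shows up in the key depends on the earlier headers of
each individual packet.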

> 
> 
>> On the other hand, OVS today doesn't check the data, it only checks
>> which fields are present.  So, if we can generate and pass the bitmap
>> of fields present in the key or something similar without sending the
>> full key, that might still save some CPU cycles and memory in the
>> socket buffer while preserving the ability to check for forward and
>> backward compatibility.  What do you think?
>>
>>
>> The rest of the patch set seems useful even without patch #1 though.
>>
>> Nit: This patch #1 should probably be merged with the patch #6 and be
>> at the end of a patch set, so the selftest and the main code are updated
>> at the same time.
>>
>> Best regards, Ilya Maximets.
> 
> Thanks
Adrian Moreno Nov. 28, 2022, 9:12 a.m. UTC | #4
On 11/25/22 16:51, Ilya Maximets wrote:
> On 11/25/22 16:29, Adrian Moreno wrote:
>>
>>
>> On 11/23/22 22:22, Ilya Maximets wrote:
>>> On 11/22/22 15:03, Aaron Conole wrote:
>>>> When processing upcall commands, two groups of data are available to
>>>> userspace for processing: the actual packet data and the kernel
>>>> sw flow key data.  The inclusion of the flow key allows userspace to
>>>> avoid running through the dissection again.
>>>>
>>>> However, the userspace can choose to ignore the flow key data, as is
>>>> the case in some ovs-vswitchd upcall processing.  For these messages,
>>>> having the flow key data merely adds additional data to the upcall
>>>> pipeline without any actual gain.  Userspace simply throws the data
>>>> away anyway.
>>>
>>> Hi, Aaron.  While it's true that OVS in userspace is re-parsing the
>>> packet from scratch and using the newly parsed key for the OpenFlow
>>> translation, the kernel-provided key is still used in a few important
>>> places, mainly for compatibility checking.  The use is described
>>> in more detail here:
>>>     https://docs.kernel.org/networking/openvswitch.html#flow-key-compatibility
>>>
>>> We need to compare the key generated in userspace with the key
>>> generated by the kernel to know if it's safe to install the new flow
>>> to the kernel, i.e. if the kernel and OVS userspace are parsing the
>>> packet in the same way.
>>>
>>
>> Hi Ilya,
>>
>> Do we need to do that for every packet?
>> Could we send a bitmask of supported fields to userspace at feature
>> negotiation and let OVS slowpath flows that it knows the kernel won't
>> be able to handle properly?
> 
> It's not that simple, because the supported fields in a packet depend
> on previous fields in that same packet.  For example, parsing the TCP
> header is generally supported, but it won't be parsed for IPv6
> fragments (even the first one), the number of VLAN headers will affect
> the parsing as we do not parse deeper than 2 VLAN headers, etc.
> So, I'm afraid we need per-packet information, unless we
> can somehow probe all the possible valid combinations of packet
> headers.
> 

Surely. I understand that we'd need more than just a bit per field. Things like
L4 on IPv6 frags would need another bit and the number of VLAN headers would
need some more. But are these a handful of exceptions, or do we really need all
the possible combinations of headers? If it's a matter of naming a handful of
corner cases, I think we could consider expressing them at initialization time
and save some buffer space plus computation time in both kernel and userspace.
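
One way to picture the "express the corner cases at initialization time"
idea, as a Python sketch.  The ParserCaps record, its three members, and
needs_slowpath() are hypothetical names invented here; nothing like this
exists in the datapath netlink API today, and whether a short list of
exceptions is actually sufficient is exactly the open question.

```python
from dataclasses import dataclass

L4_FIELDS = frozenset({"tcp", "udp"})


@dataclass(frozen=True)
class ParserCaps:
    """Hypothetical capability record fetched once at feature negotiation."""
    fields: frozenset          # fields the kernel parser knows at all
    max_vlan_depth: int        # how many VLAN headers it will look through
    l4_on_ipv6_frags: bool     # whether L4 is parsed for IPv6 fragments


def needs_slowpath(caps, user_fields, vlan_tags=0, ipv6_fragment=False):
    """Decide from userspace's own parse whether the kernel could match it."""
    wanted = set(user_fields)
    if not wanted <= caps.fields:
        return True
    if vlan_tags > caps.max_vlan_depth:
        return True
    if ipv6_fragment and not caps.l4_on_ipv6_frags and wanted & L4_FIELDS:
        return True
    return False


caps = ParserCaps(
    fields=frozenset({"eth", "vlan", "ipv4", "ipv6", "tcp", "udp"}),
    max_vlan_depth=2,
    l4_on_ipv6_frags=False,
)
print(needs_slowpath(caps, ["eth", "ipv6", "tcp"]))
print(needs_slowpath(caps, ["eth", "ipv6", "tcp"], ipv6_fragment=True))
```

With such a record the per-packet decision would be made entirely from the
userspace parse, so no key or bitmap would need to travel with each upcall.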
Aaron Conole Nov. 29, 2022, 2:26 p.m. UTC | #5
Adrian Moreno <amorenoz@redhat.com> writes:

> On 11/25/22 16:51, Ilya Maximets wrote:
>> On 11/25/22 16:29, Adrian Moreno wrote:
>>>
>>>
>>> On 11/23/22 22:22, Ilya Maximets wrote:
>>>> On 11/22/22 15:03, Aaron Conole wrote:
>>>>> When processing upcall commands, two groups of data are available to
>>>>> userspace for processing: the actual packet data and the kernel
>>>>> sw flow key data.  The inclusion of the flow key allows userspace to
>>>>> avoid running through the dissection again.
>>>>>
>>>>> However, the userspace can choose to ignore the flow key data, as is
>>>>> the case in some ovs-vswitchd upcall processing.  For these messages,
>>>>> having the flow key data merely adds additional data to the upcall
>>>>> pipeline without any actual gain.  Userspace simply throws the data
>>>>> away anyway.
>>>>
>>>> Hi, Aaron.  While it's true that OVS in userspace is re-parsing the
>>>> packet from scratch and using the newly parsed key for the OpenFlow
>>>> translation, the kernel-provided key is still used in a few important
>>>> places, mainly for compatibility checking.  The use is described
>>>> in more detail here:
>>>>     https://docs.kernel.org/networking/openvswitch.html#flow-key-compatibility
>>>>
>>>> We need to compare the key generated in userspace with the key
>>>> generated by the kernel to know if it's safe to install the new flow
>>>> to the kernel, i.e. if the kernel and OVS userspace are parsing the
>>>> packet in the same way.
>>>>
>>>
>>> Hi Ilya,
>>>
>>> Do we need to do that for every packet?
>>> Could we send a bitmask of supported fields to userspace at feature
>>> negotiation and let OVS slowpath flows that it knows the kernel won't
>>> be able to handle properly?
>> It's not that simple, because the supported fields in a packet depend
>> on previous fields in that same packet.  For example, parsing the TCP
>> header is generally supported, but it won't be parsed for IPv6
>> fragments (even the first one), the number of VLAN headers will affect
>> the parsing as we do not parse deeper than 2 VLAN headers, etc.
>> So, I'm afraid we need per-packet information, unless we
>> can somehow probe all the possible valid combinations of packet
>> headers.
>> 
>
> Surely. I understand that we'd need more than just a bit per
> field. Things like L4 on IPv6 frags would need another bit and the
> number of VLAN headers would need some more. But, are these a handful
> of exceptions or do we really need all the possible combinations of
> headers? If it's a matter of naming a handful of corner cases I think
> we could consider expressing them at initialization time and save some
> buffer space plus computation time both in kernel and userspace.

I will take a closer look here - there must surely be a way to
express this when pulling information via the DP_GET command so that we
don't need to wait for a packet to come in to figure out whether we can
parse it.
Aaron Conole Nov. 29, 2022, 2:30 p.m. UTC | #6
Ilya Maximets <i.maximets@ovn.org> writes:

> On 11/22/22 15:03, Aaron Conole wrote:
>> When processing upcall commands, two groups of data are available to
>> userspace for processing: the actual packet data and the kernel
>> sw flow key data.  The inclusion of the flow key allows userspace to
>> avoid running through the dissection again.
>> 
>> However, the userspace can choose to ignore the flow key data, as is
>> the case in some ovs-vswitchd upcall processing.  For these messages,
>> having the flow key data merely adds additional data to the upcall
>> pipeline without any actual gain.  Userspace simply throws the data
>> away anyway.
>
> Hi, Aaron.  While it's true that OVS in userspace is re-parsing the
> packet from scratch and using the newly parsed key for the OpenFlow
> translation, the kernel-provided key is still used in a few important
> places, mainly for compatibility checking.  The use is described
> in more detail here:
>   https://docs.kernel.org/networking/openvswitch.html#flow-key-compatibility
>
> We need to compare the key generated in userspace with the key
> generated by the kernel to know if it's safe to install the new flow
> to the kernel, i.e. if the kernel and OVS userspace are parsing the
> packet in the same way.
>
> On the other hand, OVS today doesn't check the data, it only checks
> which fields are present.  So, if we can generate and pass the bitmap
> of fields present in the key or something similar without sending the
> full key, that might still save some CPU cycles and memory in the
> socket buffer while preserving the ability to check for forward and
> backward compatibility.  What do you think?

Maybe that can work.  I will try testing.  If so, then I would change
the semantics to send just the bitmap rather than omitting everything.

> The rest of the patch set seems useful even without patch #1 though.

I agree - but I didn't know if it made sense to submit the series
without adding something impactful (like a test).  I will work a bit
more on the flow area - maybe I can add enough actions and matches to
implement basic flow tests to submit while we think more about the feature.

> Nit: This patch #1 should probably be merged with the patch #6 and be
> at the end of a patch set, so the selftest and the main code are updated
> at the same time.

Okay - I can restructure them this way.

> Best regards, Ilya Maximets.