mbox series

[RFC,net-next,v2,0/6] net: support QUIC crypto

Message ID 20220806001153.1461577-1-adel.abushaev@gmail.com
Headers show
Series net: support QUIC crypto | expand

Message

Adel Abouchaev Aug. 6, 2022, 12:11 a.m. UTC
QUIC requires end to end encryption of the data. The application usually
prepares the data in clear text, encrypts and calls send() which implies
multiple copies of the data before the packets hit the networking stack.
Similar to kTLS, QUIC kernel offload of cryptography reduces the memory
pressure by reducing the number of copies.

The scope of kernel support is limited to the symmetric cryptography,
leaving the handshake to the user space library. For QUIC in particular,
the application packets that require symmetric cryptography are the 1RTT
packets with short headers. Kernel will encrypt the application packets
on transmission and decrypt on receive. This series implements Tx only,
because in QUIC server applications Tx outweighs Rx by orders of
magnitude.

Supporting the combination of QUIC and GSO requires the application to
correctly place the data and the kernel to correctly slice it. The
encryption process appends an arbitrary number of bytes (tag) to the end
of the message to authenticate it. The GSO value should include this
overhead, the offload would then subtract the tag size to parse the
input on Tx before chunking and encrypting it.

With the kernel cryptography, the buffer copy operation is conjoined
with the encryption operation. The memory bandwidth is reduced by 5-8%.
When devices supporting QUIC encryption in hardware come to the market,
we will be able to free further 7% of CPU utilization which is used
today for crypto operations.


Adel Abouchaev (6):
  Documentation on QUIC kernel Tx crypto.
  Define QUIC specific constants, control and data plane structures
  Add UDP ULP operations, initialization and handling prototype
    functions.
  Implement QUIC offload functions
  Add flow counters and Tx processing error counter
  Add self tests for ULP operations, flow setup and crypto tests
  v2: Moved the inner QUIC Kconfig from the ULP patch to QUIC patch.
  v2: Updated the tests to match the uAPI context structure fields.
  v2: Formatted the quic.rst document.

 Documentation/networking/index.rst     |    1 +
 Documentation/networking/quic.rst      |  186 +++
 include/net/inet_sock.h                |    2 +
 include/net/netns/mib.h                |    3 +
 include/net/quic.h                     |   59 +
 include/net/snmp.h                     |    6 +
 include/net/udp.h                      |   33 +
 include/uapi/linux/quic.h              |   61 +
 include/uapi/linux/snmp.h              |   11 +
 include/uapi/linux/udp.h               |    4 +
 net/Kconfig                            |    1 +
 net/Makefile                           |    1 +
 net/ipv4/Makefile                      |    3 +-
 net/ipv4/udp.c                         |   14 +
 net/ipv4/udp_ulp.c                     |  190 ++++
 net/quic/Kconfig                       |   16 +
 net/quic/Makefile                      |    8 +
 net/quic/quic_main.c                   | 1446 ++++++++++++++++++++++++
 net/quic/quic_proc.c                   |   45 +
 tools/testing/selftests/net/.gitignore |    3 +-
 tools/testing/selftests/net/Makefile   |    2 +-
 tools/testing/selftests/net/quic.c     | 1024 +++++++++++++++++
 tools/testing/selftests/net/quic.sh    |   45 +
 23 files changed, 3161 insertions(+), 3 deletions(-)
 create mode 100644 Documentation/networking/quic.rst
 create mode 100644 include/net/quic.h
 create mode 100644 include/uapi/linux/quic.h
 create mode 100644 net/ipv4/udp_ulp.c
 create mode 100644 net/quic/Kconfig
 create mode 100644 net/quic/Makefile
 create mode 100644 net/quic/quic_main.c
 create mode 100644 net/quic/quic_proc.c
 create mode 100644 tools/testing/selftests/net/quic.c
 create mode 100755 tools/testing/selftests/net/quic.sh

Comments

Adel Abouchaev Aug. 8, 2022, 7:05 p.m. UTC | #1
Updated the commit message, will be visible in v3.

On 8/5/22 8:05 PM, Bagas Sanjaya wrote:
> On Fri, Aug 05, 2022 at 05:11:48PM -0700, Adel Abouchaev wrote:
>> Adding Documentation/networking/quic.rst file to describe kernel QUIC
>> code.
>>
> Better say "Add documentation for kernel QUIC code".
>
>> diff --git a/Documentation/networking/index.rst b/Documentation/networking/index.rst
>> index 03b215bddde8..656fa1dac26b 100644
>> --- a/Documentation/networking/index.rst
>> +++ b/Documentation/networking/index.rst
>> @@ -90,6 +90,7 @@ Contents:
>>      plip
>>      ppp_generic
>>      proc_net_tcp
>> +   quic
>>      radiotap-headers
>>      rds
>>      regulatory
>> diff --git a/Documentation/networking/quic.rst b/Documentation/networking/quic.rst
>> new file mode 100644
>> index 000000000000..416099b80e60
>> --- /dev/null
>> +++ b/Documentation/networking/quic.rst
>> @@ -0,0 +1,186 @@
>> +.. _kernel_quic:
>> +
>> +===========
>> +KERNEL QUIC
>> +===========
>> +
>> +Overview
>> +========
>> +
>> +QUIC is a secure general-purpose transport protocol that creates a stateful
>> +interaction between a client and a server. QUIC provides end-to-end integrity
>> +and confidentiality. Refer to RFC 9000 for more information on QUIC.
>> +
>> +The kernel Tx side offload covers the encryption of the application streams
>> +in the kernel rather than in the application. These packets are 1RTT packets
>> +in QUIC connection. Encryption of every other packets is still done by the
>> +QUIC library in user space.
>> +
>> +
>> +
>> +User Interface
>> +==============
>> +
>> +Creating a QUIC connection
>> +--------------------------
>> +
>> +QUIC connection originates and terminates in the application, using one of many
>> +available QUIC libraries. The code instantiates QUIC client and QUIC server in
>> +some form and configures them to use certain addresses and ports for the
>> +source and destination. The client and server negotiate the set of keys to
>> +protect the communication during different phases of the connection, maintain
>> +the connection and perform congestion control.
>> +
>> +Requesting to add QUIC Tx kernel encryption to the connection
>> +-------------------------------------------------------------
>> +
>> +Each flow that should be encrypted by the kernel needs to be registered with
>> +the kernel using socket API. A setsockopt() call on the socket creates an
>> +association between the QUIC connection ID of the flow with the encryption
>> +parameters for the crypto operations:
>> +
>> +.. code-block:: c
>> +
>> +	struct quic_connection_info conn_info;
>> +	char conn_id[5] = {0x01, 0x02, 0x03, 0x04, 0x05};
>> +	const size_t conn_id_len = sizeof(conn_id);
>> +	char conn_key[16] = {0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07,
>> +			     0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f};
>> +	char conn_iv[12] = {0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07,
>> +			    0x08, 0x09, 0x0a, 0x0b};
>> +	char conn_hdr_key[16] = {0x10, 0x11, 0x12, 0x13, 0x14, 0x15, 0x16, 0x17,
>> +				 0x18, 0x19, 0x1a, 0x1b, 0x1c, 0x1d, 0x1e, 0x1f
>> +				};
>> +
>> +	conn_info.cipher_type = TLS_CIPHER_AES_GCM_128;
>> +
>> +	memset(&conn_info.key, 0, sizeof(struct quic_connection_info_key));
>> +	conn_info.key.conn_id_length = 5;
>> +	memcpy(&conn_info.key.conn_id[QUIC_MAX_CONNECTION_ID_SIZE
>> +				      - conn_id_len],
>> +	       &conn_id, conn_id_len);
>> +
>> +	memcpy(&conn_info.payload_key, conn_key, sizeof(conn_key));
>> +	memcpy(&conn_info.payload_iv, conn_iv, sizeof(conn_iv));
>> +	memcpy(&conn_info.header_key, conn_hdr_key, sizeof(conn_hdr_key));
>> +
>> +	setsockopt(fd, SOL_UDP, UDP_QUIC_ADD_TX_CONNECTION, &conn_info,
>> +		   sizeof(conn_info));
>> +
>> +
>> +Requesting to remove QUIC Tx kernel crypto offload control messages
>> +-------------------------------------------------------------------
>> +
>> +All flows are removed when the socket is closed. To request an explicit remove
>> +of the offload for the connection during the lifetime of the socket the process
>> +is similar to adding the flow. Only the connection ID and its length are
>> +necessary to supply to remove the connection from the offload:
>> +
>> +.. code-block:: c
>> +
>> +	memset(&conn_info.key, 0, sizeof(struct quic_connection_info_key));
>> +	conn_info.key.conn_id_length = 5;
>> +	memcpy(&conn_info.key.conn_id[QUIC_MAX_CONNECTION_ID_SIZE
>> +				      - conn_id_len],
>> +	       &conn_id, conn_id_len);
>> +	setsockopt(fd, SOL_UDP, UDP_QUIC_DEL_TX_CONNECTION, &conn_info,
>> +		   sizeof(conn_info));
>> +
>> +Sending QUIC application data
>> +-----------------------------
>> +
>> +For QUIC Tx encryption offload, the application should use sendmsg() socket
>> +call and provide ancillary data with information on connection ID length and
>> +offload flags for the kernel to perform the encryption and GSO support if
>> +requested.
>> +
>> +.. code-block:: c
>> +
>> +	size_t cmsg_tx_len = sizeof(struct quic_tx_ancillary_data);
>> +	uint8_t cmsg_buf[CMSG_SPACE(cmsg_tx_len)];
>> +	struct quic_tx_ancillary_data * anc_data;
>> +	size_t quic_data_len = 4500;
>> +	struct cmsghdr * cmsg_hdr;
>> +	char quic_data[9000];
>> +	struct iovec iov[2];
>> +	int send_len = 9000;
>> +	struct msghdr msg;
>> +	int err;
>> +
>> +	iov[0].iov_base = quic_data;
>> +	iov[0].iov_len = quic_data_len;
>> +	iov[1].iov_base = quic_data + 4500;
>> +	iov[1].iov_len = quic_data_len;
>> +
>> +	if (client.addr.sin_family == AF_INET) {
>> +		msg.msg_name = &client.addr;
>> +		msg.msg_namelen = sizeof(client.addr);
>> +	} else {
>> +		msg.msg_name = &client.addr6;
>> +		msg.msg_namelen = sizeof(client.addr6);
>> +	}
>> +
>> +	msg.msg_iov = iov;
>> +	msg.msg_iovlen = 2;
>> +	msg.msg_control = cmsg_buf;
>> +	msg.msg_controllen = sizeof(cmsg_buf);
>> +	cmsg_hdr = CMSG_FIRSTHDR(&msg);
>> +	cmsg_hdr->cmsg_level = IPPROTO_UDP;
>> +	cmsg_hdr->cmsg_type = UDP_QUIC_ENCRYPT;
>> +	cmsg_hdr->cmsg_len = CMSG_LEN(cmsg_tx_len);
>> +	anc_data = CMSG_DATA(cmsg_hdr);
>> +	anc_data->flags = 0;
>> +	anc_data->next_pkt_num = 0x0d65c9;
>> +	anc_data->conn_id_length = conn_id_len;
>> +	err = sendmsg(self->sfd, &msg, 0);
>> +
>> +QUIC Tx offload in kernel will read the data from userspace, encrypt and
>> +copy it to the ciphertext within the same operation.
>> +
>> +
>> +Sending QUIC application data with GSO
>> +--------------------------------------
>> +When GSO is in use, the kernel will use the GSO fragment size as the target
>> +for ciphertext. The packets from the user space should align on the boundary
>> +of GSO fragment size minus the size of the tag for the chosen cipher. For the
>> +GSO fragment 1200, the plain packets should follow each other at every 1184
>> +bytes, given the tag size of 16. After the encryption, the rest of the UDP
>> +and IP stacks will follow the defined value of GSO fragment which will include
>> +the trailing tag bytes.
>> +
>> +To set up GSO fragmentation:
>> +
>> +.. code-block:: c
>> +
>> +	setsockopt(self->sfd, SOL_UDP, UDP_SEGMENT, &frag_size,
>> +		   sizeof(frag_size));
>> +
>> +If the GSO fragment size is provided in ancillary data within the sendmsg()
>> +call, the value in ancillary data will take precedence over the segment size
>> +provided in setsockopt to split the payload into packets. This is consistent
>> +with the UDP stack behavior.
>> +
>> +Integrating to userspace QUIC libraries
>> +---------------------------------------
>> +
>> +Userspace QUIC libraries integration would depend on the implementation of the
>> +QUIC protocol. For MVFST library, the control plane is integrated into the
>> +handshake callbacks to properly configure the flows into the socket; and the
>> +data plane is integrated into the methods that perform encryption and send
>> +the packets to the batch scheduler for transmissions to the socket.
>> +
>> +MVFST library can be found at https://github.com/facebookincubator/mvfst.
>> +
>> +Statistics
>> +==========
>> +
>> +QUIC Tx offload to the kernel has counters
>> +(``/proc/net/quic_stat``):
>> +
>> +- ``QuicCurrTxSw`` -
>> +  number of currently active kernel offloaded QUIC connections
>> +- ``QuicTxSw`` -
>> +  accumulative total number of offloaded QUIC connections
>> +- ``QuicTxSwError`` -
>> +  accumulative total number of errors during QUIC Tx offload to kernel
>> +
> The documentation looks OK (no new warnings).
>
> Thanks.
>