[RFC,v2,00/13] virtio/vsock: introduce SOCK_SEQPACKET support.

Message ID 20210115053553.1454517-1-arseny.krasnov@kaspersky.com

Message

Arseny Krasnov Jan. 15, 2021, 5:35 a.m. UTC
This patchset implements support of SOCK_SEQPACKET for the virtio
transport.
	SOCK_SEQPACKET guarantees that record boundaries are preserved. To
do so, a new packet operation was added: it marks the start of a record
(with the record length in the header), and such a packet doesn't carry
any data. To send a record, a packet with the start marker is sent
first, then all data is sent as usual 'RW' packets. On the receiver's
side, the length of the record is known from the packet with the start
marker. Since packets of one socket are not reordered on either the
vsock or the vhost transport layer, this marker allows the original
record to be restored on the receiver's side. If the user's buffer is
smaller than the record length, then all data that does not fit is
dropped.
	As with a stream socket, the maximum length of a datagram is not
limited, because the same credit logic is used. The difference from a
stream socket is that the user is not woken up until the whole record is
received or an error occurs. The implementation also supports the
'MSG_EOR' and 'MSG_TRUNC' flags.
	Tests are also implemented.
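
	For illustration only, the receive side described above could be
sketched in userspace roughly as follows. This is not the code from the
series: 'struct pkt', 'OP_SEQ_BEGIN', 'OP_RW' and 'recv_record()' are
hypothetical names made up for the example, and credit accounting and
locking are left out entirely.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical packet model: a start marker followed by data packets. */
enum pkt_op { OP_SEQ_BEGIN, OP_RW };

struct pkt {
	enum pkt_op op;
	uint32_t record_len;	/* meaningful only for OP_SEQ_BEGIN */
	const char *data;	/* meaningful only for OP_RW */
	size_t data_len;	/* meaningful only for OP_RW */
};

/*
 * Reassemble one record from pkts[0..n). At most buf_len bytes are
 * copied to buf; the rest of the record is dropped. Returns the full
 * record length, or -1 if the marker is missing or the record is
 * incomplete.
 */
static long recv_record(const struct pkt *pkts, size_t n,
			char *buf, size_t buf_len)
{
	size_t record_len, seen = 0, copied = 0, i;

	/* The record length is only known from the start marker. */
	if (n == 0 || pkts[0].op != OP_SEQ_BEGIN)
		return -1;
	record_len = pkts[0].record_len;

	for (i = 1; i < n && seen < record_len; i++) {
		size_t chunk, room;

		if (pkts[i].op != OP_RW)
			return -1;

		/* Never consume more than the record still needs. */
		chunk = pkts[i].data_len;
		if (chunk > record_len - seen)
			chunk = record_len - seen;

		/* Copy what fits in the user's buffer, drop the rest. */
		room = buf_len - copied;
		if (room > chunk)
			room = chunk;
		memcpy(buf + copied, pkts[i].data, room);

		copied += room;
		seen += chunk;
	}

	/* The user is only handed a complete record. */
	if (seen < record_len)
		return -1;

	return (long)record_len;
}
```

	Returning the full record length even when the buffer was too
small is a choice made for this sketch; it is loosely modelled on what
'MSG_TRUNC' reports and is not claimed to match the patches.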

 Arseny Krasnov (13):
  af_vsock: implement 'vsock_wait_data()'.
  af_vsock: separate rx loops for STREAM/SEQPACKET.
  af_vsock: implement rx loops entry point
  af_vsock: replace previous stream rx loop.
  af_vsock: implement send logic for SOCK_SEQPACKET
  af_vsock: general support of SOCK_SEQPACKET type.
  af_vsock: update comments for stream sockets.
  virtio/vsock: dequeue callback for SOCK_SEQPACKET.
  virtio/vsock: implement fetch of record length
  virtio/vsock: update receive logic
  virtio/vsock: rest of SOCK_SEQPACKET support
  vhost/vsock: support for SOCK_SEQPACKET socket.
  vsock_test: add SOCK_SEQPACKET tests.

 drivers/vhost/vsock.c                   |   7 +-
 include/linux/virtio_vsock.h            |  12 +
 include/net/af_vsock.h                  |   6 +
 include/uapi/linux/virtio_vsock.h       |   9 +
 net/vmw_vsock/af_vsock.c                | 483 ++++++++++++++++------
 net/vmw_vsock/virtio_transport.c        |   4 +
 net/vmw_vsock/virtio_transport_common.c | 294 +++++++++++--
 tools/testing/vsock/util.c              |  32 +-
 tools/testing/vsock/util.h              |   3 +
 tools/testing/vsock/vsock_test.c        | 126 ++++++
 10 files changed, 824 insertions(+), 152 deletions(-)

 v1 -> v2:
 - patches reordered: af_vsock.c changes now before virtio vsock
 - patches reorganized: more small patches, where +/- are not mixed
 - tests for SOCK_SEQPACKET added
 - all commit messages updated
 - af_vsock.c: 'vsock_pre_recv_check()' inlined to
   'vsock_connectible_recvmsg()'
 - af_vsock.c: 'vsock_assign_transport()' returns ENODEV if transport
   was not found
 - virtio_transport_common.c: transport callback for seqpacket dequeue
 - virtio_transport_common.c: simplified
   'virtio_transport_recv_connected()'
 - virtio_transport_common.c: send reset on socket and packet type
   mismatch.

Signed-off-by: Arseny Krasnov <arseny.krasnov@kaspersky.com>

Comments

Stefano Garzarella Jan. 18, 2021, 2:51 p.m. UTC | #1
On Fri, Jan 15, 2021 at 08:40:25AM +0300, Arseny Krasnov wrote:
>This adds 'vsock_wait_data()' function which is called from user's read
>syscall and waits until new socket data is arrived. It was based on code
>from stream dequeue logic and moved to separate function because it will
>be called both from SOCK_STREAM and SOCK_SEQPACKET receive loops.

>

>Signed-off-by: Arseny Krasnov <arseny.krasnov@kaspersky.com>

>---

> net/vmw_vsock/af_vsock.c | 47 ++++++++++++++++++++++++++++++++++++++++

> 1 file changed, 47 insertions(+)

>

>diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c

>index b12d3a322242..af716f5a93a4 100644

>--- a/net/vmw_vsock/af_vsock.c

>+++ b/net/vmw_vsock/af_vsock.c

>@@ -1822,6 +1822,53 @@ static int vsock_stream_sendmsg(struct socket *sock, struct msghdr *msg,

> 	return err;

> }

>

>+static int vsock_wait_data(struct sock *sk, struct wait_queue_entry *wait,

>+			   long timeout,

>+			   struct vsock_transport_recv_notify_data *recv_data,

>+			   size_t target)

>+{

>+	int err = 0;

>+	struct vsock_sock *vsk;

>+	const struct vsock_transport *transport;


Please be sure that here and in all of the next patches, you follow the 
"Reverse Christmas tree" rule followed in net/ for the local variable 
declarations (order variable declaration lines longest to shortest).
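
As a small sketch of that ordering applied to the three locals quoted
above (the struct definitions and the function name wait_data_decls()
are hypothetical stand-ins so the fragment compiles outside the kernel,
they are not the real types):

```c
#include <stddef.h>

/* Hypothetical stand-in types, only here to make the sketch compile. */
struct vsock_transport { int id; };
struct vsock_sock { const struct vsock_transport *transport; };

static int wait_data_decls(struct vsock_sock *vsk_in)
{
	/* Longest declaration line first, shortest last. */
	const struct vsock_transport *transport;
	struct vsock_sock *vsk;
	int err = 0;

	vsk = vsk_in;
	transport = vsk->transport;
	if (!transport)
		err = -1;

	return err;
}
```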

>+

>+	vsk = vsock_sk(sk);

>+	transport = vsk->transport;

>+

>+	if (sk->sk_err != 0 ||

>+	    (sk->sk_shutdown & RCV_SHUTDOWN) ||

>+	    (vsk->peer_shutdown & SEND_SHUTDOWN)) {

>+		finish_wait(sk_sleep(sk), wait);

>+		return -1;

>+	}

>+	/* Don't wait for non-blocking sockets. */

>+	if (timeout == 0) {

>+		err = -EAGAIN;

>+		finish_wait(sk_sleep(sk), wait);

>+		return err;

>+	}

>+

>+	if (recv_data) {

>+		err = transport->notify_recv_pre_block(vsk, target, recv_data);

>+		if (err < 0) {

>+			finish_wait(sk_sleep(sk), wait);

>+			return err;

>+		}

>+	}

>+

>+	release_sock(sk);

>+	timeout = schedule_timeout(timeout);

>+	lock_sock(sk);

>+

>+	if (signal_pending(current)) {

>+		err = sock_intr_errno(timeout);

>+		finish_wait(sk_sleep(sk), wait);

>+	} else if (timeout == 0) {

>+		err = -EAGAIN;

>+		finish_wait(sk_sleep(sk), wait);

>+	}

>+


Since we call finish_wait() before returning in all paths, why not do
something like this:

out:
	finish_wait(sk_sleep(sk), wait);
>+	return err;

>+}


Then in the error paths you can do:

	err = XXX;
	goto out;
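
To make the shape concrete, here is a userspace sketch of the single
exit pattern. It is only an illustration: struct fake_sock,
fake_finish_wait() and fake_schedule_timeout() are hypothetical
stand-ins for the kernel state and helpers, and the signal and
notify_recv_pre_block() branches are left out. It follows the
suggestion literally, so the cleanup also runs on the success path;
whether that is wanted depends on how the caller loops.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the socket state used by the sketch. */
struct fake_sock {
	int sk_err;
	bool rcv_shutdown;
	int finish_wait_calls;	/* counts cleanup invocations */
};

/* Stand-in for finish_wait(sk_sleep(sk), wait). */
static void fake_finish_wait(struct fake_sock *sk)
{
	sk->finish_wait_calls++;
}

/* Stand-in for schedule_timeout(): returns the time left. */
static long fake_schedule_timeout(long timeout, long elapsed)
{
	return timeout > elapsed ? timeout - elapsed : 0;
}

static int wait_data_sketch(struct fake_sock *sk, long timeout, long elapsed)
{
	int err = 0;

	if (sk->sk_err != 0 || sk->rcv_shutdown) {
		err = -1;
		goto out;
	}

	/* Don't wait for non-blocking sockets. */
	if (timeout == 0) {
		err = -EAGAIN;
		goto out;
	}

	timeout = fake_schedule_timeout(timeout, elapsed);
	if (timeout == 0)
		err = -EAGAIN;

out:
	/* Single place where the wait is finished, for every path. */
	fake_finish_wait(sk);
	return err;
}
```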

Thanks,
Stefano

>

> static int

> vsock_stream_recvmsg(struct socket *sock, struct msghdr *msg, size_t len,

>-- 

>2.25.1

>
Stefano Garzarella Jan. 18, 2021, 3:15 p.m. UTC | #2
On Fri, Jan 15, 2021 at 08:44:22AM +0300, Arseny Krasnov wrote:
>This adds rest of logic for SEQPACKET:
>1) Shared functions for packet sending now set valid type of packet
>   according socket type.
>2) SEQPACKET specific function like SEQ_BEGIN send and data dequeue.
>3) Ops for virtio transport.
>4) TAP support for SEQPACKET is not so easy if it is necessary to send
>   whole record to TAP interface. This could be done by allocating
>   new packet when whole record is received, data of record must be
>   copied to TAP packet.

>

>Signed-off-by: Arseny Krasnov <arseny.krasnov@kaspersky.com>

>---

> include/linux/virtio_vsock.h            |  7 ++++

> net/vmw_vsock/virtio_transport.c        |  4 ++

> net/vmw_vsock/virtio_transport_common.c | 54 ++++++++++++++++++++++---

> 3 files changed, 59 insertions(+), 6 deletions(-)

>

>diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h

>index af8705ea8b95..ad9783df97c9 100644

>--- a/include/linux/virtio_vsock.h

>+++ b/include/linux/virtio_vsock.h

>@@ -84,7 +84,14 @@ virtio_transport_dgram_dequeue(struct vsock_sock *vsk,

> 			       struct msghdr *msg,

> 			       size_t len, int flags);

>

>+bool virtio_transport_seqpacket_seq_send_len(struct vsock_sock *vsk, size_t len);

> size_t virtio_transport_seqpacket_seq_get_len(struct vsock_sock *vsk);

>+ssize_t

>+virtio_transport_seqpacket_dequeue(struct vsock_sock *vsk,

>+				   struct msghdr *msg,

>+				   size_t len,

>+				   int type);

>+

> s64 virtio_transport_stream_has_data(struct vsock_sock *vsk);

> s64 virtio_transport_stream_has_space(struct vsock_sock *vsk);

>

>diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c

>index 2700a63ab095..5a7ab1befee8 100644

>--- a/net/vmw_vsock/virtio_transport.c

>+++ b/net/vmw_vsock/virtio_transport.c

>@@ -469,6 +469,10 @@ static struct virtio_transport virtio_transport = {

> 		.stream_is_active         = virtio_transport_stream_is_active,

> 		.stream_allow             = virtio_transport_stream_allow,

>

>+		.seqpacket_seq_send_len	  = virtio_transport_seqpacket_seq_send_len,

>+		.seqpacket_seq_get_len	  = virtio_transport_seqpacket_seq_get_len,

>+		.seqpacket_dequeue        = virtio_transport_seqpacket_dequeue,

>+

> 		.notify_poll_in           = virtio_transport_notify_poll_in,

> 		.notify_poll_out          = virtio_transport_notify_poll_out,

> 		.notify_recv_init         = virtio_transport_notify_recv_init,

>diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c

>index c3e07eb1c666..5fdf1adfdaab 100644

>--- a/net/vmw_vsock/virtio_transport_common.c

>+++ b/net/vmw_vsock/virtio_transport_common.c

>@@ -139,6 +139,7 @@ static struct sk_buff *virtio_transport_build_skb(void *opaque)

> 		break;

> 	case VIRTIO_VSOCK_OP_CREDIT_UPDATE:

> 	case VIRTIO_VSOCK_OP_CREDIT_REQUEST:

>+	case VIRTIO_VSOCK_OP_SEQ_BEGIN:

> 		hdr->op = cpu_to_le16(AF_VSOCK_OP_CONTROL);

> 		break;

> 	default:

>@@ -157,6 +158,10 @@ static struct sk_buff *virtio_transport_build_skb(void *opaque)

>

> void virtio_transport_deliver_tap_pkt(struct virtio_vsock_pkt *pkt)

> {

>+	/* TODO: implement tap support for SOCK_SEQPACKET. */

>+	if (le32_to_cpu(pkt->hdr.type) == VIRTIO_VSOCK_TYPE_SEQPACKET)

             ^
hdr.type is __le16, so please use le16_to_cpu()
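
On little-endian hosts both conversions are no-ops, so the mismatch only
shows up on big-endian. A userspace sketch of what a big-endian host
would compute (the helper names below are hypothetical and merely model
the kernel macros, and the value 2 is just an example type value):

```c
#include <stdint.h>

static uint16_t swab16(uint16_t v)
{
	return (uint16_t)((v >> 8) | (v << 8));
}

static uint32_t swab32(uint32_t v)
{
	return ((v >> 24) & 0xff) | ((v >> 8) & 0xff00) |
	       ((v << 8) & 0xff0000) | (v << 24);
}

/* What a big-endian host reads from the two wire bytes of a __le16. */
static uint16_t be_host_load16(const uint8_t b[2])
{
	return (uint16_t)((b[0] << 8) | b[1]);
}

/* Models le16_to_cpu() on a big-endian host: swap 2 bytes. */
static uint16_t be_le16_to_cpu(uint16_t raw)
{
	return swab16(raw);
}

/* Models le32_to_cpu() on a big-endian host: swap 4 bytes. */
static uint32_t be_le32_to_cpu(uint32_t raw)
{
	return swab32(raw);
}
```

With the wire bytes { 0x02, 0x00 } (the value 2 in little-endian), the
16-bit conversion gives 2, while the 32-bit conversion of the promoted
field gives 0x00020000, so the comparison against the type constant
would never match.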

>+		return;

>+

> 	if (pkt->tap_delivered)

> 		return;

>

>@@ -405,6 +410,19 @@ static u16 virtio_transport_get_type(struct sock *sk)

> 		return VIRTIO_VSOCK_TYPE_SEQPACKET;

> }

>

>+bool virtio_transport_seqpacket_seq_send_len(struct vsock_sock *vsk, size_t len)

>+{

>+	struct virtio_vsock_pkt_info info = {

>+		.type = VIRTIO_VSOCK_TYPE_SEQPACKET,

>+		.op = VIRTIO_VSOCK_OP_SEQ_BEGIN,

>+		.vsk = vsk,

>+		.flags = len

>+	};

>+

>+	return virtio_transport_send_pkt_info(vsk, &info);

>+}

>+EXPORT_SYMBOL_GPL(virtio_transport_seqpacket_seq_send_len);

>+

> static inline void virtio_transport_del_n_free_pkt(struct virtio_vsock_pkt *pkt)

> {

> 	list_del(&pkt->list);

>@@ -576,6 +594,18 @@ virtio_transport_stream_dequeue(struct vsock_sock *vsk,

> }

> EXPORT_SYMBOL_GPL(virtio_transport_stream_dequeue);

>

>+ssize_t

>+virtio_transport_seqpacket_dequeue(struct vsock_sock *vsk,

>+				   struct msghdr *msg,

>+				   size_t len, int flags)

>+{

>+	if (flags & MSG_PEEK)

>+		return -EOPNOTSUPP;

>+

>+	return virtio_transport_seqpacket_do_dequeue(vsk, msg, len);

>+}

>+EXPORT_SYMBOL_GPL(virtio_transport_seqpacket_dequeue);

>+

> int

> virtio_transport_dgram_dequeue(struct vsock_sock *vsk,

> 			       struct msghdr *msg,

>@@ -659,13 +689,15 @@ EXPORT_SYMBOL_GPL(virtio_transport_do_socket_init);

> void virtio_transport_notify_buffer_size(struct vsock_sock *vsk, u64 *val)

> {

> 	struct virtio_vsock_sock *vvs = vsk->trans;

>+	int type;

>

> 	if (*val > VIRTIO_VSOCK_MAX_BUF_SIZE)

> 		*val = VIRTIO_VSOCK_MAX_BUF_SIZE;

>

> 	vvs->buf_alloc = *val;

>

>-	virtio_transport_send_credit_update(vsk, VIRTIO_VSOCK_TYPE_STREAM,

>+	type = virtio_transport_get_type(sk_vsock(vsk));

>+	virtio_transport_send_credit_update(vsk, type,

> 					    NULL);


With this change, you can move 'NULL' to the previous line.

> }

> EXPORT_SYMBOL_GPL(virtio_transport_notify_buffer_size);

>@@ -793,10 +825,11 @@ int virtio_transport_connect(struct vsock_sock *vsk)

> {

> 	struct virtio_vsock_pkt_info info = {

> 		.op = VIRTIO_VSOCK_OP_REQUEST,

>-		.type = VIRTIO_VSOCK_TYPE_STREAM,

> 		.vsk = vsk,

> 	};

>

>+	info.type = virtio_transport_get_type(sk_vsock(vsk));

>+

> 	return virtio_transport_send_pkt_info(vsk, &info);

> }

> EXPORT_SYMBOL_GPL(virtio_transport_connect);

>@@ -805,7 +838,6 @@ int virtio_transport_shutdown(struct vsock_sock *vsk, int mode)

> {

> 	struct virtio_vsock_pkt_info info = {

> 		.op = VIRTIO_VSOCK_OP_SHUTDOWN,

>-		.type = VIRTIO_VSOCK_TYPE_STREAM,

> 		.flags = (mode & RCV_SHUTDOWN ?

> 			  VIRTIO_VSOCK_SHUTDOWN_RCV : 0) |

> 			 (mode & SEND_SHUTDOWN ?

>@@ -813,6 +845,8 @@ int virtio_transport_shutdown(struct vsock_sock *vsk, int mode)

> 		.vsk = vsk,

> 	};

>

>+	info.type = virtio_transport_get_type(sk_vsock(vsk));

>+

> 	return virtio_transport_send_pkt_info(vsk, &info);

> }

> EXPORT_SYMBOL_GPL(virtio_transport_shutdown);

>@@ -834,12 +868,18 @@ virtio_transport_stream_enqueue(struct vsock_sock *vsk,

> {

> 	struct virtio_vsock_pkt_info info = {

> 		.op = VIRTIO_VSOCK_OP_RW,

>-		.type = VIRTIO_VSOCK_TYPE_STREAM,

> 		.msg = msg,

> 		.pkt_len = len,

> 		.vsk = vsk,

>+		.flags = 0,

> 	};

>

>+	info.type = virtio_transport_get_type(sk_vsock(vsk));

>+

>+	if (info.type == VIRTIO_VSOCK_TYPE_SEQPACKET &&

>+	    msg->msg_flags & MSG_EOR)

>+		info.flags |= VIRTIO_VSOCK_RW_EOR;

>+

> 	return virtio_transport_send_pkt_info(vsk, &info);

> }

> EXPORT_SYMBOL_GPL(virtio_transport_stream_enqueue);

>@@ -857,7 +897,6 @@ static int virtio_transport_reset(struct vsock_sock *vsk,

> {

> 	struct virtio_vsock_pkt_info info = {

> 		.op = VIRTIO_VSOCK_OP_RST,

>-		.type = VIRTIO_VSOCK_TYPE_STREAM,

> 		.reply = !!pkt,

> 		.vsk = vsk,

> 	};

>@@ -866,6 +905,8 @@ static int virtio_transport_reset(struct vsock_sock *vsk,

> 	if (pkt && le16_to_cpu(pkt->hdr.op) == VIRTIO_VSOCK_OP_RST)

> 		return 0;

>

>+	info.type = virtio_transport_get_type(sk_vsock(vsk));

>+

> 	return virtio_transport_send_pkt_info(vsk, &info);

> }

>

>@@ -1177,13 +1218,14 @@ virtio_transport_send_response(struct vsock_sock *vsk,

> {

> 	struct virtio_vsock_pkt_info info = {

> 		.op = VIRTIO_VSOCK_OP_RESPONSE,

>-		.type = VIRTIO_VSOCK_TYPE_STREAM,

> 		.remote_cid = le64_to_cpu(pkt->hdr.src_cid),

> 		.remote_port = le32_to_cpu(pkt->hdr.src_port),

> 		.reply = true,

> 		.vsk = vsk,

> 	};

>

>+	info.type = virtio_transport_get_type(sk_vsock(vsk));

>+

> 	return virtio_transport_send_pkt_info(vsk, &info);

> }

>

>-- 

>2.25.1

>
Stefano Garzarella Jan. 18, 2021, 3:16 p.m. UTC | #3
On Fri, Jan 15, 2021 at 12:59:30PM +0300, stsp wrote:
>15.01.2021 08:35, Arseny Krasnov wrote:
>>      This patchset impelements support of SOCK_SEQPACKET for virtio
>>transport.
>>      As SOCK_SEQPACKET guarantees to save record boundaries, so to
>>do it, new packet operation was added: it marks start of record (with
>>record length in header), such packet doesn't carry any data.  To send
>>record, packet with start marker is sent first, then all data is sent
>>as usual 'RW' packets. On receiver's side, length of record is known
>>from packet with start record marker. Now as  packets of one socket
>>are not reordered neither on vsock nor on vhost transport layers, such
>>marker allows to restore original record on receiver's side. If user's
>>buffer is smaller that
>
>than
>
>>  record length, when
>
>then
>
>>  v1 -> v2:
>>  - patches reordered: af_vsock.c changes now before virtio vsock
>>  - patches reorganized: more small patches, where +/- are not mixed
>
>If you did this because I asked, then this
>is not what I asked. :)
>You can't just add some static func in a
>separate patch, as it will just produce the
>compilation warning of an unused function.
>I only asked to separate the refactoring from
>the new code. I.e. if you move some code
>block to a separate function, you shouldn't
>split that into 2 patches, one that adds a
>code block and another one that removes it.
>It should be in one patch, so that it is clear
>what was moved, and no new warnings are
>introduced.
>What I asked to separate, is the old code
>moves with the new code additions. Such
>things can definitely go in separate patches.


Arseny, thanks for the v2.
I appreciate that you moved the af_vsock changes before the transport,
and also the tests, but I agree with stsp about the split patches.

As stsp suggested, you can have some "preparation" patches that touch
the already existing code (e.g. rename vsock_stream_sendmsg() to
vsock_connectible_sendmsg() and call it inside the new
vsock_stream_sendmsg(), etc.), then a patch that adds the seqpacket
stuff in af_vsock.

Also for the virtio/vhost transports, you can have some patches that add
support in virtio_transport_common, then a patch that enables it in
virtio_transport and a patch for vhost_vsock, as you rightly did in
patch 12.

So, I'd suggest moving out the code that touches virtio_transport.c
from patch 11.

These changes should simplify the review.

In addition, you can also remove the trailing '.' from the commit titles.


I left other comments in the single patches.

Thanks,
Stefano