
[RFC,v1,0/6] virtio/vsock: introduce SOCK_DGRAM support

Message ID 20210609232501.171257-1-jiang.wang@bytedance.com
Series virtio/vsock: introduce SOCK_DGRAM support

Message

Jiang Wang June 9, 2021, 11:24 p.m. UTC
This patchset implements SOCK_DGRAM support for the virtio transport.

Datagram sockets are connectionless and unreliable. To avoid unfair
contention with stream and other sockets, add two more virtqueues and a
new feature bit to indicate whether those two new queues exist.

Dgram does not use the existing credit update mechanism for stream
sockets. When sending from the guest/driver, packets are sent
synchronously, so the sender gets an error when the virtqueue is full.
When sending from the host/device, packets are sent asynchronously
because the descriptor memory belongs to the corresponding QEMU process.
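
For readers unfamiliar with the AF_VSOCK datagram API, here is a minimal
guest-side sketch of the intended usage (the port number and message are
illustrative, and error handling is kept to a minimum):

#include <stdio.h>
#include <sys/socket.h>
#include <linux/vm_sockets.h>

/* Sketch: send one datagram from the guest to the host (CID 2).
 * With this series, a full dgram TX virtqueue is reported here as a
 * send error rather than handled by the stream credit mechanism.
 */
int main(void)
{
	struct sockaddr_vm addr = {
		.svm_family = AF_VSOCK,
		.svm_cid    = VMADDR_CID_HOST,	/* the host is always CID 2 */
		.svm_port   = 1234,		/* illustrative port */
	};
	const char msg[] = "hello from guest";
	int fd = socket(AF_VSOCK, SOCK_DGRAM, 0);

	if (fd < 0)
		return 1;
	if (sendto(fd, msg, sizeof(msg), 0,
		   (struct sockaddr *)&addr, sizeof(addr)) < 0)
		perror("sendto");
	return 0;
}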

The virtio spec patch is here: 
https://www.spinics.net/lists/linux-virtualization/msg50027.html

For those who prefer a git repo, here is the link for the Linux kernel:
https://github.com/Jiang1155/linux/tree/vsock-dgram-v1

qemu patch link:
https://github.com/Jiang1155/qemu/tree/vsock-dgram-v1


To do:
1. use skb when receiving packets
2. support multiple transport
3. support mergeable rx buffer


Jiang Wang (6):
  virtio/vsock: add VIRTIO_VSOCK_F_DGRAM feature bit
  virtio/vsock: add support for virtio datagram
  vhost/vsock: add support for vhost dgram.
  vsock_test: add tests for vsock dgram
  vhost/vsock: add kconfig for vhost dgram support
  virtio/vsock: add sysfs for rx buf len for dgram

 drivers/vhost/Kconfig                              |   8 +
 drivers/vhost/vsock.c                              | 207 ++++++++--
 include/linux/virtio_vsock.h                       |   9 +
 include/net/af_vsock.h                             |   1 +
 .../trace/events/vsock_virtio_transport_common.h   |   5 +-
 include/uapi/linux/virtio_vsock.h                  |   4 +
 net/vmw_vsock/af_vsock.c                           |  12 +
 net/vmw_vsock/virtio_transport.c                   | 433 ++++++++++++++++++---
 net/vmw_vsock/virtio_transport_common.c            | 184 ++++++++-
 tools/testing/vsock/util.c                         | 105 +++++
 tools/testing/vsock/util.h                         |   4 +
 tools/testing/vsock/vsock_test.c                   | 195 ++++++++++
 12 files changed, 1070 insertions(+), 97 deletions(-)

Comments

Jason Wang June 10, 2021, 1:50 a.m. UTC | #1
On 2021/6/10 7:24 AM, Jiang Wang wrote:
> This patchset implements support of SOCK_DGRAM for virtio

> transport.

>

> Datagram sockets are connectionless and unreliable. To avoid unfair contention

> with stream and other sockets, add two more virtqueues and

> a new feature bit to indicate if those two new queues exist or not.

>

> Dgram does not use the existing credit update mechanism for

> stream sockets. When sending from the guest/driver, sending packets

> synchronously, so the sender will get an error when the virtqueue is full.

> When sending from the host/device, send packets asynchronously

> because the descriptor memory belongs to the corresponding QEMU

> process.



What's the use case for the datagram vsock?


>

> The virtio spec patch is here:

> https://www.spinics.net/lists/linux-virtualization/msg50027.html



Having had a quick glance, I suggest splitting the mergeable rx buffer
support into a separate patch.

But I think it's time to revisit the idea of unifying virtio-net and
virtio-vsock. Otherwise we're duplicating features and bugs.

Thanks


>

> For those who prefer git repo, here is the link for the linux kernel:

> https://github.com/Jiang1155/linux/tree/vsock-dgram-v1

>

> qemu patch link:

> https://github.com/Jiang1155/qemu/tree/vsock-dgram-v1

>

>

> To do:

> 1. use skb when receiving packets

> 2. support multiple transport

> 3. support mergeable rx buffer

>

>

> Jiang Wang (6):

>    virtio/vsock: add VIRTIO_VSOCK_F_DGRAM feature bit

>    virtio/vsock: add support for virtio datagram

>    vhost/vsock: add support for vhost dgram.

>    vsock_test: add tests for vsock dgram

>    vhost/vsock: add kconfig for vhost dgram support

>    virtio/vsock: add sysfs for rx buf len for dgram

>

>   drivers/vhost/Kconfig                              |   8 +

>   drivers/vhost/vsock.c                              | 207 ++++++++--

>   include/linux/virtio_vsock.h                       |   9 +

>   include/net/af_vsock.h                             |   1 +

>   .../trace/events/vsock_virtio_transport_common.h   |   5 +-

>   include/uapi/linux/virtio_vsock.h                  |   4 +

>   net/vmw_vsock/af_vsock.c                           |  12 +

>   net/vmw_vsock/virtio_transport.c                   | 433 ++++++++++++++++++---

>   net/vmw_vsock/virtio_transport_common.c            | 184 ++++++++-

>   tools/testing/vsock/util.c                         | 105 +++++

>   tools/testing/vsock/util.h                         |   4 +

>   tools/testing/vsock/vsock_test.c                   | 195 ++++++++++

>   12 files changed, 1070 insertions(+), 97 deletions(-)

>
Jiang Wang June 10, 2021, 3:43 a.m. UTC | #2
On Wed, Jun 9, 2021 at 6:51 PM Jason Wang <jasowang@redhat.com> wrote:
>

>

> On 2021/6/10 7:24 AM, Jiang Wang wrote:

> > This patchset implements support of SOCK_DGRAM for virtio

> > transport.

> >

> > Datagram sockets are connectionless and unreliable. To avoid unfair contention

> > with stream and other sockets, add two more virtqueues and

> > a new feature bit to indicate if those two new queues exist or not.

> >

> > Dgram does not use the existing credit update mechanism for

> > stream sockets. When sending from the guest/driver, sending packets

> > synchronously, so the sender will get an error when the virtqueue is full.

> > When sending from the host/device, send packets asynchronously

> > because the descriptor memory belongs to the corresponding QEMU

> > process.

>

>

> What's the use case for the datagram vsock?

>

One use case is non-critical info logging from the guest to the host,
such as performance data of some applications.

It can also be used to replace UDP communications between
the guest and the host.

> >

> > The virtio spec patch is here:

> > https://www.spinics.net/lists/linux-virtualization/msg50027.html

>

>

> Have a quick glance, I suggest to split mergeable rx buffer into an

> separate patch.


Sure.

> But I think it's time to revisit the idea of unifying the virtio-net and

> virtio-vsock. Otherwise we're duplicating features and bugs.


For the mergeable rxbuf related code, I think a set of common helper
functions can be used by both virtio-net and virtio-vsock. For other
parts, that may not be very beneficial. I will think about it more.

If there is a previous email discussion about this topic, could you send me
some links? I did a quick web search but did not find any related
info. Thanks.

> Thanks

>

>

> >

> > For those who prefer git repo, here is the link for the linux kernel:

> > https://github.com/Jiang1155/linux/tree/vsock-dgram-v1

> >

> > qemu patch link:

> > https://github.com/Jiang1155/qemu/tree/vsock-dgram-v1

> >

> >

> > To do:

> > 1. use skb when receiving packets

> > 2. support multiple transport

> > 3. support mergeable rx buffer

> >

> >

> > Jiang Wang (6):

> >    virtio/vsock: add VIRTIO_VSOCK_F_DGRAM feature bit

> >    virtio/vsock: add support for virtio datagram

> >    vhost/vsock: add support for vhost dgram.

> >    vsock_test: add tests for vsock dgram

> >    vhost/vsock: add kconfig for vhost dgram support

> >    virtio/vsock: add sysfs for rx buf len for dgram

> >

> >   drivers/vhost/Kconfig                              |   8 +

> >   drivers/vhost/vsock.c                              | 207 ++++++++--

> >   include/linux/virtio_vsock.h                       |   9 +

> >   include/net/af_vsock.h                             |   1 +

> >   .../trace/events/vsock_virtio_transport_common.h   |   5 +-

> >   include/uapi/linux/virtio_vsock.h                  |   4 +

> >   net/vmw_vsock/af_vsock.c                           |  12 +

> >   net/vmw_vsock/virtio_transport.c                   | 433 ++++++++++++++++++---

> >   net/vmw_vsock/virtio_transport_common.c            | 184 ++++++++-

> >   tools/testing/vsock/util.c                         | 105 +++++

> >   tools/testing/vsock/util.h                         |   4 +

> >   tools/testing/vsock/vsock_test.c                   | 195 ++++++++++

> >   12 files changed, 1070 insertions(+), 97 deletions(-)

> >

>
Jason Wang June 10, 2021, 4:02 a.m. UTC | #3
On 2021/6/10 11:43 AM, Jiang Wang . wrote:
> On Wed, Jun 9, 2021 at 6:51 PM Jason Wang <jasowang@redhat.com> wrote:

>>

>> On 2021/6/10 7:24 AM, Jiang Wang wrote:

>>> This patchset implements support of SOCK_DGRAM for virtio

>>> transport.

>>>

>>> Datagram sockets are connectionless and unreliable. To avoid unfair contention

>>> with stream and other sockets, add two more virtqueues and

>>> a new feature bit to indicate if those two new queues exist or not.

>>>

>>> Dgram does not use the existing credit update mechanism for

>>> stream sockets. When sending from the guest/driver, sending packets

>>> synchronously, so the sender will get an error when the virtqueue is full.

>>> When sending from the host/device, send packets asynchronously

>>> because the descriptor memory belongs to the corresponding QEMU

>>> process.

>>

>> What's the use case for the datagram vsock?

>>

> One use case is for non critical info logging from the guest

> to the host, such as the performance data of some applications.



Anything that prevents you from using the stream socket?


>

> It can also be used to replace UDP communications between

> the guest and the host.



Any advantage for VSOCK in this case? Is it for performance? (I guess not,
since I don't expect vsock to be faster.)

An obvious drawback is that it breaks migration. With UDP you get very
rich feature support from the kernel, which vsock can't provide.


>

>>> The virtio spec patch is here:

>>> https://www.spinics.net/lists/linux-virtualization/msg50027.html

>>

>> Have a quick glance, I suggest to split mergeable rx buffer into an

>> separate patch.

> Sure.

>

>> But I think it's time to revisit the idea of unifying the virtio-net and

>> virtio-vsock. Otherwise we're duplicating features and bugs.

> For mergeable rxbuf related code, I think a set of common helper

> functions can be used by both virtio-net and virtio-vsock. For other

> parts, that may not be very beneficial. I will think about more.

>

> If there is a previous email discussion about this topic, could you send me

> some links? I did a quick web search but did not find any related

> info. Thanks.



We had a lot of previous discussions:

[1] 
https://patchwork.kernel.org/project/kvm/patch/5BDFF537.3050806@huawei.com/
[2] 
https://lists.linuxfoundation.org/pipermail/virtualization/2018-November/039798.html
[3] https://www.lkml.org/lkml/2020/1/16/2043

Thanks

>

>> Thanks

>>

>>

>>> For those who prefer git repo, here is the link for the linux kernel:

>>> https://github.com/Jiang1155/linux/tree/vsock-dgram-v1

>>>

>>> qemu patch link:

>>> https://github.com/Jiang1155/qemu/tree/vsock-dgram-v1

>>>

>>>

>>> To do:

>>> 1. use skb when receiving packets

>>> 2. support multiple transport

>>> 3. support mergeable rx buffer

>>>

>>>

>>> Jiang Wang (6):

>>>     virtio/vsock: add VIRTIO_VSOCK_F_DGRAM feature bit

>>>     virtio/vsock: add support for virtio datagram

>>>     vhost/vsock: add support for vhost dgram.

>>>     vsock_test: add tests for vsock dgram

>>>     vhost/vsock: add kconfig for vhost dgram support

>>>     virtio/vsock: add sysfs for rx buf len for dgram

>>>

>>>    drivers/vhost/Kconfig                              |   8 +

>>>    drivers/vhost/vsock.c                              | 207 ++++++++--

>>>    include/linux/virtio_vsock.h                       |   9 +

>>>    include/net/af_vsock.h                             |   1 +

>>>    .../trace/events/vsock_virtio_transport_common.h   |   5 +-

>>>    include/uapi/linux/virtio_vsock.h                  |   4 +

>>>    net/vmw_vsock/af_vsock.c                           |  12 +

>>>    net/vmw_vsock/virtio_transport.c                   | 433 ++++++++++++++++++---

>>>    net/vmw_vsock/virtio_transport_common.c            | 184 ++++++++-

>>>    tools/testing/vsock/util.c                         | 105 +++++

>>>    tools/testing/vsock/util.h                         |   4 +

>>>    tools/testing/vsock/vsock_test.c                   | 195 ++++++++++

>>>    12 files changed, 1070 insertions(+), 97 deletions(-)

>>>
Stefano Garzarella June 10, 2021, 7:23 a.m. UTC | #4
On Thu, Jun 10, 2021 at 12:02:35PM +0800, Jason Wang wrote:
>

>On 2021/6/10 11:43 AM, Jiang Wang . wrote:

>>On Wed, Jun 9, 2021 at 6:51 PM Jason Wang <jasowang@redhat.com> wrote:

>>>

>>>On 2021/6/10 7:24 AM, Jiang Wang wrote:

>>>>This patchset implements support of SOCK_DGRAM for virtio

>>>>transport.

>>>>

>>>>Datagram sockets are connectionless and unreliable. To avoid unfair contention

>>>>with stream and other sockets, add two more virtqueues and

>>>>a new feature bit to indicate if those two new queues exist or not.

>>>>

>>>>Dgram does not use the existing credit update mechanism for

>>>>stream sockets. When sending from the guest/driver, sending packets

>>>>synchronously, so the sender will get an error when the virtqueue is 

>>>>full.

>>>>When sending from the host/device, send packets asynchronously

>>>>because the descriptor memory belongs to the corresponding QEMU

>>>>process.

>>>

>>>What's the use case for the datagram vsock?

>>>

>>One use case is for non critical info logging from the guest

>>to the host, such as the performance data of some applications.

>

>

>Anything that prevents you from using the stream socket?

>

>

>>

>>It can also be used to replace UDP communications between

>>the guest and the host.

>

>

>Any advantage for VSOCK in this case? Is it for performance (I guess 

>not since I don't exepct vsock will be faster).


I think the general advantage of using vsock is for guest agents that
potentially don't need any configuration.

>

>An obvious drawback is that it breaks the migration. Using UDP you can 

>have a very rich features support from the kernel where vsock can't.

>


Thanks for bringing this up!
What features does UDP support that datagram on vsock could not?

>

>>

>>>>The virtio spec patch is here:

>>>>https://www.spinics.net/lists/linux-virtualization/msg50027.html

>>>

>>>Have a quick glance, I suggest to split mergeable rx buffer into an

>>>separate patch.

>>Sure.

>>

>>>But I think it's time to revisit the idea of unifying the virtio-net 

>>>and

>>>virtio-vsock. Otherwise we're duplicating features and bugs.

>>For mergeable rxbuf related code, I think a set of common helper

>>functions can be used by both virtio-net and virtio-vsock. For other

>>parts, that may not be very beneficial. I will think about more.

>>

>>If there is a previous email discussion about this topic, could you 

>>send me

>>some links? I did a quick web search but did not find any related

>>info. Thanks.

>

>

>We had a lot:

>

>[1] 

>https://patchwork.kernel.org/project/kvm/patch/5BDFF537.3050806@huawei.com/

>[2] 

>https://lists.linuxfoundation.org/pipermail/virtualization/2018-November/039798.html

>[3] https://www.lkml.org/lkml/2020/1/16/2043

>


When I tried it, the biggest problem that blocked me was all the
features strictly related to the TCP/IP stack and ethernet devices that
the vsock device doesn't know how to handle: TSO, GSO, checksums, MAC,
NAPI, XDP, minimum ethernet frame size, MTU, etc.

So in my opinion unifying them is not so simple, because vsock is not
really an ethernet device, but simply a socket.

But I fully agree that we shouldn't duplicate functionality and code, so
maybe we could find those common parts and create helpers to be used by
both.

Thanks,
Stefano
Jason Wang June 10, 2021, 7:46 a.m. UTC | #5
On 2021/6/10 3:23 PM, Stefano Garzarella wrote:
> On Thu, Jun 10, 2021 at 12:02:35PM +0800, Jason Wang wrote:

>>

>> On 2021/6/10 11:43 AM, Jiang Wang . wrote:

>>> On Wed, Jun 9, 2021 at 6:51 PM Jason Wang <jasowang@redhat.com> wrote:

>>>>

>>>> On 2021/6/10 7:24 AM, Jiang Wang wrote:

>>>>> This patchset implements support of SOCK_DGRAM for virtio

>>>>> transport.

>>>>>

>>>>> Datagram sockets are connectionless and unreliable. To avoid 

>>>>> unfair contention

>>>>> with stream and other sockets, add two more virtqueues and

>>>>> a new feature bit to indicate if those two new queues exist or not.

>>>>>

>>>>> Dgram does not use the existing credit update mechanism for

>>>>> stream sockets. When sending from the guest/driver, sending packets

>>>>> synchronously, so the sender will get an error when the virtqueue 

>>>>> is full.

>>>>> When sending from the host/device, send packets asynchronously

>>>>> because the descriptor memory belongs to the corresponding QEMU

>>>>> process.

>>>>

>>>> What's the use case for the datagram vsock?

>>>>

>>> One use case is for non critical info logging from the guest

>>> to the host, such as the performance data of some applications.

>>

>>

>> Anything that prevents you from using the stream socket?

>>

>>

>>>

>>> It can also be used to replace UDP communications between

>>> the guest and the host.

>>

>>

>> Any advantage for VSOCK in this case? Is it for performance (I guess 

>> not since I don't exepct vsock will be faster).

>

> I think the general advantage to using vsock are for the guest agents 

> that potentially don't need any configuration.



Right, I wonder if we really need datagram, considering the host-to-guest
communication is reliable.

(Note that I don't object to it, since vsock already supports that; I just
wonder about its use cases.)


>

>>

>> An obvious drawback is that it breaks the migration. Using UDP you 

>> can have a very rich features support from the kernel where vsock can't.

>>

>

> Thanks for bringing this up!

> What features does UDP support and datagram on vsock could not support?



E.g. sendpage() and busy polling. And using UDP means qdiscs and eBPF
can work.


>

>>

>>>

>>>>> The virtio spec patch is here:

>>>>> https://www.spinics.net/lists/linux-virtualization/msg50027.html

>>>>

>>>> Have a quick glance, I suggest to split mergeable rx buffer into an

>>>> separate patch.

>>> Sure.

>>>

>>>> But I think it's time to revisit the idea of unifying the 

>>>> virtio-net and

>>>> virtio-vsock. Otherwise we're duplicating features and bugs.

>>> For mergeable rxbuf related code, I think a set of common helper

>>> functions can be used by both virtio-net and virtio-vsock. For other

>>> parts, that may not be very beneficial. I will think about more.

>>>

>>> If there is a previous email discussion about this topic, could you 

>>> send me

>>> some links? I did a quick web search but did not find any related

>>> info. Thanks.

>>

>>

>> We had a lot:

>>

>> [1] 

>> https://patchwork.kernel.org/project/kvm/patch/5BDFF537.3050806@huawei.com/

>> [2] 

>> https://lists.linuxfoundation.org/pipermail/virtualization/2018-November/039798.html

>> [3] https://www.lkml.org/lkml/2020/1/16/2043

>>

>

> When I tried it, the biggest problem that blocked me were all the 

> features strictly related to TCP/IP stack and ethernet devices that 

> vsock device doesn't know how to handle: TSO, GSO, checksums, MAC, 

> napi, xdp, min ethernet frame size, MTU, etc.



It depends on which level we want to share:

1) sharing code
2) sharing devices
3) making vsock a protocol that is understood by the network core

We can start from 1): the low-level tx/rx logic can be shared by both
virtio-net and vhost-net. For 2) we probably need some work on the spec,
probably with a new feature bit to indicate that it's a vsock device,
not an ethernet device. Then, if it is probed as a vsock device, we won't
let packets be delivered to the TCP/IP stack. For 3), it would be even
harder and I'm not sure it's worth doing.


>

> So in my opinion to unify them is not so simple, because vsock is not 

> really an ethernet device, but simply a socket.



We can start from sharing code.


>

> But I fully agree that we shouldn't duplicate functionality and code, 

> so maybe we could find those common parts and create helpers to be 

> used by both.



Yes.

Thanks


>

> Thanks,

> Stefano

>
Stefano Garzarella June 10, 2021, 9:51 a.m. UTC | #6
On Thu, Jun 10, 2021 at 03:46:55PM +0800, Jason Wang wrote:
>

>On 2021/6/10 3:23 PM, Stefano Garzarella wrote:

>>On Thu, Jun 10, 2021 at 12:02:35PM +0800, Jason Wang wrote:

>>>

>>>On 2021/6/10 11:43 AM, Jiang Wang . wrote:

>>>>On Wed, Jun 9, 2021 at 6:51 PM Jason Wang <jasowang@redhat.com> wrote:

>>>>>

>>>>>On 2021/6/10 7:24 AM, Jiang Wang wrote:

>>>>>>This patchset implements support of SOCK_DGRAM for virtio

>>>>>>transport.

>>>>>>

>>>>>>Datagram sockets are connectionless and unreliable. To avoid 

>>>>>>unfair contention

>>>>>>with stream and other sockets, add two more virtqueues and

>>>>>>a new feature bit to indicate if those two new queues exist or not.

>>>>>>

>>>>>>Dgram does not use the existing credit update mechanism for

>>>>>>stream sockets. When sending from the guest/driver, sending packets

>>>>>>synchronously, so the sender will get an error when the 

>>>>>>virtqueue is full.

>>>>>>When sending from the host/device, send packets asynchronously

>>>>>>because the descriptor memory belongs to the corresponding QEMU

>>>>>>process.

>>>>>

>>>>>What's the use case for the datagram vsock?

>>>>>

>>>>One use case is for non critical info logging from the guest

>>>>to the host, such as the performance data of some applications.

>>>

>>>

>>>Anything that prevents you from using the stream socket?

>>>

>>>

>>>>

>>>>It can also be used to replace UDP communications between

>>>>the guest and the host.

>>>

>>>

>>>Any advantage for VSOCK in this case? Is it for performance (I 

>>>guess not since I don't exepct vsock will be faster).

>>

>>I think the general advantage to using vsock are for the guest 

>>agents that potentially don't need any configuration.

>

>

>Right, I wonder if we really need datagram consider the host to guest 

>communication is reliable.

>

>(Note that I don't object it since vsock has already supported that, 

>just wonder its use cases)


Yep, it was the same concern I had :-)
Also because we're now adding SEQPACKET, which provides reliable
datagram support.

But IIUC the use case is logging, where you don't need reliable
communication and you want to avoid keeping more open connections with
different guests.

So the server in the host can be pretty simple and doesn't have to
handle connections; it just waits for datagrams on a port.
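
For illustration, such a host-side server could be as small as the sketch
below (port and buffer size are arbitrary):

#include <stdio.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <linux/vm_sockets.h>

/* Sketch: wait for datagrams from any guest on one port; no per-guest
 * connections to accept or track.
 */
int main(void)
{
	struct sockaddr_vm addr = {
		.svm_family = AF_VSOCK,
		.svm_cid    = VMADDR_CID_ANY,	/* datagrams from any guest */
		.svm_port   = 1234,		/* illustrative port */
	};
	char buf[4096];
	int fd = socket(AF_VSOCK, SOCK_DGRAM, 0);

	if (fd < 0 || bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0)
		return 1;
	for (;;) {
		ssize_t n = recvfrom(fd, buf, sizeof(buf) - 1, 0, NULL, NULL);

		if (n < 0)
			break;
		buf[n] = '\0';
		printf("log: %s\n", buf);
	}
	return 0;
}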

>

>

>>

>>>

>>>An obvious drawback is that it breaks the migration. Using UDP you 

>>>can have a very rich features support from the kernel where vsock 

>>>can't.

>>>

>>

>>Thanks for bringing this up!

>>What features does UDP support and datagram on vsock could not support?

>

>

>E.g the sendpage() and busy polling. And using UDP means qdiscs and 

>eBPF can work.


Thanks, I see!

>

>

>>

>>>

>>>>

>>>>>>The virtio spec patch is here:

>>>>>>https://www.spinics.net/lists/linux-virtualization/msg50027.html

>>>>>

>>>>>Have a quick glance, I suggest to split mergeable rx buffer into an

>>>>>separate patch.

>>>>Sure.

>>>>

>>>>>But I think it's time to revisit the idea of unifying the 

>>>>>virtio-net and

>>>>>virtio-vsock. Otherwise we're duplicating features and bugs.

>>>>For mergeable rxbuf related code, I think a set of common helper

>>>>functions can be used by both virtio-net and virtio-vsock. For other

>>>>parts, that may not be very beneficial. I will think about more.

>>>>

>>>>If there is a previous email discussion about this topic, could 

>>>>you send me

>>>>some links? I did a quick web search but did not find any related

>>>>info. Thanks.

>>>

>>>

>>>We had a lot:

>>>

>>>[1] https://patchwork.kernel.org/project/kvm/patch/5BDFF537.3050806@huawei.com/

>>>[2] https://lists.linuxfoundation.org/pipermail/virtualization/2018-November/039798.html

>>>[3] https://www.lkml.org/lkml/2020/1/16/2043

>>>

>>

>>When I tried it, the biggest problem that blocked me were all the 

>>features strictly related to TCP/IP stack and ethernet devices that 

>>vsock device doesn't know how to handle: TSO, GSO, checksums, MAC, 

>>napi, xdp, min ethernet frame size, MTU, etc.

>

>

>It depends on which level we want to share:

>

>1) sharing codes

>2) sharing devices

>3) make vsock a protocol that is understood by the network core

>

>We can start from 1), the low level tx/rx logic can be shared at both 

>virtio-net and vhost-net. For 2) we probably need some work on the 

>spec, probably with a new feature bit to demonstrate that it's a vsock 

>device not a ethernet device. Then if it is probed as a vsock device we 

>won't let packet to be delivered in the TCP/IP stack. For 3), it would 

>be even harder and I'm not sure it's worth to do that.

>

>

>>

>>So in my opinion to unify them is not so simple, because vsock is not 

>>really an ethernet device, but simply a socket.

>

>

>We can start from sharing codes.


Yep, I agree, and maybe the mergeable buffer is a good starting point to
share code!

@Jiang, do you want to take a look at this possibility?

Thanks,
Stefano
Jiang Wang June 10, 2021, 4:44 p.m. UTC | #7
On Thu, Jun 10, 2021 at 2:52 AM Stefano Garzarella <sgarzare@redhat.com> wrote:
>

> On Thu, Jun 10, 2021 at 03:46:55PM +0800, Jason Wang wrote:

> >

> >On 2021/6/10 3:23 PM, Stefano Garzarella wrote:

> >>On Thu, Jun 10, 2021 at 12:02:35PM +0800, Jason Wang wrote:

> >>>

> >>>On 2021/6/10 11:43 AM, Jiang Wang . wrote:

> >>>>On Wed, Jun 9, 2021 at 6:51 PM Jason Wang <jasowang@redhat.com> wrote:

> >>>>>

> >>>>>On 2021/6/10 7:24 AM, Jiang Wang wrote:

> >>>>>>This patchset implements support of SOCK_DGRAM for virtio

> >>>>>>transport.

> >>>>>>

> >>>>>>Datagram sockets are connectionless and unreliable. To avoid

> >>>>>>unfair contention

> >>>>>>with stream and other sockets, add two more virtqueues and

> >>>>>>a new feature bit to indicate if those two new queues exist or not.

> >>>>>>

> >>>>>>Dgram does not use the existing credit update mechanism for

> >>>>>>stream sockets. When sending from the guest/driver, sending packets

> >>>>>>synchronously, so the sender will get an error when the

> >>>>>>virtqueue is full.

> >>>>>>When sending from the host/device, send packets asynchronously

> >>>>>>because the descriptor memory belongs to the corresponding QEMU

> >>>>>>process.

> >>>>>

> >>>>>What's the use case for the datagram vsock?

> >>>>>

> >>>>One use case is for non critical info logging from the guest

> >>>>to the host, such as the performance data of some applications.

> >>>

> >>>

> >>>Anything that prevents you from using the stream socket?

> >>>

> >>>

> >>>>

> >>>>It can also be used to replace UDP communications between

> >>>>the guest and the host.

> >>>

> >>>

> >>>Any advantage for VSOCK in this case? Is it for performance (I

> >>>guess not since I don't exepct vsock will be faster).

> >>

> >>I think the general advantage to using vsock are for the guest

> >>agents that potentially don't need any configuration.

> >

> >

> >Right, I wonder if we really need datagram consider the host to guest

> >communication is reliable.

> >

> >(Note that I don't object it since vsock has already supported that,

> >just wonder its use cases)

>

> Yep, it was the same concern I had :-)

> Also because we're now adding SEQPACKET, which provides reliable

> datagram support.

>

> But IIUC the use case is the logging where you don't need a reliable

> communication and you want to avoid to keep more open connections with

> different guests.

>

> So the server in the host can be pretty simple and doesn't have to

> handle connections. It just waits for datagrams on a port.


Yes. With datagram sockets, the application code is simpler than with
stream sockets. It is also easier to port existing datagram applications,
such as those using UDP or UNIX domain sockets of datagram type, to vsock
dgram sockets.

Compared to UDP, vsock dgram needs minimal configuration. When sending
data from the guest to the host, the client in the guest knows the host
CID is always 2, whereas with UDP the host IP may change depending on the
configuration.

The advantage over UNIX domain sockets is more obvious. We have
applications talking to each other over datagram UNIX domain sockets, but
those applications now run inside VMs, so we need to use vsock; since
they use datagram types, it is natural and simpler if vsock supports
datagram types too.

We can also run applications written for VMware vsock dgram directly on
QEMU.

Btw, SEQPACKET also supports datagrams, but the application code logic is
similar to stream sockets and the server needs to maintain connections.

> >

> >

> >>

> >>>

> >>>An obvious drawback is that it breaks the migration. Using UDP you

> >>>can have a very rich features support from the kernel where vsock

> >>>can't.

> >>>

> >>

> >>Thanks for bringing this up!

> >>What features does UDP support and datagram on vsock could not support?

> >

> >

> >E.g the sendpage() and busy polling. And using UDP means qdiscs and

> >eBPF can work.

>

> Thanks, I see!

>

> >

> >

> >>

> >>>

> >>>>

> >>>>>>The virtio spec patch is here:

> >>>>>>https://www.spinics.net/lists/linux-virtualization/msg50027.html

> >>>>>

> >>>>>Have a quick glance, I suggest to split mergeable rx buffer into an

> >>>>>separate patch.

> >>>>Sure.

> >>>>

> >>>>>But I think it's time to revisit the idea of unifying the

> >>>>>virtio-net and

> >>>>>virtio-vsock. Otherwise we're duplicating features and bugs.

> >>>>For mergeable rxbuf related code, I think a set of common helper

> >>>>functions can be used by both virtio-net and virtio-vsock. For other

> >>>>parts, that may not be very beneficial. I will think about more.

> >>>>

> >>>>If there is a previous email discussion about this topic, could

> >>>>you send me

> >>>>some links? I did a quick web search but did not find any related

> >>>>info. Thanks.

> >>>

> >>>

> >>>We had a lot:

> >>>

> >>>[1] https://patchwork.kernel.org/project/kvm/patch/5BDFF537.3050806@huawei.com/

> >>>[2] https://lists.linuxfoundation.org/pipermail/virtualization/2018-November/039798.html

> >>>[3] https://www.lkml.org/lkml/2020/1/16/2043

> >>>

Got it. I will check, thanks.

> >>When I tried it, the biggest problem that blocked me were all the

> >>features strictly related to TCP/IP stack and ethernet devices that

> >>vsock device doesn't know how to handle: TSO, GSO, checksums, MAC,

> >>napi, xdp, min ethernet frame size, MTU, etc.

> >

> >

> >It depends on which level we want to share:

> >

> >1) sharing codes

> >2) sharing devices

> >3) make vsock a protocol that is understood by the network core

> >

> >We can start from 1), the low level tx/rx logic can be shared at both

> >virtio-net and vhost-net. For 2) we probably need some work on the

> >spec, probably with a new feature bit to demonstrate that it's a vsock

> >device not a ethernet device. Then if it is probed as a vsock device we

> >won't let packet to be delivered in the TCP/IP stack. For 3), it would

> >be even harder and I'm not sure it's worth to do that.

> >

> >

> >>

> >>So in my opinion to unify them is not so simple, because vsock is not

> >>really an ethernet device, but simply a socket.

> >

> >

> >We can start from sharing codes.

>

> Yep, I agree, and maybe the mergeable buffer is a good starting point to

> share code!

>

> @Jiang, do you want to take a look of this possibility?


Yes. I already read the mergeable buffer code in virtio-net, which I think
is the only place that uses it so far. I will check how to share the code.

Thanks for all the comments.

> Thanks,

> Stefano

>
Stefano Garzarella June 18, 2021, 9:35 a.m. UTC | #8
On Wed, Jun 09, 2021 at 11:24:52PM +0000, Jiang Wang wrote:
>This patchset implements support of SOCK_DGRAM for virtio

>transport.

>

>Datagram sockets are connectionless and unreliable. To avoid unfair contention

>with stream and other sockets, add two more virtqueues and

>a new feature bit to indicate if those two new queues exist or not.

>

>Dgram does not use the existing credit update mechanism for

>stream sockets. When sending from the guest/driver, sending packets

>synchronously, so the sender will get an error when the virtqueue is full.

>When sending from the host/device, send packets asynchronously

>because the descriptor memory belongs to the corresponding QEMU

>process.

>

>The virtio spec patch is here:

>https://www.spinics.net/lists/linux-virtualization/msg50027.html

>

>For those who prefer git repo, here is the link for the linux kernel:

>https://github.com/Jiang1155/linux/tree/vsock-dgram-v1

>

>qemu patch link:

>https://github.com/Jiang1155/qemu/tree/vsock-dgram-v1

>

>

>To do:

>1. use skb when receiving packets

>2. support multiple transport

>3. support mergeable rx buffer


Jiang, I'll do a fast review, but I think it is better to rebase on
net-next since SEQPACKET support is now merged.

Please also run ./scripts/checkpatch.pl; there are a lot of issues.

I'll leave some simple comments in the patches, but I prefer to do a
deep review after the rebase and the dynamic handling of DGRAM.

Thanks,
Stefano
Stefano Garzarella June 18, 2021, 9:39 a.m. UTC | #9
On Wed, Jun 09, 2021 at 11:24:53PM +0000, Jiang Wang wrote:
>When this feature is enabled, allocate 5 queues,

>otherwise, allocate 3 queues to be compatible with

>old QEMU versions.

>

>Signed-off-by: Jiang Wang <jiang.wang@bytedance.com>

>---

> drivers/vhost/vsock.c             |  3 +-

> include/linux/virtio_vsock.h      |  9 +++++

> include/uapi/linux/virtio_vsock.h |  3 ++

> net/vmw_vsock/virtio_transport.c  | 73 +++++++++++++++++++++++++++++++++++----

> 4 files changed, 80 insertions(+), 8 deletions(-)

>

>diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c

>index 5e78fb719602..81d064601093 100644

>--- a/drivers/vhost/vsock.c

>+++ b/drivers/vhost/vsock.c

>@@ -31,7 +31,8 @@

>

> enum {

> 	VHOST_VSOCK_FEATURES = VHOST_FEATURES |

>-			       (1ULL << VIRTIO_F_ACCESS_PLATFORM)

>+			       (1ULL << VIRTIO_F_ACCESS_PLATFORM) |

>+			       (1ULL << VIRTIO_VSOCK_F_DGRAM)

> };

>

> enum {

>diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h

>index dc636b727179..ba3189ed9345 100644

>--- a/include/linux/virtio_vsock.h

>+++ b/include/linux/virtio_vsock.h

>@@ -18,6 +18,15 @@ enum {

> 	VSOCK_VQ_MAX    = 3,

> };

>

>+enum {

>+	VSOCK_VQ_STREAM_RX     = 0, /* for host to guest data */

>+	VSOCK_VQ_STREAM_TX     = 1, /* for guest to host data */

>+	VSOCK_VQ_DGRAM_RX       = 2,

>+	VSOCK_VQ_DGRAM_TX       = 3,

>+	VSOCK_VQ_EX_EVENT       = 4,

>+	VSOCK_VQ_EX_MAX         = 5,

>+};

>+

> /* Per-socket state (accessed via vsk->trans) */

> struct virtio_vsock_sock {

> 	struct vsock_sock *vsk;

>diff --git a/include/uapi/linux/virtio_vsock.h b/include/uapi/linux/virtio_vsock.h

>index 1d57ed3d84d2..b56614dff1c9 100644

>--- a/include/uapi/linux/virtio_vsock.h

>+++ b/include/uapi/linux/virtio_vsock.h

>@@ -38,6 +38,9 @@

> #include <linux/virtio_ids.h>

> #include <linux/virtio_config.h>

>

>+/* The feature bitmap for virtio net */

>+#define VIRTIO_VSOCK_F_DGRAM	0	/* Host support dgram vsock */

>+

> struct virtio_vsock_config {

> 	__le64 guest_cid;

> } __attribute__((packed));

>diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c

>index 2700a63ab095..7dcb8db23305 100644

>--- a/net/vmw_vsock/virtio_transport.c

>+++ b/net/vmw_vsock/virtio_transport.c

>@@ -27,7 +27,8 @@ static DEFINE_MUTEX(the_virtio_vsock_mutex); /* protects the_virtio_vsock */

>

> struct virtio_vsock {

> 	struct virtio_device *vdev;

>-	struct virtqueue *vqs[VSOCK_VQ_MAX];

>+	struct virtqueue **vqs;

>+	bool has_dgram;

>

> 	/* Virtqueue processing is deferred to a workqueue */

> 	struct work_struct tx_work;

>@@ -333,7 +334,10 @@ static int virtio_vsock_event_fill_one(struct virtio_vsock *vsock,

> 	struct scatterlist sg;

> 	struct virtqueue *vq;

>

>-	vq = vsock->vqs[VSOCK_VQ_EVENT];

>+	if (vsock->has_dgram)

>+		vq = vsock->vqs[VSOCK_VQ_EX_EVENT];

>+	else

>+		vq = vsock->vqs[VSOCK_VQ_EVENT];

>

> 	sg_init_one(&sg, event, sizeof(*event));

>

>@@ -351,7 +355,10 @@ static void virtio_vsock_event_fill(struct virtio_vsock *vsock)

> 		virtio_vsock_event_fill_one(vsock, event);

> 	}

>

>-	virtqueue_kick(vsock->vqs[VSOCK_VQ_EVENT]);

>+	if (vsock->has_dgram)

>+		virtqueue_kick(vsock->vqs[VSOCK_VQ_EX_EVENT]);

>+	else

>+		virtqueue_kick(vsock->vqs[VSOCK_VQ_EVENT]);

> }

>

> static void virtio_vsock_reset_sock(struct sock *sk)

>@@ -391,7 +398,10 @@ static void virtio_transport_event_work(struct work_struct *work)

> 		container_of(work, struct virtio_vsock, event_work);

> 	struct virtqueue *vq;

>

>-	vq = vsock->vqs[VSOCK_VQ_EVENT];

>+	if (vsock->has_dgram)

>+		vq = vsock->vqs[VSOCK_VQ_EX_EVENT];

>+	else

>+		vq = vsock->vqs[VSOCK_VQ_EVENT];

>

> 	mutex_lock(&vsock->event_lock);

>

>@@ -411,7 +421,10 @@ static void virtio_transport_event_work(struct work_struct *work)

> 		}

> 	} while (!virtqueue_enable_cb(vq));

>

>-	virtqueue_kick(vsock->vqs[VSOCK_VQ_EVENT]);

>+	if (vsock->has_dgram)

>+		virtqueue_kick(vsock->vqs[VSOCK_VQ_EX_EVENT]);

>+	else

>+		virtqueue_kick(vsock->vqs[VSOCK_VQ_EVENT]);

> out:

> 	mutex_unlock(&vsock->event_lock);

> }

>@@ -434,6 +447,10 @@ static void virtio_vsock_tx_done(struct virtqueue *vq)

> 	queue_work(virtio_vsock_workqueue, &vsock->tx_work);

> }

>

>+static void virtio_vsock_dgram_tx_done(struct virtqueue *vq)

>+{

>+}

>+

> static void virtio_vsock_rx_done(struct virtqueue *vq)

> {

> 	struct virtio_vsock *vsock = vq->vdev->priv;

>@@ -443,6 +460,10 @@ static void virtio_vsock_rx_done(struct virtqueue *vq)

> 	queue_work(virtio_vsock_workqueue, &vsock->rx_work);

> }

>

>+static void virtio_vsock_dgram_rx_done(struct virtqueue *vq)

>+{

>+}

>+

> static struct virtio_transport virtio_transport = {

> 	.transport = {

> 		.module                   = THIS_MODULE,

>@@ -545,13 +566,29 @@ static int virtio_vsock_probe(struct virtio_device *vdev)

> 		virtio_vsock_tx_done,

> 		virtio_vsock_event_done,

> 	};

>+	vq_callback_t *ex_callbacks[] = {


'ex' is not clear; maybe 'dgram' is better?

What happens if F_DGRAM is negotiated, but not F_STREAM?
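
E.g., the same arrays from this hunk, just renamed (sketch only):

	vq_callback_t *dgram_callbacks[] = {
		virtio_vsock_rx_done,
		virtio_vsock_tx_done,
		virtio_vsock_dgram_rx_done,
		virtio_vsock_dgram_tx_done,
		virtio_vsock_event_done,
	};

	static const char * const dgram_names[] = {
		"rx",
		"tx",
		"dgram_rx",
		"dgram_tx",
		"event",
	};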

>+		virtio_vsock_rx_done,

>+		virtio_vsock_tx_done,

>+		virtio_vsock_dgram_rx_done,

>+		virtio_vsock_dgram_tx_done,

>+		virtio_vsock_event_done,

>+	};

>+

> 	static const char * const names[] = {

> 		"rx",

> 		"tx",

> 		"event",

> 	};

>+	static const char * const ex_names[] = {

>+		"rx",

>+		"tx",

>+		"dgram_rx",

>+		"dgram_tx",

>+		"event",

>+	};

>+

> 	struct virtio_vsock *vsock = NULL;

>-	int ret;

>+	int ret, max_vq;

>

> 	ret = mutex_lock_interruptible(&the_virtio_vsock_mutex);

> 	if (ret)

>@@ -572,9 +609,30 @@ static int virtio_vsock_probe(struct virtio_device *vdev)

>

> 	vsock->vdev = vdev;

>

>-	ret = virtio_find_vqs(vsock->vdev, VSOCK_VQ_MAX,

>+	if (virtio_has_feature(vdev, VIRTIO_VSOCK_F_DGRAM))

>+		vsock->has_dgram = true;

>+

>+	if (vsock->has_dgram)

>+		max_vq = VSOCK_VQ_EX_MAX;

>+	else

>+		max_vq = VSOCK_VQ_MAX;

>+

>+	vsock->vqs = kmalloc_array(max_vq, sizeof(struct virtqueue *), GFP_KERNEL);

>+	if (!vsock->vqs) {

>+		ret = -ENOMEM;

>+		goto out;

>+	}

>+

>+	if (vsock->has_dgram) {

>+		ret = virtio_find_vqs(vsock->vdev, max_vq,

>+			      vsock->vqs, ex_callbacks, ex_names,

>+			      NULL);

>+	} else {

>+		ret = virtio_find_vqs(vsock->vdev, max_vq,

> 			      vsock->vqs, callbacks, names,

> 			      NULL);

>+	}

>+

> 	if (ret < 0)

> 		goto out;

>

>@@ -695,6 +753,7 @@ static struct virtio_device_id id_table[] = {

> };

>

> static unsigned int features[] = {

>+	VIRTIO_VSOCK_F_DGRAM,

> };

>

> static struct virtio_driver virtio_vsock_driver = {

>-- 

>2.11.0

>
Stefano Garzarella June 18, 2021, 9:54 a.m. UTC | #10
On Wed, Jun 09, 2021 at 11:24:57PM +0000, Jiang Wang wrote:
>Also change number of vqs according to the config

>

>Signed-off-by: Jiang Wang <jiang.wang@bytedance.com>

>---

> drivers/vhost/Kconfig |  8 ++++++++

> drivers/vhost/vsock.c | 11 ++++++++---

> 2 files changed, 16 insertions(+), 3 deletions(-)


As we already discussed, I think we don't need this patch.

Thanks,
Stefano
Stefano Garzarella June 18, 2021, 10:04 a.m. UTC | #11
On Wed, Jun 09, 2021 at 11:24:58PM +0000, Jiang Wang wrote:
>Make rx buf len configurable via sysfs

>

>Signed-off-by: Jiang Wang <jiang.wang@bytedance.com>

>---

> net/vmw_vsock/virtio_transport.c | 37 +++++++++++++++++++++++++++++++++++--

> 1 file changed, 35 insertions(+), 2 deletions(-)

>

>diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c

>index cf47aadb0c34..2e4dd9c48472 100644

>--- a/net/vmw_vsock/virtio_transport.c

>+++ b/net/vmw_vsock/virtio_transport.c

>@@ -29,6 +29,14 @@ static struct virtio_vsock __rcu *the_virtio_vsock;

> static struct virtio_vsock *the_virtio_vsock_dgram;

> static DEFINE_MUTEX(the_virtio_vsock_mutex); /* protects the_virtio_vsock */

>

>+static int rx_buf_len = VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE;

>+static struct kobject *kobj_ref;

>+static ssize_t  sysfs_show(struct kobject *kobj,

>+			struct kobj_attribute *attr, char *buf);

>+static ssize_t  sysfs_store(struct kobject *kobj,

>+			struct kobj_attribute *attr, const char *buf, size_t count);

>+static struct kobj_attribute rxbuf_attr = __ATTR(rx_buf_value, 0660, sysfs_show, sysfs_store);


Maybe better to use a 'dgram' prefix.

>+

> struct virtio_vsock {

> 	struct virtio_device *vdev;

> 	struct virtqueue **vqs;

>@@ -360,7 +368,7 @@ virtio_transport_cancel_pkt(struct vsock_sock *vsk)

>

> static void virtio_vsock_rx_fill(struct virtio_vsock *vsock, bool is_dgram)

> {

>-	int buf_len = VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE;

>+	int buf_len = rx_buf_len;

> 	struct virtio_vsock_pkt *pkt;

> 	struct scatterlist hdr, buf, *sgs[2];

> 	struct virtqueue *vq;

>@@ -1003,6 +1011,22 @@ static struct virtio_driver virtio_vsock_driver = {

> 	.remove = virtio_vsock_remove,

> };

>

>+static ssize_t sysfs_show(struct kobject *kobj,

>+		struct kobj_attribute *attr, char *buf)

>+{

>+	return sprintf(buf, "%d", rx_buf_len);

>+}

>+

>+static ssize_t sysfs_store(struct kobject *kobj,

>+		struct kobj_attribute *attr, const char *buf, size_t count)

>+{

>+	if (kstrtou32(buf, 0, &rx_buf_len) < 0)

>+		return -EINVAL;

>+	if (rx_buf_len < 1024)

>+		rx_buf_len = 1024;

>+	return count;

>+}

>+

> static int __init virtio_vsock_init(void)

> {

> 	int ret;

>@@ -1020,8 +1044,17 @@ static int __init virtio_vsock_init(void)

> 	if (ret)

> 		goto out_vci;

>

>-	return 0;

>+	kobj_ref = kobject_create_and_add("vsock", kernel_kobj);


So, IIUC, the path will be /sys/vsock/rx_buf_value?

I'm not sure if we need to add a `virtio` subdir (e.g.
/sys/vsock/virtio/dgram_rx_buf_size)
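
A possible sketch of that layout (the attribute name here is illustrative):

	/* exposes /sys/vsock/virtio/dgram_rx_buf_size */
	vsock_kobj  = kobject_create_and_add("vsock", kernel_kobj);
	virtio_kobj = kobject_create_and_add("virtio", vsock_kobj);
	ret = sysfs_create_file(virtio_kobj, &dgram_rx_buf_attr.attr);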

Thanks,
Stefano

>

>+	/*Creating sysfs file for etx_value*/

>+	ret = sysfs_create_file(kobj_ref, &rxbuf_attr.attr);

>+	if (ret)

>+		goto out_sysfs;

>+

>+	return 0;

>+out_sysfs:

>+	kobject_put(kobj_ref);

>+	sysfs_remove_file(kernel_kobj, &rxbuf_attr.attr);

> out_vci:

> 	vsock_core_unregister(&virtio_transport.transport);

> out_wq:

>-- 

>2.11.0

>
Stefano Garzarella June 18, 2021, 10:13 a.m. UTC | #12
We should use le16_to_cpu when accessing pkt->hdr fields.
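
I.e., matching the inline notes below, something like:

	if (le16_to_cpu(pkt->hdr.type) == VIRTIO_VSOCK_TYPE_DGRAM)
		is_dgram = true;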

On Wed, Jun 09, 2021 at 11:24:55PM +0000, Jiang Wang wrote:
>This patch supports dgram on vhost side, including

>tx and rx. The vhost send packets asynchronously.

>

>Signed-off-by: Jiang Wang <jiang.wang@bytedance.com>

>---

> drivers/vhost/vsock.c | 199 +++++++++++++++++++++++++++++++++++++++++++-------

> 1 file changed, 173 insertions(+), 26 deletions(-)

>

>diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c

>index 81d064601093..d366463be6d4 100644

>--- a/drivers/vhost/vsock.c

>+++ b/drivers/vhost/vsock.c

>@@ -28,7 +28,10 @@

>  * small pkts.

>  */

> #define VHOST_VSOCK_PKT_WEIGHT 256

>+#define VHOST_VSOCK_DGRM_MAX_PENDING_PKT 128

>

>+/* Max wait time in busy poll in microseconds */

>+#define VHOST_VSOCK_BUSY_POLL_TIMEOUT 20

> enum {

> 	VHOST_VSOCK_FEATURES = VHOST_FEATURES |

> 			       (1ULL << VIRTIO_F_ACCESS_PLATFORM) |

>@@ -45,7 +48,7 @@ static DEFINE_READ_MOSTLY_HASHTABLE(vhost_vsock_hash, 8);

>

> struct vhost_vsock {

> 	struct vhost_dev dev;

>-	struct vhost_virtqueue vqs[2];

>+	struct vhost_virtqueue vqs[4];

>

> 	/* Link to global vhost_vsock_hash, writes use vhost_vsock_mutex */

> 	struct hlist_node hash;

>@@ -54,6 +57,11 @@ struct vhost_vsock {

> 	spinlock_t send_pkt_list_lock;

> 	struct list_head send_pkt_list;	/* host->guest pending packets */

>

>+	spinlock_t dgram_send_pkt_list_lock;

>+	struct list_head dgram_send_pkt_list;	/* host->guest pending packets */

>+	struct vhost_work dgram_send_pkt_work;

>+	int  dgram_used; /*pending packets to be send */

>+

> 	atomic_t queued_replies;

>

> 	u32 guest_cid;

>@@ -90,10 +98,22 @@ static void

> vhost_transport_do_send_pkt(struct vhost_vsock *vsock,

> 			    struct vhost_virtqueue *vq)

> {

>-	struct vhost_virtqueue *tx_vq = &vsock->vqs[VSOCK_VQ_TX];

>+	struct vhost_virtqueue *tx_vq;

> 	int pkts = 0, total_len = 0;

> 	bool added = false;

> 	bool restart_tx = false;

>+	spinlock_t *lock;

>+	struct list_head *send_pkt_list;

>+

>+	if (vq == &vsock->vqs[VSOCK_VQ_RX]) {

>+		tx_vq = &vsock->vqs[VSOCK_VQ_TX];

>+		lock = &vsock->send_pkt_list_lock;

>+		send_pkt_list = &vsock->send_pkt_list;

>+	} else {

>+		tx_vq = &vsock->vqs[VSOCK_VQ_DGRAM_TX];

>+		lock = &vsock->dgram_send_pkt_list_lock;

>+		send_pkt_list = &vsock->dgram_send_pkt_list;

>+	}

>

> 	mutex_lock(&vq->mutex);

>

>@@ -113,36 +133,48 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock,

> 		size_t nbytes;

> 		size_t iov_len, payload_len;

> 		int head;

>+		bool is_dgram = false;

>

>-		spin_lock_bh(&vsock->send_pkt_list_lock);

>-		if (list_empty(&vsock->send_pkt_list)) {

>-			spin_unlock_bh(&vsock->send_pkt_list_lock);

>+		spin_lock_bh(lock);

>+		if (list_empty(send_pkt_list)) {

>+			spin_unlock_bh(lock);

> 			vhost_enable_notify(&vsock->dev, vq);

> 			break;

> 		}

>

>-		pkt = list_first_entry(&vsock->send_pkt_list,

>+		pkt = list_first_entry(send_pkt_list,

> 				       struct virtio_vsock_pkt, list);

> 		list_del_init(&pkt->list);

>-		spin_unlock_bh(&vsock->send_pkt_list_lock);

>+		spin_unlock_bh(lock);

>+

>+		if (pkt->hdr.type == VIRTIO_VSOCK_TYPE_DGRAM)

                     ^
                     le16_to_cpu(pkt->hdr.type)

>+			is_dgram = true;

>

> 		head = vhost_get_vq_desc(vq, vq->iov, ARRAY_SIZE(vq->iov),

> 					 &out, &in, NULL, NULL);

> 		if (head < 0) {

>-			spin_lock_bh(&vsock->send_pkt_list_lock);

>-			list_add(&pkt->list, &vsock->send_pkt_list);

>-			spin_unlock_bh(&vsock->send_pkt_list_lock);

>+			spin_lock_bh(lock);

>+			list_add(&pkt->list, send_pkt_list);

>+			spin_unlock_bh(lock);

> 			break;

> 		}

>

> 		if (head == vq->num) {

>-			spin_lock_bh(&vsock->send_pkt_list_lock);

>-			list_add(&pkt->list, &vsock->send_pkt_list);

>-			spin_unlock_bh(&vsock->send_pkt_list_lock);

>+			if (is_dgram) {

>+				virtio_transport_free_pkt(pkt);

>+				vq_err(vq, "Dgram virtqueue is full!");

>+				spin_lock_bh(lock);

>+				vsock->dgram_used--;

>+				spin_unlock_bh(lock);

>+				break;

>+			}

>+			spin_lock_bh(lock);

>+			list_add(&pkt->list, send_pkt_list);

>+			spin_unlock_bh(lock);

>

> 			/* We cannot finish yet if more buffers snuck in while

>-			 * re-enabling notify.

>-			 */

>+			* re-enabling notify.

>+			*/

> 			if (unlikely(vhost_enable_notify(&vsock->dev, vq))) {

> 				vhost_disable_notify(&vsock->dev, vq);

> 				continue;

>@@ -153,6 +185,12 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock,

> 		if (out) {

> 			virtio_transport_free_pkt(pkt);

> 			vq_err(vq, "Expected 0 output buffers, got %u\n", out);

>+			if (is_dgram) {

>+				spin_lock_bh(lock);

>+				vsock->dgram_used--;

>+				spin_unlock_bh(lock);

>+			}

>+

> 			break;

> 		}

>

>@@ -160,6 +198,18 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock,

> 		if (iov_len < sizeof(pkt->hdr)) {

> 			virtio_transport_free_pkt(pkt);

> 			vq_err(vq, "Buffer len [%zu] too small\n", iov_len);

>+			if (is_dgram) {

>+				spin_lock_bh(lock);

>+				vsock->dgram_used--;

>+				spin_unlock_bh(lock);

>+			}

>+			break;

>+		}

>+

>+		if (iov_len < pkt->len - pkt->off &&

>+			vq == &vsock->vqs[VSOCK_VQ_DGRAM_RX]) {

>+			virtio_transport_free_pkt(pkt);

>+			vq_err(vq, "Buffer len [%zu] too small for dgram\n", iov_len);

> 			break;

> 		}

>

>@@ -179,6 +229,11 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock,

> 		if (nbytes != sizeof(pkt->hdr)) {

> 			virtio_transport_free_pkt(pkt);

> 			vq_err(vq, "Faulted on copying pkt hdr\n");

>+			if (is_dgram) {

>+				spin_lock_bh(lock);

>+				vsock->dgram_used--;

>+				spin_unlock_bh(lock);

>+			}

> 			break;

> 		}

>

>@@ -204,16 +259,17 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock,

> 		/* If we didn't send all the payload we can requeue the packet

> 		 * to send it with the next available buffer.

> 		 */

>-		if (pkt->off < pkt->len) {

>+		if ((pkt->off < pkt->len)

>+			&& (vq == &vsock->vqs[VSOCK_VQ_RX])) {

> 			/* We are queueing the same virtio_vsock_pkt to handle

> 			 * the remaining bytes, and we want to deliver it

> 			 * to monitoring devices in the next iteration.

> 			 */

> 			pkt->tap_delivered = false;

>

>-			spin_lock_bh(&vsock->send_pkt_list_lock);

>-			list_add(&pkt->list, &vsock->send_pkt_list);

>-			spin_unlock_bh(&vsock->send_pkt_list_lock);

>+			spin_lock_bh(lock);

>+			list_add(&pkt->list, send_pkt_list);

>+			spin_unlock_bh(lock);

> 		} else {

> 			if (pkt->reply) {

> 				int val;

>@@ -228,6 +284,11 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock,

> 			}

>

> 			virtio_transport_free_pkt(pkt);

>+			if (is_dgram) {

>+				spin_lock_bh(lock);

>+				vsock->dgram_used--;

>+				spin_unlock_bh(lock);

>+			}

> 		}

> 	} while(likely(!vhost_exceeds_weight(vq, ++pkts, total_len)));

> 	if (added)

>@@ -251,11 +312,25 @@ static void vhost_transport_send_pkt_work(struct vhost_work *work)

> 	vhost_transport_do_send_pkt(vsock, vq);

> }

>

>+static void vhost_transport_dgram_send_pkt_work(struct vhost_work *work)

>+{

>+	struct vhost_virtqueue *vq;

>+	struct vhost_vsock *vsock;

>+

>+	vsock = container_of(work, struct vhost_vsock, dgram_send_pkt_work);

>+	vq = &vsock->vqs[VSOCK_VQ_DGRAM_RX];

>+

>+	vhost_transport_do_send_pkt(vsock, vq);

>+}

>+

> static int

> vhost_transport_send_pkt(struct virtio_vsock_pkt *pkt)

> {

> 	struct vhost_vsock *vsock;

> 	int len = pkt->len;

>+	spinlock_t *lock;

>+	struct list_head *send_pkt_list;

>+	struct vhost_work *work;

>

> 	rcu_read_lock();

>

>@@ -267,14 +342,38 @@ vhost_transport_send_pkt(struct virtio_vsock_pkt *pkt)

> 		return -ENODEV;

> 	}

>

>+	if (pkt->hdr.type == VIRTIO_VSOCK_TYPE_STREAM) {

             ^
             le16_to_cpu(pkt->hdr.type)
>+		lock = &vsock->send_pkt_list_lock;

>+		send_pkt_list = &vsock->send_pkt_list;

>+		work = &vsock->send_pkt_work;

>+	} else if (pkt->hdr.type == VIRTIO_VSOCK_TYPE_DGRAM) {

                    ^
                    le16_to_cpu(pkt->hdr.type)
>+		lock = &vsock->dgram_send_pkt_list_lock;

>+		send_pkt_list = &vsock->dgram_send_pkt_list;

>+		work = &vsock->dgram_send_pkt_work;

>+	} else {

>+		rcu_read_unlock();

>+		virtio_transport_free_pkt(pkt);

>+		return -EINVAL;

>+	}

>+

>+

> 	if (pkt->reply)

> 		atomic_inc(&vsock->queued_replies);

>

>-	spin_lock_bh(&vsock->send_pkt_list_lock);

>-	list_add_tail(&pkt->list, &vsock->send_pkt_list);

>-	spin_unlock_bh(&vsock->send_pkt_list_lock);

>+	spin_lock_bh(lock);

>+	if (pkt->hdr.type == VIRTIO_VSOCK_TYPE_DGRAM) {

             ^
             le16_to_cpu(pkt->hdr.type)
>+		if (vsock->dgram_used  == VHOST_VSOCK_DGRM_MAX_PENDING_PKT)

>+			len = -ENOMEM;

>+		else {

>+			vsock->dgram_used++;

>+			list_add_tail(&pkt->list, send_pkt_list);

>+		}

>+	} else

>+		list_add_tail(&pkt->list, send_pkt_list);

>

>-	vhost_work_queue(&vsock->dev, &vsock->send_pkt_work);

>+	spin_unlock_bh(lock);

>+

>+	vhost_work_queue(&vsock->dev, work);

>

> 	rcu_read_unlock();

> 	return len;

>@@ -355,7 +454,8 @@ vhost_vsock_alloc_pkt(struct vhost_virtqueue *vq,

> 		return NULL;

> 	}

>

>-	if (le16_to_cpu(pkt->hdr.type) == VIRTIO_VSOCK_TYPE_STREAM)

>+	if (le16_to_cpu(pkt->hdr.type) == VIRTIO_VSOCK_TYPE_STREAM

>+		|| le16_to_cpu(pkt->hdr.type) == VIRTIO_VSOCK_TYPE_DGRAM)

> 		pkt->len = le32_to_cpu(pkt->hdr.len);

>

> 	/* No payload */

>@@ -442,6 +542,18 @@ static struct virtio_transport vhost_transport = {

> 	.send_pkt = vhost_transport_send_pkt,

> };

>

>+static inline unsigned long busy_clock(void)

>+{

>+	return local_clock() >> 10;

>+}

>+

>+static bool vhost_can_busy_poll(unsigned long endtime)

>+{

>+	return likely(!need_resched() && !time_after(busy_clock(), endtime) &&

>+		      !signal_pending(current));

>+}

>+

>+

> static void vhost_vsock_handle_tx_kick(struct vhost_work *work)

> {

> 	struct vhost_virtqueue *vq = container_of(work, struct vhost_virtqueue,

>@@ -452,6 +564,8 @@ static void vhost_vsock_handle_tx_kick(struct vhost_work *work)

> 	int head, pkts = 0, total_len = 0;

> 	unsigned int out, in;

> 	bool added = false;

>+	unsigned long busyloop_timeout = VHOST_VSOCK_BUSY_POLL_TIMEOUT;

>+	unsigned long endtime;

>

> 	mutex_lock(&vq->mutex);

>

>@@ -461,11 +575,14 @@ static void vhost_vsock_handle_tx_kick(struct vhost_work *work)

> 	if (!vq_meta_prefetch(vq))

> 		goto out;

>

>+	endtime = busy_clock() + busyloop_timeout;

> 	vhost_disable_notify(&vsock->dev, vq);

>+	preempt_disable();

> 	do {

> 		u32 len;

>

>-		if (!vhost_vsock_more_replies(vsock)) {

>+		if (vq == &vsock->vqs[VSOCK_VQ_TX]

>+			&& !vhost_vsock_more_replies(vsock)) {

> 			/* Stop tx until the device processes already

> 			 * pending replies.  Leave tx virtqueue

> 			 * callbacks disabled.

>@@ -479,6 +596,11 @@ static void vhost_vsock_handle_tx_kick(struct vhost_work *work)

> 			break;

>

> 		if (head == vq->num) {

>+			if (vhost_can_busy_poll(endtime)) {

>+				cpu_relax();

>+				continue;

>+			}

>+

> 			if (unlikely(vhost_enable_notify(&vsock->dev, vq))) {

> 				vhost_disable_notify(&vsock->dev, vq);

> 				continue;

>@@ -510,6 +632,7 @@ static void vhost_vsock_handle_tx_kick(struct vhost_work *work)

> 		total_len += len;

> 		added = true;

> 	} while(likely(!vhost_exceeds_weight(vq, ++pkts, total_len)));

>+	preempt_enable();

>

> no_more_replies:

> 	if (added)

>@@ -565,6 +688,7 @@ static int vhost_vsock_start(struct vhost_vsock *vsock)

> 	 * let's kick the send worker to send them.

> 	 */

> 	vhost_work_queue(&vsock->dev, &vsock->send_pkt_work);

>+	vhost_work_queue(&vsock->dev, &vsock->dgram_send_pkt_work);

>

> 	mutex_unlock(&vsock->dev.mutex);

> 	return 0;

>@@ -639,8 +763,14 @@ static int vhost_vsock_dev_open(struct inode *inode, struct file *file)

>

> 	vqs[VSOCK_VQ_TX] = &vsock->vqs[VSOCK_VQ_TX];

> 	vqs[VSOCK_VQ_RX] = &vsock->vqs[VSOCK_VQ_RX];

>+	vqs[VSOCK_VQ_DGRAM_TX] = &vsock->vqs[VSOCK_VQ_DGRAM_TX];

>+	vqs[VSOCK_VQ_DGRAM_RX] = &vsock->vqs[VSOCK_VQ_DGRAM_RX];

> 	vsock->vqs[VSOCK_VQ_TX].handle_kick = vhost_vsock_handle_tx_kick;

> 	vsock->vqs[VSOCK_VQ_RX].handle_kick = vhost_vsock_handle_rx_kick;

>+	vsock->vqs[VSOCK_VQ_DGRAM_TX].handle_kick =

>+						vhost_vsock_handle_tx_kick;

>+	vsock->vqs[VSOCK_VQ_DGRAM_RX].handle_kick =

>+						vhost_vsock_handle_rx_kick;

>

> 	vhost_dev_init(&vsock->dev, vqs, ARRAY_SIZE(vsock->vqs),

> 		       UIO_MAXIOV, VHOST_VSOCK_PKT_WEIGHT,

>@@ -650,6 +780,11 @@ static int vhost_vsock_dev_open(struct inode *inode, struct file *file)

> 	spin_lock_init(&vsock->send_pkt_list_lock);

> 	INIT_LIST_HEAD(&vsock->send_pkt_list);

> 	vhost_work_init(&vsock->send_pkt_work, vhost_transport_send_pkt_work);

>+	spin_lock_init(&vsock->dgram_send_pkt_list_lock);

>+	INIT_LIST_HEAD(&vsock->dgram_send_pkt_list);

>+	vhost_work_init(&vsock->dgram_send_pkt_work,

>+			vhost_transport_dgram_send_pkt_work);

>+

> 	return 0;

>

> out:

>@@ -665,6 +800,7 @@ static void vhost_vsock_flush(struct vhost_vsock *vsock)

> 		if (vsock->vqs[i].handle_kick)

> 			vhost_poll_flush(&vsock->vqs[i].poll);

> 	vhost_work_flush(&vsock->dev, &vsock->send_pkt_work);

>+	vhost_work_flush(&vsock->dev, &vsock->dgram_send_pkt_work);

> }

>

> static void vhost_vsock_reset_orphans(struct sock *sk)

>@@ -724,6 +860,17 @@ static int vhost_vsock_dev_release(struct inode *inode, struct file *file)

> 	}

> 	spin_unlock_bh(&vsock->send_pkt_list_lock);

>

>+	spin_lock_bh(&vsock->dgram_send_pkt_list_lock);

>+	while (!list_empty(&vsock->dgram_send_pkt_list)) {

>+		struct virtio_vsock_pkt *pkt;

>+

>+		pkt = list_first_entry(&vsock->dgram_send_pkt_list,

>+				struct virtio_vsock_pkt, list);

>+		list_del_init(&pkt->list);

>+		virtio_transport_free_pkt(pkt);

>+	}

>+	spin_unlock_bh(&vsock->dgram_send_pkt_list_lock);

>+

> 	vhost_dev_cleanup(&vsock->dev);

> 	kfree(vsock->dev.vqs);

> 	vhost_vsock_free(vsock);

>@@ -906,7 +1053,7 @@ static int __init vhost_vsock_init(void)

> 	int ret;

>

> 	ret = vsock_core_register(&vhost_transport.transport,

>-				  VSOCK_TRANSPORT_F_H2G);

>+				  VSOCK_TRANSPORT_F_H2G | VSOCK_TRANSPORT_F_DGRAM);

> 	if (ret < 0)

> 		return ret;

> 	return misc_register(&vhost_vsock_misc);

>-- 

>2.11.0

>
Jiang Wang June 21, 2021, 5:21 p.m. UTC | #13
On Fri, Jun 18, 2021 at 2:35 AM Stefano Garzarella <sgarzare@redhat.com> wrote:
>

> On Wed, Jun 09, 2021 at 11:24:52PM +0000, Jiang Wang wrote:

> >This patchset implements support of SOCK_DGRAM for virtio

> >transport.

> >

> >Datagram sockets are connectionless and unreliable. To avoid unfair contention

> >with stream and other sockets, add two more virtqueues and

> >a new feature bit to indicate if those two new queues exist or not.

> >

> >Dgram does not use the existing credit update mechanism for

> >stream sockets. When sending from the guest/driver, sending packets

> >synchronously, so the sender will get an error when the virtqueue is full.

> >When sending from the host/device, send packets asynchronously

> >because the descriptor memory belongs to the corresponding QEMU

> >process.

> >

> >The virtio spec patch is here:

> >https://www.spinics.net/lists/linux-virtualization/msg50027.html

> >

> >For those who prefer git repo, here is the link for the linux kernel:

> >https://github.com/Jiang1155/linux/tree/vsock-dgram-v1

> >

> >qemu patch link:

> >https://github.com/Jiang1155/qemu/tree/vsock-dgram-v1

> >

> >

> >To do:

> >1. use skb when receiving packets

> >2. support multiple transport

> >3. support mergeable rx buffer

>

> Jiang, I'll do a fast review, but I think is better to rebase on

> net-next since SEQPACKET support is now merged.

>

> Please also run ./scripts/checkpatch.pl, there are a lot of issues.

>

> I'll leave some simple comments in the patches, but I prefer to do a

> deep review after the rebase and the dynamic handling of DGRAM.


Hi Stefano,

Sure. I will rebase and add dynamic handling of DGRAM. I ran checkpatch.pl
at some point, but I will make sure to run it again before submitting. Thanks.

Regards,

Jiang


> Thanks,

> Stefano

>
Jiang Wang June 21, 2021, 5:24 p.m. UTC | #14
On Fri, Jun 18, 2021 at 2:40 AM Stefano Garzarella <sgarzare@redhat.com> wrote:
>

> On Wed, Jun 09, 2021 at 11:24:53PM +0000, Jiang Wang wrote:

> >When this feature is enabled, allocate 5 queues,

> >otherwise, allocate 3 queues to be compatible with

> >old QEMU versions.

> >

> >Signed-off-by: Jiang Wang <jiang.wang@bytedance.com>

> >---

> > drivers/vhost/vsock.c             |  3 +-

> > include/linux/virtio_vsock.h      |  9 +++++

> > include/uapi/linux/virtio_vsock.h |  3 ++

> > net/vmw_vsock/virtio_transport.c  | 73 +++++++++++++++++++++++++++++++++++----

> > 4 files changed, 80 insertions(+), 8 deletions(-)

> >

> >diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c

> >index 5e78fb719602..81d064601093 100644

> >--- a/drivers/vhost/vsock.c

> >+++ b/drivers/vhost/vsock.c

> >@@ -31,7 +31,8 @@

> >

> > enum {

> >       VHOST_VSOCK_FEATURES = VHOST_FEATURES |

> >-                             (1ULL << VIRTIO_F_ACCESS_PLATFORM)

> >+                             (1ULL << VIRTIO_F_ACCESS_PLATFORM) |

> >+                             (1ULL << VIRTIO_VSOCK_F_DGRAM)

> > };

> >

> > enum {

> >diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h

> >index dc636b727179..ba3189ed9345 100644

> >--- a/include/linux/virtio_vsock.h

> >+++ b/include/linux/virtio_vsock.h

> >@@ -18,6 +18,15 @@ enum {

> >       VSOCK_VQ_MAX    = 3,

> > };

> >

> >+enum {

> >+      VSOCK_VQ_STREAM_RX     = 0, /* for host to guest data */

> >+      VSOCK_VQ_STREAM_TX     = 1, /* for guest to host data */

> >+      VSOCK_VQ_DGRAM_RX       = 2,

> >+      VSOCK_VQ_DGRAM_TX       = 3,

> >+      VSOCK_VQ_EX_EVENT       = 4,

> >+      VSOCK_VQ_EX_MAX         = 5,

> >+};

> >+

> > /* Per-socket state (accessed via vsk->trans) */

> > struct virtio_vsock_sock {

> >       struct vsock_sock *vsk;

> >diff --git a/include/uapi/linux/virtio_vsock.h b/include/uapi/linux/virtio_vsock.h

> >index 1d57ed3d84d2..b56614dff1c9 100644

> >--- a/include/uapi/linux/virtio_vsock.h

> >+++ b/include/uapi/linux/virtio_vsock.h

> >@@ -38,6 +38,9 @@

> > #include <linux/virtio_ids.h>

> > #include <linux/virtio_config.h>

> >

> >+/* The feature bitmap for virtio net */

> >+#define VIRTIO_VSOCK_F_DGRAM  0       /* Host support dgram vsock */

> >+

> > struct virtio_vsock_config {

> >       __le64 guest_cid;

> > } __attribute__((packed));

> >diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c

> >index 2700a63ab095..7dcb8db23305 100644

> >--- a/net/vmw_vsock/virtio_transport.c

> >+++ b/net/vmw_vsock/virtio_transport.c

> >@@ -27,7 +27,8 @@ static DEFINE_MUTEX(the_virtio_vsock_mutex); /* protects the_virtio_vsock */

> >

> > struct virtio_vsock {

> >       struct virtio_device *vdev;

> >-      struct virtqueue *vqs[VSOCK_VQ_MAX];

> >+      struct virtqueue **vqs;

> >+      bool has_dgram;

> >

> >       /* Virtqueue processing is deferred to a workqueue */

> >       struct work_struct tx_work;

> >@@ -333,7 +334,10 @@ static int virtio_vsock_event_fill_one(struct virtio_vsock *vsock,

> >       struct scatterlist sg;

> >       struct virtqueue *vq;

> >

> >-      vq = vsock->vqs[VSOCK_VQ_EVENT];

> >+      if (vsock->has_dgram)

> >+              vq = vsock->vqs[VSOCK_VQ_EX_EVENT];

> >+      else

> >+              vq = vsock->vqs[VSOCK_VQ_EVENT];

> >

> >       sg_init_one(&sg, event, sizeof(*event));

> >

> >@@ -351,7 +355,10 @@ static void virtio_vsock_event_fill(struct virtio_vsock *vsock)

> >               virtio_vsock_event_fill_one(vsock, event);

> >       }

> >

> >-      virtqueue_kick(vsock->vqs[VSOCK_VQ_EVENT]);

> >+      if (vsock->has_dgram)

> >+              virtqueue_kick(vsock->vqs[VSOCK_VQ_EX_EVENT]);

> >+      else

> >+              virtqueue_kick(vsock->vqs[VSOCK_VQ_EVENT]);

> > }

> >

> > static void virtio_vsock_reset_sock(struct sock *sk)

> >@@ -391,7 +398,10 @@ static void virtio_transport_event_work(struct work_struct *work)

> >               container_of(work, struct virtio_vsock, event_work);

> >       struct virtqueue *vq;

> >

> >-      vq = vsock->vqs[VSOCK_VQ_EVENT];

> >+      if (vsock->has_dgram)

> >+              vq = vsock->vqs[VSOCK_VQ_EX_EVENT];

> >+      else

> >+              vq = vsock->vqs[VSOCK_VQ_EVENT];

> >

> >       mutex_lock(&vsock->event_lock);

> >

> >@@ -411,7 +421,10 @@ static void virtio_transport_event_work(struct work_struct *work)

> >               }

> >       } while (!virtqueue_enable_cb(vq));

> >

> >-      virtqueue_kick(vsock->vqs[VSOCK_VQ_EVENT]);

> >+      if (vsock->has_dgram)

> >+              virtqueue_kick(vsock->vqs[VSOCK_VQ_EX_EVENT]);

> >+      else

> >+              virtqueue_kick(vsock->vqs[VSOCK_VQ_EVENT]);

> > out:

> >       mutex_unlock(&vsock->event_lock);

> > }

> >@@ -434,6 +447,10 @@ static void virtio_vsock_tx_done(struct virtqueue *vq)

> >       queue_work(virtio_vsock_workqueue, &vsock->tx_work);

> > }

> >

> >+static void virtio_vsock_dgram_tx_done(struct virtqueue *vq)

> >+{

> >+}

> >+

> > static void virtio_vsock_rx_done(struct virtqueue *vq)

> > {

> >       struct virtio_vsock *vsock = vq->vdev->priv;

> >@@ -443,6 +460,10 @@ static void virtio_vsock_rx_done(struct virtqueue *vq)

> >       queue_work(virtio_vsock_workqueue, &vsock->rx_work);

> > }

> >

> >+static void virtio_vsock_dgram_rx_done(struct virtqueue *vq)

> >+{

> >+}

> >+

> > static struct virtio_transport virtio_transport = {

> >       .transport = {

> >               .module                   = THIS_MODULE,

> >@@ -545,13 +566,29 @@ static int virtio_vsock_probe(struct virtio_device *vdev)

> >               virtio_vsock_tx_done,

> >               virtio_vsock_event_done,

> >       };

> >+      vq_callback_t *ex_callbacks[] = {

>

> 'ex' is not clear, maybe better 'dgram'?

>

Sure.
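
For reference, a minimal sketch of what the renamed arrays could look like
(only the 'ex_' prefix replaced with 'dgram_'; I'll adjust the rest in the
next version):

static vq_callback_t *dgram_callbacks[] = {
        virtio_vsock_rx_done,
        virtio_vsock_tx_done,
        virtio_vsock_dgram_rx_done,
        virtio_vsock_dgram_tx_done,
        virtio_vsock_event_done,
};

static const char * const dgram_names[] = {
        "rx",
        "tx",
        "dgram_rx",
        "dgram_tx",
        "event",
};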

> What happens if F_DGRAM is negotiated, but not F_STREAM?

>

Hmm. In my mind, F_STREAM is always negotiated. Do we want to add
support when F_STREAM is not negotiated?

> >+              virtio_vsock_rx_done,

> >+              virtio_vsock_tx_done,

> >+              virtio_vsock_dgram_rx_done,

> >+              virtio_vsock_dgram_tx_done,

> >+              virtio_vsock_event_done,

> >+      };

> >+

> >       static const char * const names[] = {

> >               "rx",

> >               "tx",

> >               "event",

> >       };

> >+      static const char * const ex_names[] = {

> >+              "rx",

> >+              "tx",

> >+              "dgram_rx",

> >+              "dgram_tx",

> >+              "event",

> >+      };

> >+

> >       struct virtio_vsock *vsock = NULL;

> >-      int ret;

> >+      int ret, max_vq;

> >

> >       ret = mutex_lock_interruptible(&the_virtio_vsock_mutex);

> >       if (ret)

> >@@ -572,9 +609,30 @@ static int virtio_vsock_probe(struct virtio_device *vdev)

> >

> >       vsock->vdev = vdev;

> >

> >-      ret = virtio_find_vqs(vsock->vdev, VSOCK_VQ_MAX,

> >+      if (virtio_has_feature(vdev, VIRTIO_VSOCK_F_DGRAM))

> >+              vsock->has_dgram = true;

> >+

> >+      if (vsock->has_dgram)

> >+              max_vq = VSOCK_VQ_EX_MAX;

> >+      else

> >+              max_vq = VSOCK_VQ_MAX;

> >+

> >+      vsock->vqs = kmalloc_array(max_vq, sizeof(struct virtqueue *), GFP_KERNEL);

> >+      if (!vsock->vqs) {

> >+              ret = -ENOMEM;

> >+              goto out;

> >+      }

> >+

> >+      if (vsock->has_dgram) {

> >+              ret = virtio_find_vqs(vsock->vdev, max_vq,

> >+                            vsock->vqs, ex_callbacks, ex_names,

> >+                            NULL);

> >+      } else {

> >+              ret = virtio_find_vqs(vsock->vdev, max_vq,

> >                             vsock->vqs, callbacks, names,

> >                             NULL);

> >+      }

> >+

> >       if (ret < 0)

> >               goto out;

> >

> >@@ -695,6 +753,7 @@ static struct virtio_device_id id_table[] = {

> > };

> >

> > static unsigned int features[] = {

> >+      VIRTIO_VSOCK_F_DGRAM,

> > };

> >

> > static struct virtio_driver virtio_vsock_driver = {

> >--

> >2.11.0

> >

>
Jiang Wang June 21, 2021, 5:25 p.m. UTC | #15
On Fri, Jun 18, 2021 at 2:54 AM Stefano Garzarella <sgarzare@redhat.com> wrote:
>

> On Wed, Jun 09, 2021 at 11:24:57PM +0000, Jiang Wang wrote:

> >Also change number of vqs according to the config

> >

> >Signed-off-by: Jiang Wang <jiang.wang@bytedance.com>

> >---

> > drivers/vhost/Kconfig |  8 ++++++++

> > drivers/vhost/vsock.c | 11 ++++++++---

> > 2 files changed, 16 insertions(+), 3 deletions(-)

>

> As we already discussed, I think we don't need this patch.


Sure. Will do.

> Thanks,

> Stefano

>
Jiang Wang June 21, 2021, 5:27 p.m. UTC | #16
On Fri, Jun 18, 2021 at 3:04 AM Stefano Garzarella <sgarzare@redhat.com> wrote:
>

> On Wed, Jun 09, 2021 at 11:24:58PM +0000, Jiang Wang wrote:

> >Make rx buf len configurable via sysfs

> >

> >Signed-off-by: Jiang Wang <jiang.wang@bytedance.com>

> >---

> > net/vmw_vsock/virtio_transport.c | 37 +++++++++++++++++++++++++++++++++++--

> > 1 file changed, 35 insertions(+), 2 deletions(-)

> >

> >diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c

> >index cf47aadb0c34..2e4dd9c48472 100644

> >--- a/net/vmw_vsock/virtio_transport.c

> >+++ b/net/vmw_vsock/virtio_transport.c

> >@@ -29,6 +29,14 @@ static struct virtio_vsock __rcu *the_virtio_vsock;

> > static struct virtio_vsock *the_virtio_vsock_dgram;

> > static DEFINE_MUTEX(the_virtio_vsock_mutex); /* protects the_virtio_vsock */

> >

> >+static int rx_buf_len = VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE;

> >+static struct kobject *kobj_ref;

> >+static ssize_t  sysfs_show(struct kobject *kobj,

> >+                      struct kobj_attribute *attr, char *buf);

> >+static ssize_t  sysfs_store(struct kobject *kobj,

> >+                      struct kobj_attribute *attr, const char *buf, size_t count);

> >+static struct kobj_attribute rxbuf_attr = __ATTR(rx_buf_value, 0660, sysfs_show, sysfs_store);

>

> Maybe better to use a 'dgram' prefix.


Sure.

> >+

> > struct virtio_vsock {

> >       struct virtio_device *vdev;

> >       struct virtqueue **vqs;

> >@@ -360,7 +368,7 @@ virtio_transport_cancel_pkt(struct vsock_sock *vsk)

> >

> > static void virtio_vsock_rx_fill(struct virtio_vsock *vsock, bool is_dgram)

> > {

> >-      int buf_len = VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE;

> >+      int buf_len = rx_buf_len;

> >       struct virtio_vsock_pkt *pkt;

> >       struct scatterlist hdr, buf, *sgs[2];

> >       struct virtqueue *vq;

> >@@ -1003,6 +1011,22 @@ static struct virtio_driver virtio_vsock_driver = {

> >       .remove = virtio_vsock_remove,

> > };

> >

> >+static ssize_t sysfs_show(struct kobject *kobj,

> >+              struct kobj_attribute *attr, char *buf)

> >+{

> >+      return sprintf(buf, "%d", rx_buf_len);

> >+}

> >+

> >+static ssize_t sysfs_store(struct kobject *kobj,

> >+              struct kobj_attribute *attr, const char *buf, size_t count)

> >+{

> >+      if (kstrtou32(buf, 0, &rx_buf_len) < 0)

> >+              return -EINVAL;

> >+      if (rx_buf_len < 1024)

> >+              rx_buf_len = 1024;

> >+      return count;

> >+}

> >+

> > static int __init virtio_vsock_init(void)

> > {

> >       int ret;

> >@@ -1020,8 +1044,17 @@ static int __init virtio_vsock_init(void)

> >       if (ret)

> >               goto out_vci;

> >

> >-      return 0;

> >+      kobj_ref = kobject_create_and_add("vsock", kernel_kobj);

>

> So, IIUC, the path will be /sys/vsock/rx_buf_value?

>

> I'm not sure if we need to add a `virtio` subdir (e.g.

> /sys/vsock/virtio/dgram_rx_buf_size)


I agree that adding a virtio subdir is better, in case VMware or Hyper-V
also need some settings there later.
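
Something like this, as a rough sketch for the next version (the function
and attribute names are just placeholders; sysfs_show/sysfs_store are the
handlers already in this patch):

static struct kobject *vsock_kobj;
static struct kobject *virtio_kobj;

static struct kobj_attribute dgram_rxbuf_attr =
        __ATTR(dgram_rx_buf_size, 0660, sysfs_show, sysfs_store);

/* creates vsock/virtio/dgram_rx_buf_size under the parent kobject
 * (kernel_kobj here, as in the current patch)
 */
static int virtio_vsock_sysfs_init(void)
{
        int ret;

        vsock_kobj = kobject_create_and_add("vsock", kernel_kobj);
        if (!vsock_kobj)
                return -ENOMEM;

        virtio_kobj = kobject_create_and_add("virtio", vsock_kobj);
        if (!virtio_kobj) {
                kobject_put(vsock_kobj);
                return -ENOMEM;
        }

        ret = sysfs_create_file(virtio_kobj, &dgram_rxbuf_attr.attr);
        if (ret) {
                kobject_put(virtio_kobj);
                kobject_put(vsock_kobj);
        }

        return ret;
}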

> Thanks,

> Stefano

>

> >

> >+      /*Creating sysfs file for etx_value*/

> >+      ret = sysfs_create_file(kobj_ref, &rxbuf_attr.attr);

> >+      if (ret)

> >+              goto out_sysfs;

> >+

> >+      return 0;

> >+out_sysfs:

> >+      kobject_put(kobj_ref);

> >+      sysfs_remove_file(kernel_kobj, &rxbuf_attr.attr);

> > out_vci:

> >       vsock_core_unregister(&virtio_transport.transport);

> > out_wq:

> >--

> >2.11.0

> >

>
Jiang Wang June 21, 2021, 5:32 p.m. UTC | #17
On Fri, Jun 18, 2021 at 3:14 AM Stefano Garzarella <sgarzare@redhat.com> wrote:
>

> We should use le16_to_cpu when accessing pkt->hdr fields.


OK. Will do.
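
To make sure I apply the same pattern everywhere, a minimal sketch of the
endian-safe access (untested, just the conversion you pointed out):

/* pkt->hdr fields are little-endian on the wire; convert before use */
u16 type = le16_to_cpu(pkt->hdr.type);

if (type == VIRTIO_VSOCK_TYPE_DGRAM)
        is_dgram = true;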

> On Wed, Jun 09, 2021 at 11:24:55PM +0000, Jiang Wang wrote:

> >This patch supports dgram on vhost side, including

> >tx and rx. The vhost send packets asynchronously.

> >

> >Signed-off-by: Jiang Wang <jiang.wang@bytedance.com>

> >---

> > drivers/vhost/vsock.c | 199 +++++++++++++++++++++++++++++++++++++++++++-------

> > 1 file changed, 173 insertions(+), 26 deletions(-)

> >

> >diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c

> >index 81d064601093..d366463be6d4 100644

> >--- a/drivers/vhost/vsock.c

> >+++ b/drivers/vhost/vsock.c

> >@@ -28,7 +28,10 @@

> >  * small pkts.

> >  */

> > #define VHOST_VSOCK_PKT_WEIGHT 256

> >+#define VHOST_VSOCK_DGRM_MAX_PENDING_PKT 128

> >

> >+/* Max wait time in busy poll in microseconds */

> >+#define VHOST_VSOCK_BUSY_POLL_TIMEOUT 20

> > enum {

> >       VHOST_VSOCK_FEATURES = VHOST_FEATURES |

> >                              (1ULL << VIRTIO_F_ACCESS_PLATFORM) |

> >@@ -45,7 +48,7 @@ static DEFINE_READ_MOSTLY_HASHTABLE(vhost_vsock_hash, 8);

> >

> > struct vhost_vsock {

> >       struct vhost_dev dev;

> >-      struct vhost_virtqueue vqs[2];

> >+      struct vhost_virtqueue vqs[4];

> >

> >       /* Link to global vhost_vsock_hash, writes use vhost_vsock_mutex */

> >       struct hlist_node hash;

> >@@ -54,6 +57,11 @@ struct vhost_vsock {

> >       spinlock_t send_pkt_list_lock;

> >       struct list_head send_pkt_list; /* host->guest pending packets */

> >

> >+      spinlock_t dgram_send_pkt_list_lock;

> >+      struct list_head dgram_send_pkt_list;   /* host->guest pending packets */

> >+      struct vhost_work dgram_send_pkt_work;

> >+      int  dgram_used; /*pending packets to be send */

> >+

> >       atomic_t queued_replies;

> >

> >       u32 guest_cid;

> >@@ -90,10 +98,22 @@ static void

> > vhost_transport_do_send_pkt(struct vhost_vsock *vsock,

> >                           struct vhost_virtqueue *vq)

> > {

> >-      struct vhost_virtqueue *tx_vq = &vsock->vqs[VSOCK_VQ_TX];

> >+      struct vhost_virtqueue *tx_vq;

> >       int pkts = 0, total_len = 0;

> >       bool added = false;

> >       bool restart_tx = false;

> >+      spinlock_t *lock;

> >+      struct list_head *send_pkt_list;

> >+

> >+      if (vq == &vsock->vqs[VSOCK_VQ_RX]) {

> >+              tx_vq = &vsock->vqs[VSOCK_VQ_TX];

> >+              lock = &vsock->send_pkt_list_lock;

> >+              send_pkt_list = &vsock->send_pkt_list;

> >+      } else {

> >+              tx_vq = &vsock->vqs[VSOCK_VQ_DGRAM_TX];

> >+              lock = &vsock->dgram_send_pkt_list_lock;

> >+              send_pkt_list = &vsock->dgram_send_pkt_list;

> >+      }

> >

> >       mutex_lock(&vq->mutex);

> >

> >@@ -113,36 +133,48 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock,

> >               size_t nbytes;

> >               size_t iov_len, payload_len;

> >               int head;

> >+              bool is_dgram = false;

> >

> >-              spin_lock_bh(&vsock->send_pkt_list_lock);

> >-              if (list_empty(&vsock->send_pkt_list)) {

> >-                      spin_unlock_bh(&vsock->send_pkt_list_lock);

> >+              spin_lock_bh(lock);

> >+              if (list_empty(send_pkt_list)) {

> >+                      spin_unlock_bh(lock);

> >                       vhost_enable_notify(&vsock->dev, vq);

> >                       break;

> >               }

> >

> >-              pkt = list_first_entry(&vsock->send_pkt_list,

> >+              pkt = list_first_entry(send_pkt_list,

> >                                      struct virtio_vsock_pkt, list);

> >               list_del_init(&pkt->list);

> >-              spin_unlock_bh(&vsock->send_pkt_list_lock);

> >+              spin_unlock_bh(lock);

> >+

> >+              if (pkt->hdr.type == VIRTIO_VSOCK_TYPE_DGRAM)

>                      ^

>                      le16_to_cpu(pkt->hdr.type)

>

> >+                      is_dgram = true;

> >

> >               head = vhost_get_vq_desc(vq, vq->iov, ARRAY_SIZE(vq->iov),

> >                                        &out, &in, NULL, NULL);

> >               if (head < 0) {

> >-                      spin_lock_bh(&vsock->send_pkt_list_lock);

> >-                      list_add(&pkt->list, &vsock->send_pkt_list);

> >-                      spin_unlock_bh(&vsock->send_pkt_list_lock);

> >+                      spin_lock_bh(lock);

> >+                      list_add(&pkt->list, send_pkt_list);

> >+                      spin_unlock_bh(lock);

> >                       break;

> >               }

> >

> >               if (head == vq->num) {

> >-                      spin_lock_bh(&vsock->send_pkt_list_lock);

> >-                      list_add(&pkt->list, &vsock->send_pkt_list);

> >-                      spin_unlock_bh(&vsock->send_pkt_list_lock);

> >+                      if (is_dgram) {

> >+                              virtio_transport_free_pkt(pkt);

> >+                              vq_err(vq, "Dgram virtqueue is full!");

> >+                              spin_lock_bh(lock);

> >+                              vsock->dgram_used--;

> >+                              spin_unlock_bh(lock);

> >+                              break;

> >+                      }

> >+                      spin_lock_bh(lock);

> >+                      list_add(&pkt->list, send_pkt_list);

> >+                      spin_unlock_bh(lock);

> >

> >                       /* We cannot finish yet if more buffers snuck in while

> >-                       * re-enabling notify.

> >-                       */

> >+                      * re-enabling notify.

> >+                      */

> >                       if (unlikely(vhost_enable_notify(&vsock->dev, vq))) {

> >                               vhost_disable_notify(&vsock->dev, vq);

> >                               continue;

> >@@ -153,6 +185,12 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock,

> >               if (out) {

> >                       virtio_transport_free_pkt(pkt);

> >                       vq_err(vq, "Expected 0 output buffers, got %u\n", out);

> >+                      if (is_dgram) {

> >+                              spin_lock_bh(lock);

> >+                              vsock->dgram_used--;

> >+                              spin_unlock_bh(lock);

> >+                      }

> >+

> >                       break;

> >               }

> >

> >@@ -160,6 +198,18 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock,

> >               if (iov_len < sizeof(pkt->hdr)) {

> >                       virtio_transport_free_pkt(pkt);

> >                       vq_err(vq, "Buffer len [%zu] too small\n", iov_len);

> >+                      if (is_dgram) {

> >+                              spin_lock_bh(lock);

> >+                              vsock->dgram_used--;

> >+                              spin_unlock_bh(lock);

> >+                      }

> >+                      break;

> >+              }

> >+

> >+              if (iov_len < pkt->len - pkt->off &&

> >+                      vq == &vsock->vqs[VSOCK_VQ_DGRAM_RX]) {

> >+                      virtio_transport_free_pkt(pkt);

> >+                      vq_err(vq, "Buffer len [%zu] too small for dgram\n", iov_len);

> >                       break;

> >               }

> >

> >@@ -179,6 +229,11 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock,

> >               if (nbytes != sizeof(pkt->hdr)) {

> >                       virtio_transport_free_pkt(pkt);

> >                       vq_err(vq, "Faulted on copying pkt hdr\n");

> >+                      if (is_dgram) {

> >+                              spin_lock_bh(lock);

> >+                              vsock->dgram_used--;

> >+                              spin_unlock_bh(lock);

> >+                      }

> >                       break;

> >               }

> >

> >@@ -204,16 +259,17 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock,

> >               /* If we didn't send all the payload we can requeue the packet

> >                * to send it with the next available buffer.

> >                */

> >-              if (pkt->off < pkt->len) {

> >+              if ((pkt->off < pkt->len)

> >+                      && (vq == &vsock->vqs[VSOCK_VQ_RX])) {

> >                       /* We are queueing the same virtio_vsock_pkt to handle

> >                        * the remaining bytes, and we want to deliver it

> >                        * to monitoring devices in the next iteration.

> >                        */

> >                       pkt->tap_delivered = false;

> >

> >-                      spin_lock_bh(&vsock->send_pkt_list_lock);

> >-                      list_add(&pkt->list, &vsock->send_pkt_list);

> >-                      spin_unlock_bh(&vsock->send_pkt_list_lock);

> >+                      spin_lock_bh(lock);

> >+                      list_add(&pkt->list, send_pkt_list);

> >+                      spin_unlock_bh(lock);

> >               } else {

> >                       if (pkt->reply) {

> >                               int val;

> >@@ -228,6 +284,11 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock,

> >                       }

> >

> >                       virtio_transport_free_pkt(pkt);

> >+                      if (is_dgram) {

> >+                              spin_lock_bh(lock);

> >+                              vsock->dgram_used--;

> >+                              spin_unlock_bh(lock);

> >+                      }

> >               }

> >       } while(likely(!vhost_exceeds_weight(vq, ++pkts, total_len)));

> >       if (added)

> >@@ -251,11 +312,25 @@ static void vhost_transport_send_pkt_work(struct vhost_work *work)

> >       vhost_transport_do_send_pkt(vsock, vq);

> > }

> >

> >+static void vhost_transport_dgram_send_pkt_work(struct vhost_work *work)

> >+{

> >+      struct vhost_virtqueue *vq;

> >+      struct vhost_vsock *vsock;

> >+

> >+      vsock = container_of(work, struct vhost_vsock, dgram_send_pkt_work);

> >+      vq = &vsock->vqs[VSOCK_VQ_DGRAM_RX];

> >+

> >+      vhost_transport_do_send_pkt(vsock, vq);

> >+}

> >+

> > static int

> > vhost_transport_send_pkt(struct virtio_vsock_pkt *pkt)

> > {

> >       struct vhost_vsock *vsock;

> >       int len = pkt->len;

> >+      spinlock_t *lock;

> >+      struct list_head *send_pkt_list;

> >+      struct vhost_work *work;

> >

> >       rcu_read_lock();

> >

> >@@ -267,14 +342,38 @@ vhost_transport_send_pkt(struct virtio_vsock_pkt *pkt)

> >               return -ENODEV;

> >       }

> >

> >+      if (pkt->hdr.type == VIRTIO_VSOCK_TYPE_STREAM) {

>              ^

>              le16_to_cpu(pkt->hdr.type)

> >+              lock = &vsock->send_pkt_list_lock;

> >+              send_pkt_list = &vsock->send_pkt_list;

> >+              work = &vsock->send_pkt_work;

> >+      } else if (pkt->hdr.type == VIRTIO_VSOCK_TYPE_DGRAM) {

>                     ^

>                     le16_to_cpu(pkt->hdr.type)

> >+              lock = &vsock->dgram_send_pkt_list_lock;

> >+              send_pkt_list = &vsock->dgram_send_pkt_list;

> >+              work = &vsock->dgram_send_pkt_work;

> >+      } else {

> >+              rcu_read_unlock();

> >+              virtio_transport_free_pkt(pkt);

> >+              return -EINVAL;

> >+      }

> >+

> >+

> >       if (pkt->reply)

> >               atomic_inc(&vsock->queued_replies);

> >

> >-      spin_lock_bh(&vsock->send_pkt_list_lock);

> >-      list_add_tail(&pkt->list, &vsock->send_pkt_list);

> >-      spin_unlock_bh(&vsock->send_pkt_list_lock);

> >+      spin_lock_bh(lock);

> >+      if (pkt->hdr.type == VIRTIO_VSOCK_TYPE_DGRAM) {

>              ^

>              le16_to_cpu(pkt->hdr.type)

> >+              if (vsock->dgram_used  == VHOST_VSOCK_DGRM_MAX_PENDING_PKT)

> >+                      len = -ENOMEM;

> >+              else {

> >+                      vsock->dgram_used++;

> >+                      list_add_tail(&pkt->list, send_pkt_list);

> >+              }

> >+      } else

> >+              list_add_tail(&pkt->list, send_pkt_list);

> >

> >-      vhost_work_queue(&vsock->dev, &vsock->send_pkt_work);

> >+      spin_unlock_bh(lock);

> >+

> >+      vhost_work_queue(&vsock->dev, work);

> >

> >       rcu_read_unlock();

> >       return len;

> >@@ -355,7 +454,8 @@ vhost_vsock_alloc_pkt(struct vhost_virtqueue *vq,

> >               return NULL;

> >       }

> >

> >-      if (le16_to_cpu(pkt->hdr.type) == VIRTIO_VSOCK_TYPE_STREAM)

> >+      if (le16_to_cpu(pkt->hdr.type) == VIRTIO_VSOCK_TYPE_STREAM

> >+              || le16_to_cpu(pkt->hdr.type) == VIRTIO_VSOCK_TYPE_DGRAM)

> >               pkt->len = le32_to_cpu(pkt->hdr.len);

> >

> >       /* No payload */

> >@@ -442,6 +542,18 @@ static struct virtio_transport vhost_transport = {

> >       .send_pkt = vhost_transport_send_pkt,

> > };

> >

> >+static inline unsigned long busy_clock(void)

> >+{

> >+      return local_clock() >> 10;

> >+}

> >+

> >+static bool vhost_can_busy_poll(unsigned long endtime)

> >+{

> >+      return likely(!need_resched() && !time_after(busy_clock(), endtime) &&

> >+                    !signal_pending(current));

> >+}

> >+

> >+

> > static void vhost_vsock_handle_tx_kick(struct vhost_work *work)

> > {

> >       struct vhost_virtqueue *vq = container_of(work, struct vhost_virtqueue,

> >@@ -452,6 +564,8 @@ static void vhost_vsock_handle_tx_kick(struct vhost_work *work)

> >       int head, pkts = 0, total_len = 0;

> >       unsigned int out, in;

> >       bool added = false;

> >+      unsigned long busyloop_timeout = VHOST_VSOCK_BUSY_POLL_TIMEOUT;

> >+      unsigned long endtime;

> >

> >       mutex_lock(&vq->mutex);

> >

> >@@ -461,11 +575,14 @@ static void vhost_vsock_handle_tx_kick(struct vhost_work *work)

> >       if (!vq_meta_prefetch(vq))

> >               goto out;

> >

> >+      endtime = busy_clock() + busyloop_timeout;

> >       vhost_disable_notify(&vsock->dev, vq);

> >+      preempt_disable();

> >       do {

> >               u32 len;

> >

> >-              if (!vhost_vsock_more_replies(vsock)) {

> >+              if (vq == &vsock->vqs[VSOCK_VQ_TX]

> >+                      && !vhost_vsock_more_replies(vsock)) {

> >                       /* Stop tx until the device processes already

> >                        * pending replies.  Leave tx virtqueue

> >                        * callbacks disabled.

> >@@ -479,6 +596,11 @@ static void vhost_vsock_handle_tx_kick(struct vhost_work *work)

> >                       break;

> >

> >               if (head == vq->num) {

> >+                      if (vhost_can_busy_poll(endtime)) {

> >+                              cpu_relax();

> >+                              continue;

> >+                      }

> >+

> >                       if (unlikely(vhost_enable_notify(&vsock->dev, vq))) {

> >                               vhost_disable_notify(&vsock->dev, vq);

> >                               continue;

> >@@ -510,6 +632,7 @@ static void vhost_vsock_handle_tx_kick(struct vhost_work *work)

> >               total_len += len;

> >               added = true;

> >       } while(likely(!vhost_exceeds_weight(vq, ++pkts, total_len)));

> >+      preempt_enable();

> >

> > no_more_replies:

> >       if (added)

> >@@ -565,6 +688,7 @@ static int vhost_vsock_start(struct vhost_vsock *vsock)

> >        * let's kick the send worker to send them.

> >        */

> >       vhost_work_queue(&vsock->dev, &vsock->send_pkt_work);

> >+      vhost_work_queue(&vsock->dev, &vsock->dgram_send_pkt_work);

> >

> >       mutex_unlock(&vsock->dev.mutex);

> >       return 0;

> >@@ -639,8 +763,14 @@ static int vhost_vsock_dev_open(struct inode *inode, struct file *file)

> >

> >       vqs[VSOCK_VQ_TX] = &vsock->vqs[VSOCK_VQ_TX];

> >       vqs[VSOCK_VQ_RX] = &vsock->vqs[VSOCK_VQ_RX];

> >+      vqs[VSOCK_VQ_DGRAM_TX] = &vsock->vqs[VSOCK_VQ_DGRAM_TX];

> >+      vqs[VSOCK_VQ_DGRAM_RX] = &vsock->vqs[VSOCK_VQ_DGRAM_RX];

> >       vsock->vqs[VSOCK_VQ_TX].handle_kick = vhost_vsock_handle_tx_kick;

> >       vsock->vqs[VSOCK_VQ_RX].handle_kick = vhost_vsock_handle_rx_kick;

> >+      vsock->vqs[VSOCK_VQ_DGRAM_TX].handle_kick =

> >+                                              vhost_vsock_handle_tx_kick;

> >+      vsock->vqs[VSOCK_VQ_DGRAM_RX].handle_kick =

> >+                                              vhost_vsock_handle_rx_kick;

> >

> >       vhost_dev_init(&vsock->dev, vqs, ARRAY_SIZE(vsock->vqs),

> >                      UIO_MAXIOV, VHOST_VSOCK_PKT_WEIGHT,

> >@@ -650,6 +780,11 @@ static int vhost_vsock_dev_open(struct inode *inode, struct file *file)

> >       spin_lock_init(&vsock->send_pkt_list_lock);

> >       INIT_LIST_HEAD(&vsock->send_pkt_list);

> >       vhost_work_init(&vsock->send_pkt_work, vhost_transport_send_pkt_work);

> >+      spin_lock_init(&vsock->dgram_send_pkt_list_lock);

> >+      INIT_LIST_HEAD(&vsock->dgram_send_pkt_list);

> >+      vhost_work_init(&vsock->dgram_send_pkt_work,

> >+                      vhost_transport_dgram_send_pkt_work);

> >+

> >       return 0;

> >

> > out:

> >@@ -665,6 +800,7 @@ static void vhost_vsock_flush(struct vhost_vsock *vsock)

> >               if (vsock->vqs[i].handle_kick)

> >                       vhost_poll_flush(&vsock->vqs[i].poll);

> >       vhost_work_flush(&vsock->dev, &vsock->send_pkt_work);

> >+      vhost_work_flush(&vsock->dev, &vsock->dgram_send_pkt_work);

> > }

> >

> > static void vhost_vsock_reset_orphans(struct sock *sk)

> >@@ -724,6 +860,17 @@ static int vhost_vsock_dev_release(struct inode *inode, struct file *file)

> >       }

> >       spin_unlock_bh(&vsock->send_pkt_list_lock);

> >

> >+      spin_lock_bh(&vsock->dgram_send_pkt_list_lock);

> >+      while (!list_empty(&vsock->dgram_send_pkt_list)) {

> >+              struct virtio_vsock_pkt *pkt;

> >+

> >+              pkt = list_first_entry(&vsock->dgram_send_pkt_list,

> >+                              struct virtio_vsock_pkt, list);

> >+              list_del_init(&pkt->list);

> >+              virtio_transport_free_pkt(pkt);

> >+      }

> >+      spin_unlock_bh(&vsock->dgram_send_pkt_list_lock);

> >+

> >       vhost_dev_cleanup(&vsock->dev);

> >       kfree(vsock->dev.vqs);

> >       vhost_vsock_free(vsock);

> >@@ -906,7 +1053,7 @@ static int __init vhost_vsock_init(void)

> >       int ret;

> >

> >       ret = vsock_core_register(&vhost_transport.transport,

> >-                                VSOCK_TRANSPORT_F_H2G);

> >+                                VSOCK_TRANSPORT_F_H2G | VSOCK_TRANSPORT_F_DGRAM);

> >       if (ret < 0)

> >               return ret;

> >       return misc_register(&vhost_vsock_misc);

> >--

> >2.11.0

> >

>
Stefano Garzarella June 22, 2021, 10:50 a.m. UTC | #18
On Mon, Jun 21, 2021 at 10:24:20AM -0700, Jiang Wang . wrote:
>On Fri, Jun 18, 2021 at 2:40 AM Stefano Garzarella <sgarzare@redhat.com> wrote:

>>

>> On Wed, Jun 09, 2021 at 11:24:53PM +0000, Jiang Wang wrote:

>> >When this feature is enabled, allocate 5 queues,

>> >otherwise, allocate 3 queues to be compatible with

>> >old QEMU versions.

>> >

>> >Signed-off-by: Jiang Wang <jiang.wang@bytedance.com>

>> >---

>> > drivers/vhost/vsock.c             |  3 +-

>> > include/linux/virtio_vsock.h      |  9 +++++

>> > include/uapi/linux/virtio_vsock.h |  3 ++

>> > net/vmw_vsock/virtio_transport.c  | 73 +++++++++++++++++++++++++++++++++++----

>> > 4 files changed, 80 insertions(+), 8 deletions(-)

>> >

>> >diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c

>> >index 5e78fb719602..81d064601093 100644

>> >--- a/drivers/vhost/vsock.c

>> >+++ b/drivers/vhost/vsock.c

>> >@@ -31,7 +31,8 @@

>> >

>> > enum {

>> >       VHOST_VSOCK_FEATURES = VHOST_FEATURES |

>> >-                             (1ULL << VIRTIO_F_ACCESS_PLATFORM)

>> >+                             (1ULL << VIRTIO_F_ACCESS_PLATFORM) |

>> >+                             (1ULL << VIRTIO_VSOCK_F_DGRAM)

>> > };

>> >

>> > enum {

>> >diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h

>> >index dc636b727179..ba3189ed9345 100644

>> >--- a/include/linux/virtio_vsock.h

>> >+++ b/include/linux/virtio_vsock.h

>> >@@ -18,6 +18,15 @@ enum {

>> >       VSOCK_VQ_MAX    = 3,

>> > };

>> >

>> >+enum {

>> >+      VSOCK_VQ_STREAM_RX     = 0, /* for host to guest data */

>> >+      VSOCK_VQ_STREAM_TX     = 1, /* for guest to host data */

>> >+      VSOCK_VQ_DGRAM_RX       = 2,

>> >+      VSOCK_VQ_DGRAM_TX       = 3,

>> >+      VSOCK_VQ_EX_EVENT       = 4,

>> >+      VSOCK_VQ_EX_MAX         = 5,

>> >+};

>> >+

>> > /* Per-socket state (accessed via vsk->trans) */

>> > struct virtio_vsock_sock {

>> >       struct vsock_sock *vsk;

>> >diff --git a/include/uapi/linux/virtio_vsock.h b/include/uapi/linux/virtio_vsock.h

>> >index 1d57ed3d84d2..b56614dff1c9 100644

>> >--- a/include/uapi/linux/virtio_vsock.h

>> >+++ b/include/uapi/linux/virtio_vsock.h

>> >@@ -38,6 +38,9 @@

>> > #include <linux/virtio_ids.h>

>> > #include <linux/virtio_config.h>

>> >

>> >+/* The feature bitmap for virtio net */

>> >+#define VIRTIO_VSOCK_F_DGRAM  0       /* Host support dgram vsock */

>> >+

>> > struct virtio_vsock_config {

>> >       __le64 guest_cid;

>> > } __attribute__((packed));

>> >diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c

>> >index 2700a63ab095..7dcb8db23305 100644

>> >--- a/net/vmw_vsock/virtio_transport.c

>> >+++ b/net/vmw_vsock/virtio_transport.c

>> >@@ -27,7 +27,8 @@ static DEFINE_MUTEX(the_virtio_vsock_mutex); /* protects the_virtio_vsock */

>> >

>> > struct virtio_vsock {

>> >       struct virtio_device *vdev;

>> >-      struct virtqueue *vqs[VSOCK_VQ_MAX];

>> >+      struct virtqueue **vqs;

>> >+      bool has_dgram;

>> >

>> >       /* Virtqueue processing is deferred to a workqueue */

>> >       struct work_struct tx_work;

>> >@@ -333,7 +334,10 @@ static int virtio_vsock_event_fill_one(struct virtio_vsock *vsock,

>> >       struct scatterlist sg;

>> >       struct virtqueue *vq;

>> >

>> >-      vq = vsock->vqs[VSOCK_VQ_EVENT];

>> >+      if (vsock->has_dgram)

>> >+              vq = vsock->vqs[VSOCK_VQ_EX_EVENT];

>> >+      else

>> >+              vq = vsock->vqs[VSOCK_VQ_EVENT];

>> >

>> >       sg_init_one(&sg, event, sizeof(*event));

>> >

>> >@@ -351,7 +355,10 @@ static void virtio_vsock_event_fill(struct virtio_vsock *vsock)

>> >               virtio_vsock_event_fill_one(vsock, event);

>> >       }

>> >

>> >-      virtqueue_kick(vsock->vqs[VSOCK_VQ_EVENT]);

>> >+      if (vsock->has_dgram)

>> >+              virtqueue_kick(vsock->vqs[VSOCK_VQ_EX_EVENT]);

>> >+      else

>> >+              virtqueue_kick(vsock->vqs[VSOCK_VQ_EVENT]);

>> > }

>> >

>> > static void virtio_vsock_reset_sock(struct sock *sk)

>> >@@ -391,7 +398,10 @@ static void virtio_transport_event_work(struct work_struct *work)

>> >               container_of(work, struct virtio_vsock, event_work);

>> >       struct virtqueue *vq;

>> >

>> >-      vq = vsock->vqs[VSOCK_VQ_EVENT];

>> >+      if (vsock->has_dgram)

>> >+              vq = vsock->vqs[VSOCK_VQ_EX_EVENT];

>> >+      else

>> >+              vq = vsock->vqs[VSOCK_VQ_EVENT];

>> >

>> >       mutex_lock(&vsock->event_lock);

>> >

>> >@@ -411,7 +421,10 @@ static void virtio_transport_event_work(struct work_struct *work)

>> >               }

>> >       } while (!virtqueue_enable_cb(vq));

>> >

>> >-      virtqueue_kick(vsock->vqs[VSOCK_VQ_EVENT]);

>> >+      if (vsock->has_dgram)

>> >+              virtqueue_kick(vsock->vqs[VSOCK_VQ_EX_EVENT]);

>> >+      else

>> >+              virtqueue_kick(vsock->vqs[VSOCK_VQ_EVENT]);

>> > out:

>> >       mutex_unlock(&vsock->event_lock);

>> > }

>> >@@ -434,6 +447,10 @@ static void virtio_vsock_tx_done(struct virtqueue *vq)

>> >       queue_work(virtio_vsock_workqueue, &vsock->tx_work);

>> > }

>> >

>> >+static void virtio_vsock_dgram_tx_done(struct virtqueue *vq)

>> >+{

>> >+}

>> >+

>> > static void virtio_vsock_rx_done(struct virtqueue *vq)

>> > {

>> >       struct virtio_vsock *vsock = vq->vdev->priv;

>> >@@ -443,6 +460,10 @@ static void virtio_vsock_rx_done(struct virtqueue *vq)

>> >       queue_work(virtio_vsock_workqueue, &vsock->rx_work);

>> > }

>> >

>> >+static void virtio_vsock_dgram_rx_done(struct virtqueue *vq)

>> >+{

>> >+}

>> >+

>> > static struct virtio_transport virtio_transport = {

>> >       .transport = {

>> >               .module                   = THIS_MODULE,

>> >@@ -545,13 +566,29 @@ static int virtio_vsock_probe(struct virtio_device *vdev)

>> >               virtio_vsock_tx_done,

>> >               virtio_vsock_event_done,

>> >       };

>> >+      vq_callback_t *ex_callbacks[] = {

>>

>> 'ex' is not clear, maybe better 'dgram'?

>>

>sure.

>

>> What happens if F_DGRAM is negotiated, but not F_STREAM?

>>

>Hmm. In my mind, F_STREAM is always negotiated. Do we want to add

>support when F_STREAM is not negotiated?

>


Yep, I think we should support this case.

The main purpose of the feature bits is to enable/disable the
functionality after the negotiation.
Initially we didn't want to introduce the bit, but then we thought it was
better to have it, because there could be, for example, a device that
wants to support only datagram.

Since you're touching this part of the code, it would be very helpful to 
fix the problem now.

But if you think it's too complex, we can do it in a second step.
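
To be clearer, a very rough sketch of what I mean (assuming the spec
proposal also defines a VIRTIO_VSOCK_F_STREAM bit; it is not in the
header today):

bool has_dgram  = virtio_has_feature(vdev, VIRTIO_VSOCK_F_DGRAM);
bool has_stream = virtio_has_feature(vdev, VIRTIO_VSOCK_F_STREAM);

if (!has_stream && !has_dgram)
        return -EINVAL;

/* set up only the virtqueues for the negotiated features, instead of
 * assuming the stream queues are always present
 */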

Thanks,
Stefano

>> >+              virtio_vsock_rx_done,

>> >+              virtio_vsock_tx_done,

>> >+              virtio_vsock_dgram_rx_done,

>> >+              virtio_vsock_dgram_tx_done,

>> >+              virtio_vsock_event_done,

>> >+      };

>> >+

>> >       static const char * const names[] = {

>> >               "rx",

>> >               "tx",

>> >               "event",

>> >       };

>> >+      static const char * const ex_names[] = {

>> >+              "rx",

>> >+              "tx",

>> >+              "dgram_rx",

>> >+              "dgram_tx",

>> >+              "event",

>> >+      };

>> >+

>> >       struct virtio_vsock *vsock = NULL;

>> >-      int ret;

>> >+      int ret, max_vq;

>> >

>> >       ret = mutex_lock_interruptible(&the_virtio_vsock_mutex);

>> >       if (ret)

>> >@@ -572,9 +609,30 @@ static int virtio_vsock_probe(struct virtio_device *vdev)

>> >

>> >       vsock->vdev = vdev;

>> >

>> >-      ret = virtio_find_vqs(vsock->vdev, VSOCK_VQ_MAX,

>> >+      if (virtio_has_feature(vdev, VIRTIO_VSOCK_F_DGRAM))

>> >+              vsock->has_dgram = true;

>> >+

>> >+      if (vsock->has_dgram)

>> >+              max_vq = VSOCK_VQ_EX_MAX;

>> >+      else

>> >+              max_vq = VSOCK_VQ_MAX;

>> >+

>> >+      vsock->vqs = kmalloc_array(max_vq, sizeof(struct virtqueue *), GFP_KERNEL);

>> >+      if (!vsock->vqs) {

>> >+              ret = -ENOMEM;

>> >+              goto out;

>> >+      }

>> >+

>> >+      if (vsock->has_dgram) {

>> >+              ret = virtio_find_vqs(vsock->vdev, max_vq,

>> >+                            vsock->vqs, ex_callbacks, ex_names,

>> >+                            NULL);

>> >+      } else {

>> >+              ret = virtio_find_vqs(vsock->vdev, max_vq,

>> >                             vsock->vqs, callbacks, names,

>> >                             NULL);

>> >+      }

>> >+

>> >       if (ret < 0)

>> >               goto out;

>> >

>> >@@ -695,6 +753,7 @@ static struct virtio_device_id id_table[] = {

>> > };

>> >

>> > static unsigned int features[] = {

>> >+      VIRTIO_VSOCK_F_DGRAM,

>> > };

>> >

>> > static struct virtio_driver virtio_vsock_driver = {

>> >--

>> >2.11.0

>> >

>>

>