[v6,00/21] Drivers for gunyah hypervisor

Message ID 20221026185846.3983888-1-quic_eberman@quicinc.com

Message

Elliot Berman Oct. 26, 2022, 6:58 p.m. UTC
Gunyah is a Type-1 hypervisor, independent of any
high-level OS kernel, that runs at a higher CPU privilege level. It does
not depend on any lower-privileged OS kernel/code for its core
functionality. This increases its security and allows it to support a much
smaller trusted computing base than a Type-2 hypervisor.

Gunyah is an open source hypervisor. The source repo is available at
https://github.com/quic/gunyah-hypervisor.

The diagram below shows the architecture.

::

         VM A                    VM B
     +-----+ +-----+  | +-----+ +-----+ +-----+
     |     | |     |  | |     | |     | |     |
 EL0 | APP | | APP |  | | APP | | APP | | APP |
     |     | |     |  | |     | |     | |     |
     +-----+ +-----+  | +-----+ +-----+ +-----+
 ---------------------|-------------------------
     +--------------+ | +----------------------+
     |              | | |                      |
 EL1 | Linux Kernel | | |Linux kernel/Other OS |   ...
     |              | | |                      |
     +--------------+ | +----------------------+
 --------hvc/smc------|------hvc/smc------------
     +----------------------------------------+
     |                                        |
 EL2 |            Gunyah Hypervisor           |
     |                                        |
     +----------------------------------------+

Gunyah provides the following features.

- Threads and Scheduling: The scheduler schedules virtual CPUs (VCPUs) on
physical CPUs and enables time-sharing of the CPUs.
- Memory Management: Gunyah tracks memory ownership and use of all memory
under its control. Memory partitioning between VMs is a fundamental
security feature.
- Interrupt Virtualization: All interrupts are handled in the hypervisor
and routed to the assigned VM.
- Inter-VM Communication: There are several different mechanisms provided
for communicating between VMs; a hedged hypercall sketch follows this list.
- Device Virtualization: Para-virtualization of devices is supported using
inter-VM communication. Low-level system features and devices such as
interrupt controllers are supported with emulation where required.
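
For a sense of what these mechanisms look like from the kernel (see the
Inter-VM Communication bullet above), below is a minimal sketch of a
Gunyah-style message queue send hypercall issued through the SMCCC
helpers. The function ID, prototype, and error mapping are hypothetical
placeholders, not the series' ABI; the real IDs and wrappers live in
include/linux/gunyah.h and arch/arm64/gunyah/gunyah_hypercall.c.

::

    #include <linux/arm-smccc.h>
    #include <linux/errno.h>
    #include <linux/types.h>

    /* Hypothetical function ID for illustration only; the real Gunyah
     * hypercall IDs are defined by the hypervisor ABI in this series. */
    #define GH_EXAMPLE_HYPERCALL_MSGQ_SEND  0xc6000020

    static int example_msgq_send(u64 capid, void *buff, size_t size, u64 flags)
    {
            struct arm_smccc_res res;

            /* Gunyah hypercalls follow SMCCC: arguments are passed in
             * registers via an hvc, and status comes back in res.a0. */
            arm_smccc_1_1_hvc(GH_EXAMPLE_HYPERCALL_MSGQ_SEND, capid, size,
                              (unsigned long)buff, flags, &res);

            return res.a0 ? -EIO : 0;
    }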

This series adds the basic framework for detecting that Linux is running
under Gunyah as a virtual machine, communication with the Gunyah Resource
Manager, and a basic virtual machine manager capable of launching virtual
machines (a hedged sketch of the userspace flow follows this paragraph).
In a future series, I'll add more functionality to the VM Manager; it is
kept limited here to reduce the number of patches to review.
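
To give a feel for the UAPI, here is a hedged sketch of the userspace
flow, loosely modeled on samples/gunyah/gunyah_vmm.c from this series.
GH_CREATE_VM is the ioctl discussed in this thread; GH_VM_START and the
exact call sequence are illustrative assumptions, so see
include/uapi/linux/gunyah.h in the series for the real definitions.

::

    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/ioctl.h>
    #include <linux/gunyah.h>   /* uapi header added by this series */

    int main(void)
    {
            int gh_fd, vm_fd;

            gh_fd = open("/dev/gunyah", O_RDWR);
            if (gh_fd < 0) {
                    perror("open /dev/gunyah");
                    return 1;
            }

            /* arg is reserved and must be 0 in this series; a new VM
             * file descriptor is returned on success. */
            vm_fd = ioctl(gh_fd, GH_CREATE_VM, 0);
            if (vm_fd < 0) {
                    perror("GH_CREATE_VM");
                    return 1;
            }

            /* Guest memory and boot configuration would be set up here
             * with further ioctls on vm_fd, then the VM is started.
             * GH_VM_START is an assumption in this sketch. */
            if (ioctl(vm_fd, GH_VM_START, 0) < 0)
                    perror("GH_VM_START");

            return 0;
    }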

Changes in v6:
 - *Replace gunyah-console with gunyah VM Manager*
 - Move include/asm-generic/gunyah.h into include/linux/gunyah.h
 - s/gunyah_msgq/gh_msgq/
 - Minor tweaks and documentation tidying based on comments from Jiri, Greg, Arnd, Dmitry, and Bagas.

Changes in v5: https://lore.kernel.org/all/20221011000840.289033-1-quic_eberman@quicinc.com/
 - Dropped sysfs nodes
 - Switch from aux bus to Gunyah RM bus for the subdevices
 - Cleaning up RM console

Changes in v4: https://lore.kernel.org/all/20220928195633.2348848-1-quic_eberman@quicinc.com/
 - Tidied up documentation throughout based on questions/feedback received
 - Switched message queue implementation to use mailboxes
 - Renamed "gunyah_device" as "gunyah_resource"

Changes in v3: https://lore.kernel.org/all/20220811214107.1074343-1-quic_eberman@quicinc.com/
 - s/Maintained/Supported/ in MAINTAINERS
 - Tidied up documentation throughout based on questions/feedback received
 - Moved hypercalls into arch/arm64/gunyah/; following hyper-v's implementation
 - Drop opaque typedefs
 - Move sysfs nodes under /sys/hypervisor/gunyah/
 - Moved Gunyah console driver to drivers/tty/
 - Reworked gunyah_device design to drop the Gunyah bus.

Changes in v2: https://lore.kernel.org/all/20220801211240.597859-1-quic_eberman@quicinc.com/
 - DT bindings clean up
 - Switch hypercalls to follow SMCCC 

v1: https://lore.kernel.org/all/20220223233729.1571114-1-quic_eberman@quicinc.com/

Elliot Berman (21):
  docs: gunyah: Introduce Gunyah Hypervisor
  dt-bindings: Add binding for gunyah hypervisor
  gunyah: Common types and error codes for Gunyah hypercalls
  arm64: smccc: Include alternative-macros.h
  virt: gunyah: Add hypercalls to identify Gunyah
  virt: gunyah: Identify hypervisor version
  mailbox: Allow direct registration to a channel
  virt: gunyah: msgq: Add hypercalls to send and receive messages
  mailbox: Add Gunyah message queue mailbox
  gunyah: rsc_mgr: Add resource manager RPC core
  gunyah: rsc_mgr: Add subdevices bus
  gunyah: rsc_mgr: Add VM lifecycle RPC
  gunyah: vm_mgr: Introduce basic VM Manager
  gunyah: rsc_mgr: Add RPC for sharing memory
  gunyah: vm_mgr: Add/remove user memory regions
  gunyah: vm_mgr: Add ioctls to support basic non-proxy VM boot
  samples: Add sample userspace Gunyah VM Manager
  gunyah: rsc_mgr: Add platform ops on mem_lend/mem_reclaim
  firmware: qcom_scm: Use fixed width src vm bitmap
  firmware: qcom_scm: Register Gunyah platform ops
  docs: gunyah: Document Gunyah VM Manager

 .../bindings/firmware/gunyah-hypervisor.yaml  |  86 +++
 .../userspace-api/ioctl/ioctl-number.rst      |   1 +
 Documentation/virt/gunyah/index.rst           | 115 +++
 Documentation/virt/gunyah/message-queue.rst   |  63 ++
 Documentation/virt/gunyah/vm-manager.rst      |  94 +++
 Documentation/virt/index.rst                  |   1 +
 MAINTAINERS                                   |  13 +
 arch/arm64/Kbuild                             |   1 +
 arch/arm64/gunyah/Makefile                    |   1 +
 arch/arm64/gunyah/gunyah_hypercall.c          | 102 +++
 arch/arm64/include/uapi/asm/gunyah.h          |  17 +
 drivers/firmware/qcom_scm.c                   | 126 +++-
 drivers/mailbox/Kconfig                       |  10 +
 drivers/mailbox/Makefile                      |   2 +
 drivers/mailbox/gunyah-msgq.c                 | 225 ++++++
 drivers/mailbox/mailbox.c                     |  96 ++-
 drivers/misc/fastrpc.c                        |   6 +-
 drivers/net/wireless/ath/ath10k/qmi.c         |   4 +-
 drivers/remoteproc/qcom_q6v5_mss.c            |   8 +-
 drivers/soc/qcom/rmtfs_mem.c                  |   2 +-
 drivers/virt/Kconfig                          |   1 +
 drivers/virt/Makefile                         |   1 +
 drivers/virt/gunyah/Kconfig                   |  35 +
 drivers/virt/gunyah/Makefile                  |   7 +
 drivers/virt/gunyah/gunyah.c                  |  46 ++
 drivers/virt/gunyah/rsc_mgr.c                 | 690 ++++++++++++++++++
 drivers/virt/gunyah/rsc_mgr.h                 | 146 ++++
 drivers/virt/gunyah/rsc_mgr_bus.c             |  83 +++
 drivers/virt/gunyah/rsc_mgr_rpc.c             | 475 ++++++++++++
 drivers/virt/gunyah/vm_mgr.c                  | 288 ++++++++
 drivers/virt/gunyah/vm_mgr.h                  |  52 ++
 drivers/virt/gunyah/vm_mgr_mm.c               | 245 +++++++
 include/linux/arm-smccc.h                     |   1 +
 include/linux/gunyah.h                        | 167 +++++
 include/linux/gunyah_rsc_mgr.h                | 161 ++++
 include/linux/mailbox_client.h                |   1 +
 include/linux/mod_devicetable.h               |   8 +
 include/linux/qcom_scm.h                      |   2 +-
 include/uapi/linux/gunyah.h                   |  53 ++
 samples/Kconfig                               |  10 +
 samples/Makefile                              |   1 +
 samples/gunyah/.gitignore                     |   2 +
 samples/gunyah/Makefile                       |   6 +
 samples/gunyah/gunyah_vmm.c                   | 270 +++++++
 samples/gunyah/sample_vm.dts                  |  69 ++
 scripts/mod/devicetable-offsets.c             |   3 +
 scripts/mod/file2alias.c                      |  10 +
 47 files changed, 3763 insertions(+), 43 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/firmware/gunyah-hypervisor.yaml
 create mode 100644 Documentation/virt/gunyah/index.rst
 create mode 100644 Documentation/virt/gunyah/message-queue.rst
 create mode 100644 Documentation/virt/gunyah/vm-manager.rst
 create mode 100644 arch/arm64/gunyah/Makefile
 create mode 100644 arch/arm64/gunyah/gunyah_hypercall.c
 create mode 100644 arch/arm64/include/uapi/asm/gunyah.h
 create mode 100644 drivers/mailbox/gunyah-msgq.c
 create mode 100644 drivers/virt/gunyah/Kconfig
 create mode 100644 drivers/virt/gunyah/Makefile
 create mode 100644 drivers/virt/gunyah/gunyah.c
 create mode 100644 drivers/virt/gunyah/rsc_mgr.c
 create mode 100644 drivers/virt/gunyah/rsc_mgr.h
 create mode 100644 drivers/virt/gunyah/rsc_mgr_bus.c
 create mode 100644 drivers/virt/gunyah/rsc_mgr_rpc.c
 create mode 100644 drivers/virt/gunyah/vm_mgr.c
 create mode 100644 drivers/virt/gunyah/vm_mgr.h
 create mode 100644 drivers/virt/gunyah/vm_mgr_mm.c
 create mode 100644 include/linux/gunyah.h
 create mode 100644 include/linux/gunyah_rsc_mgr.h
 create mode 100644 include/uapi/linux/gunyah.h
 create mode 100644 samples/gunyah/.gitignore
 create mode 100644 samples/gunyah/Makefile
 create mode 100644 samples/gunyah/gunyah_vmm.c
 create mode 100644 samples/gunyah/sample_vm.dts


base-commit: 247f34f7b80357943234f93f247a1ae6b6c3a740

Comments

Elliot Berman Nov. 1, 2022, 3:19 a.m. UTC | #1
Hi Jassi,

On 10/27/2022 7:33 PM, Jassi Brar wrote:
> On Wed, Oct 26, 2022 at 1:59 PM Elliot Berman <quic_eberman@quicinc.com> wrote:
> .....
>> +
>> +        gunyah-resource-mgr@0 {
>> +            compatible = "gunyah-resource-manager-1-0", "gunyah-resource-manager";
>> +            interrupts = <GIC_SPI 3 IRQ_TYPE_EDGE_RISING>, /* TX full IRQ */
>> +                         <GIC_SPI 4 IRQ_TYPE_EDGE_RISING>; /* RX empty IRQ */
>> +            reg = <0x00000000 0x00000000>, <0x00000000 0x00000001>;
>> +                  /* TX, RX cap ids */
>> +        };
>>
> All these resources are used only by the mailbox controller driver.
> So, this should be the mailbox controller node, rather than the
> mailbox user.
>
> One option is to load gunyah-resource-manager as a module that relies
> on the gunyah-mailbox provider. That would also avoid the "Allow
> direct registration to a channel" hack patch.

A message queue to another guest VM wouldn't be known at boot time and
thus couldn't be described in the devicetree. We will need the "Allow
direct registration to a channel" patch anyway to support those message
queues, and I would like to have one consistent mechanism to set up
message queues.
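
For reference, below is a rough sketch of how a client could attach to a
channel discovered at runtime using the direct-registration API from that
patch. mbox_bind_client() matches the patch's intent, but treat the exact
signature and the surrounding helper as illustrative assumptions.

::

    #include <linux/device.h>
    #include <linux/errno.h>
    #include <linux/mailbox_client.h>
    #include <linux/mailbox_controller.h>
    #include <linux/slab.h>

    static void example_rx_callback(struct mbox_client *cl, void *msg)
    {
            /* Handle a message received on the dynamically created queue. */
    }

    static int example_attach(struct device *dev, struct mbox_chan *chan)
    {
            struct mbox_client *cl;

            cl = devm_kzalloc(dev, sizeof(*cl), GFP_KERNEL);
            if (!cl)
                    return -ENOMEM;

            cl->dev = dev;
            cl->rx_callback = example_rx_callback;

            /* Bind directly to a channel created at runtime instead of
             * resolving one through a devicetree "mboxes" property. */
            return mbox_bind_client(chan, cl);
    }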

- Elliot
Elliot Berman Nov. 1, 2022, 8:35 p.m. UTC | #2
On 11/1/2022 9:23 AM, Jassi Brar wrote:
> On Mon, Oct 31, 2022 at 10:20 PM Elliot Berman <quic_eberman@quicinc.com> wrote:
>>
>> Hi Jassi,
>>
>> On 10/27/2022 7:33 PM, Jassi Brar wrote:
>>> On Wed, Oct 26, 2022 at 1:59 PM Elliot Berman <quic_eberman@quicinc.com> wrote:
>>> .....
>>>> +
>>>> +        gunyah-resource-mgr@0 {
>>>> +            compatible = "gunyah-resource-manager-1-0", "gunyah-resource-manager";
>>>> +            interrupts = <GIC_SPI 3 IRQ_TYPE_EDGE_RISING>, /* TX full IRQ */
>>>> +                         <GIC_SPI 4 IRQ_TYPE_EDGE_RISING>; /* RX empty IRQ */
>>>> +            reg = <0x00000000 0x00000000>, <0x00000000 0x00000001>;
>>>> +                  /* TX, RX cap ids */
>>>> +        };
>>>>
>>> All these resources are used only by the mailbox controller driver.
>>> So, this should be the mailbox controller node, rather than the
>>> mailbox user.
>>>
>>> One option is to load gunyah-resource-manager as a module that relies
>>> on the gunyah-mailbox provider. That would also avoid the "Allow
>>> direct registration to a channel" hack patch.
>>
>> A message queue to another guest VM wouldn't be known at boot time and
>> thus couldn't be described in the devicetree.
>>
> I think you need to implement of_xlate() ... or please tell me what
> exactly you need to specify in the dt.

Dynamically created virtual machines can't be known in the DT, so there
is nothing to specify in the DT. There couldn't be a devicetree node for
the message queue client because that client only exists once the VM is
created by userspace.

As a more concrete example, there is QRTR (net/qrtr) virtualization
support, which is implemented with Gunyah message queues. Whether a QRTR
client is needed for a VM, and which message queue resource that client
should use, is only determined when launching the VM. Since many VMs
could be running on a system, it's not possible to know the number of
mailbox controllers (i.e. message queues) nor the number of mailbox
clients (e.g. QRTR) as a static configuration in the DT.

Thanks,
Elliot
Elliot Berman Nov. 2, 2022, 11:23 p.m. UTC | #3
On 11/2/2022 11:24 AM, Jassi Brar wrote:
> On Wed, Nov 2, 2022 at 1:06 PM Elliot Berman <quic_eberman@quicinc.com> wrote:
>>
>> Hi Jassi,
>>
>> On 11/1/2022 7:01 PM, Jassi Brar wrote:
>>> On Tue, Nov 1, 2022 at 7:12 PM Elliot Berman <quic_eberman@quicinc.com> wrote:
>>>>
>>>>
>>>>
>>>> On 11/1/2022 2:58 PM, Jassi Brar wrote:
>>>>> On Tue, Nov 1, 2022 at 3:35 PM Elliot Berman <quic_eberman@quicinc.com> wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 11/1/2022 9:23 AM, Jassi Brar wrote:
>>>>>>> On Mon, Oct 31, 2022 at 10:20 PM Elliot Berman <quic_eberman@quicinc.com> wrote:
>>>>>>>>
>>>>>>>> Hi Jassi,
>>>>>>>>
>>>>>>>> On 10/27/2022 7:33 PM, Jassi Brar wrote:
>>>>>>>>> On Wed, Oct 26, 2022 at 1:59 PM Elliot Berman <quic_eberman@quicinc.com> wrote:
>>>>>>>>> .....
>>>>>>>>>> +
>>>>>>>>>> +        gunyah-resource-mgr@0 {
>>>>>>>>>> +            compatible = "gunyah-resource-manager-1-0", "gunyah-resource-manager";
>>>>>>>>>> +            interrupts = <GIC_SPI 3 IRQ_TYPE_EDGE_RISING>, /* TX full IRQ */
>>>>>>>>>> +                         <GIC_SPI 4 IRQ_TYPE_EDGE_RISING>; /* RX empty IRQ */
>>>>>>>>>> +            reg = <0x00000000 0x00000000>, <0x00000000 0x00000001>;
>>>>>>>>>> +                  /* TX, RX cap ids */
>>>>>>>>>> +        };
>>>>>>>>>>
>>>>>>>>> All these resources are used only by the mailbox controller driver.
>>>>>>>>> So, this should be the mailbox controller node, rather than the
>>>>>>>>> mailbox user.
>>>>>>>>>
>>>>>>>>> One option is to load gunyah-resource-manager as a module that relies
>>>>>>>>> on the gunyah-mailbox provider. That would also avoid the "Allow
>>>>>>>>> direct registration to a channel" hack patch.
>>>>>>>>
>>>>>>>> A message queue to another guest VM wouldn't be known at boot time and
>>>>>>>> thus couldn't be described in the devicetree.
>>>>>>>>
>>>>>>> I think you need to implement of_xlate() ... or please tell me what
>>>>>>> exactly you need to specify in the dt.
>>>>>>
>>>>>> Dynamically created virtual machines can't be known in the DT, so there
>>>>>> is nothing to specify in the DT. There couldn't be a devicetree node for
>>>>>> the message queue client because that client only exists once the VM
>>>>>> is created by userspace.
>>>>>>
>>>>> The underlying "physical channel" is the synchronous SMC instruction,
>>>>> which remains 1 irrespective of the number of mailbox instances
>>>>> created.
>>>>
>>>> I disagree that the physical channel is the SMC instruction. Regardless
>>>> though, there are num_online_cpus() "physical channels" with this
>>>> perspective.
>>>>
>>>>> So basically you are sharing one resource among users. Why doesn't the
>>>>> RM request the "smc instruction" channel once and share it among
>>>>> users?
>>>>
>>>> I suppose in this scenario, a single mailbox channel would represent all
>>>> message queues? This would cause Linux to serialize *all* message queue
>>>> hypercalls. Sorry, I can only think of negative implications.
>>>>
>>>> Error handling needs to move into clients: if a TX message queue becomes
>>>> full or an RX message queue becomes empty, then we'll need to return
>>>> error back to the client right away. The clients would need to register
>>>> for the RTS/RTR interrupts to know when to send/receive messages and
>>>> have retry error handling. If the mailbox controller retried for the
>>>> clients as currently proposed, then we could get into a scenario where a
>>>> message queue could never be ready to send/receive and thus stuck
>>>> forever trying to process that message. The effect here would be that
>>>> the mailbox controller becomes a wrapper to some SMC instructions that
>>>> aren't related at the SMC instruction level.
>>>>
>>>> A single channel would limit performance on SMP systems because only one
>>>> core could send/receive a message at a time. There is no such limitation
>>>> requiring message queues to behave like this.
>>>>
>>> This is just an illusion. If Gunyah can handle multiple calls from a
>>> VM in parallel, even with the "bind-client-to-channel" hack you can't
>>> make sure different channels run on different cpu cores.  If you are
>>> ok with that, you could simply populate a mailbox controller with N
>>> channels and allocate them in any order the clients ask.
>>
>> I wanted to make sure I understood the ask here completely. On what
>> basis is N chosen? Who would be the mailbox clients?
>>
> A channel structure is cheap, so pick any number that is not likely to run
> out. Say you have 10 possible users in a VM, set N=16. I know ideally
> it should be precise and flexible but the gain in simplicity makes the
> trade-off very acceptable.

I think I get the direction you are thinking in now. N is chosen based on
how many clients there might be. One mailbox controller will represent
all message queues, and each channel will be one message queue. There are
some limitations that might make it more complex to implement than having
one message queue per controller as I have now.

My interpretation is that the mailbox controller knows the configuration
of its channels before being bound to a client. For dynamically created
message queues, the client would need to tell the controller about the
message queue configuration. I didn't find an example where a client
provides information about a channel to the controller.

 1. We need a mechanism to allow the client to provide the
    gunyah_resources for the channel (i.e. the IRQs and cap IDs); a
    hypothetical sketch follows this list.
 2. We still need the bind-client-to-channel patch since clients
    aren't real devices and so shouldn't be in the DT.
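
To make point 1 concrete, here is a rough, hypothetical sketch of a
client handing the controller its per-channel resources at bind time.
None of these names come from the series as posted (cf. struct
gunyah_resource and the gunyah-msgq driver); they only illustrate the
shape of the interface under discussion.

::

    #include <linux/mailbox_client.h>
    #include <linux/mailbox_controller.h>
    #include <linux/types.h>

    /* Hypothetical: what a client knows about a dynamically created
     * message queue (cf. struct gunyah_resource in this series). */
    struct example_msgq_rsc {
            u64 tx_capid;           /* capability ID of the TX queue */
            u64 rx_capid;           /* capability ID of the RX queue */
            int tx_full_irq;        /* "TX no longer full" interrupt */
            int rx_empty_irq;       /* "RX not empty" interrupt */
    };

    /* Hypothetical controller entry point: the client, not the DT,
     * supplies the capability IDs and IRQs when binding a channel. */
    int example_msgq_bind(struct mbox_chan *chan, struct mbox_client *cl,
                          const struct example_msgq_rsc *rsc);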

Thanks,
Elliot
Elliot Berman Nov. 3, 2022, 10:10 p.m. UTC | #4
On 11/3/2022 2:39 AM, Arnd Bergmann wrote:
> On Wed, Nov 2, 2022, at 19:44, Elliot Berman wrote:
>> On 11/2/2022 12:31 AM, Arnd Bergmann wrote:
>>>> +static long gh_dev_ioctl_create_vm(unsigned long arg)
>>>> +{
>>>> +	struct gunyah_vm *ghvm;
>>>> +	struct file *file;
>>>> +	int fd, err;
>>>> +
>>>> +	/* arg reserved for future use. */
>>>> +	if (arg)
>>>> +		return -EINVAL;
>>>
>>> Do you have something specific in mind here? If 'create'
>>> is the only command you support, and it has no arguments,
>>> it would be easier to do it implicitly during open() and
>>> have each fd opened from /dev/gunyah represent a new VM.
>>>
>>
>> I'd like the argument here to support different types of virtual
>> machines. I want to leave open what "different types" can be in case
>> something new comes up in the future, but immediately "different type"
>> would correspond to a few different authentication mechanisms for
>> virtual machines that Gunyah supports.
>>
>> In this series, I'm only supporting unauthenticated virtual machines
>> because they are the simplest to get up and running from a Linux
>> userspace. When I introduce the other authentication mechanisms, I'll
>> expand much more on how they work, but I'll give quick overview here.
>> Other authentication mechanisms that are currently supported by Gunyah
>> are "protected VM" and, on Qualcomm platforms, "PIL/carveout VMs".
>> Protected VMs are *loosely* similar to the protected firmware design for
>> KVM and intended to support Android virtualization use cases.
>> PIL/carveout VMs are special virtual machines that can run on Qualcomm
>> firmware which authenticate in a way similar to remoteproc firmware (MDT
>> loader).
> 
> Ok, thanks for the background. Having different types of virtual
> machines does mean that you may need some complexity, but I would
> still lean towards using the simpler context management of opening
> the /dev/gunyah device node to get a new context, and then using
> ioctls on each fd to manage that context, instead of going through
> the extra indirection of having a secondary 'open context' command
> that always requires opening two file descriptors.
> 
>>> I'm correct, you can just turn the entire bus/device/driver
>>> structure within your code into simple function calls, where
>>> the main code calls vm_mgr_probe() as an exported function
>>> instead of creating a device.
>>
>> Ack. I can do this, although I am nervous about this snowballing into a
>> situation where I have a mega-module.
>>
>>> Please stop beating everything in a single module.
>>
>> https://lore.kernel.org/all/250945d2-3940-9830-63e5-beec5f44010b@linaro.org/
> 
> I see your concern, but I wasn't suggesting having everything
> in one module either. There are three common ways of splitting
> things into separate modules:
> 
> - I suggested having the vm_mgr module as a library block that
>    exports a few symbols which get used by the core module. The
>    module doesn't do anything on its own, but loading the core
>    module forces loading the vm_mgr.
> 

Got the idea, I'll do this.

- Elliot

> - Alternatively one can do the opposite, and have symbols
>    exported by the core module, with the vm_mgr module using
>    it. This would make sense if you commonly have the core
>    module loaded on virtual machines that do not need to manage
>    other VMs.
> 
> - The method you have is to have a lower "bus" level that
>    abstracts device providers from consumers, with both sides
>    hooking into the bus. This makes sense for physical buses
>    like PCI or USB where both the host driver and the function
>    driver are unaware of implementation details of the other,
>    but in your case it does not seem like a good fit.
> 
>          Arnd
Elliot Berman Nov. 3, 2022, 10:33 p.m. UTC | #5
On 11/2/2022 5:20 PM, Greg Kroah-Hartman wrote:
> On Wed, Nov 02, 2022 at 11:44:51AM -0700, Elliot Berman wrote:
>>
>>
>> On 11/2/2022 12:31 AM, Arnd Bergmann wrote:
>>> On Wed, Oct 26, 2022, at 20:58, Elliot Berman wrote:
>>>
>>>> +static const struct file_operations gh_vm_fops = {
>>>> +	.unlocked_ioctl = gh_vm_ioctl,
>>>> +	.release = gh_vm_release,
>>>> +	.llseek = noop_llseek,
>>>> +};
>>>
>>> There should be a .compat_ioctl entry here, otherwise it is
>>> impossible to use from 32-bit tasks. If all commands have
>>> arguments passed through a pointer to a properly defined
>>> structure, you can just set it to compat_ptr_ioctl.
>>>
>>
>> Ack.
>>
>>>> +static long gh_dev_ioctl_create_vm(unsigned long arg)
>>>> +{
>>>> +	struct gunyah_vm *ghvm;
>>>> +	struct file *file;
>>>> +	int fd, err;
>>>> +
>>>> +	/* arg reserved for future use. */
>>>> +	if (arg)
>>>> +		return -EINVAL;
>>>
>>> Do you have something specific in mind here? If 'create'
>>> is the only command you support, and it has no arguments,
>>> it would be easier to do it implicitly during open() and
>>> have each fd opened from /dev/gunyah represent a new VM.
>>>
>>
>> I'd like the argument here to support different types of virtual machines. I
>> want to leave open what "different types" can be in case something new comes
>> up in the future, but immediately "different type" would correspond to a few
>> different authentication mechanisms for virtual machines that Gunyah
>> supports.
> 
> Please don't add code that does not actually do something now, as that
> makes it impossible to review properly, _AND_ no one knows what is going
> to happen in the future.  In the future, you can just add a new ioctl
> and all is good, no need to break working userspace by suddenly looking
> at the arg value and doing something with it.
> 

I think the argument does something today: it's documented as required to 
be 0. If a userspace from the future provides a non-zero value, Linux will 
correctly reject it because it doesn't know how to interpret the 
different VM types.

I can document it more clearly as the VM type field and support only the 
one VM type today.

Creating a new ioctl for each VM type feels to me like I didn't design 
the CREATE_VM ioctl right in the first place.
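
A minimal sketch of that, assuming arg is interpreted as the VM type;
the constant name below is invented for illustration, and only the patch
quoted above defines the real handler.

::

    #include <linux/errno.h>

    /* Illustrative placeholder: only one VM type exists today. */
    #define GH_VM_TYPE_DEFAULT      0

    static long gh_dev_ioctl_create_vm(unsigned long arg)
    {
            /* arg is the VM type; reject values we don't know how to
             * interpret so future userspace fails cleanly with -EINVAL. */
            if (arg != GH_VM_TYPE_DEFAULT)
                    return -EINVAL;

            /* ... allocate the unauthenticated VM and return its new fd,
             * as in the patch quoted earlier in this thread ... */
            return 0;
    }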

Thanks,
Elliot
Elliot Berman Nov. 4, 2022, 10:38 p.m. UTC | #6
On 11/4/2022 1:10 AM, Arnd Bergmann wrote:
> On Fri, Nov 4, 2022, at 01:11, Elliot Berman wrote:
>> On 11/2/2022 5:24 PM, Greg Kroah-Hartman wrote:
>>> On Wed, Nov 02, 2022 at 11:45:12AM -0700, Elliot Berman wrote:
>>>
>>> Even if you don't support it 1:1, at least for the ones that are the
>>> same thing, pick the same numbers as that's a nicer thing to do, right?
>>>
>>
>> Does same thing == interpretation of arguments is the same? For
>> instance, GH_CREATE_VM and KVM_CREATE_VM interpret the arguments
>> differently. Same for KVM_SET_USER_MEMORY_REGION. The high-level
>> functionality should be similar for almost all hypervisors since they will
>> all support creating a VM and probably sharing memory with that VM. The
>> arguments for that will necessarily look similar, but they will probably
>> be subtly different because the hypervisors support different features.
> 
> I think in the ideal case, you should make the arguments and the
> command codes the same for any command where that is possible. If
> you come across a command that is shared with KVM but just needs
> another flag, that would involve coordinating with the KVM maintainers
> about sharing the definition so the same flag does not get reused
> in an incompatible way.
> 

I think the converse also needs to be true; KVM would need to check that
new flags don't get used in some incompatible way with Gunyah, even if
one of us is just -EINVAL'ing. I don't think Gunyah and KVM should be
reliant on the other reviewing shared ioctls.

The problem is a bit worse because "machine type" is architecture-
dependent whereas the planned Gunyah flags are architecture-independent.
KVM within itself re-uses flags between architectures so Gunyah would
need to reserve some flags from all architectures that KVM supports.

> For commands that cannot fit into the existing definition, there
> should be a different command code, using your own namespace and
> not the 0xAE block that KVM has. It still makes sense to follow
> the argument structure roughly here, unless there is a technical
> reason for making it different.
> 
>> I don't think userspace that supports both KVM and Gunyah will benefit
>> much from re-using the same numbers since those re-used ioctl calls
>> still need to sit within the context of a Gunyah VM.
> 
> One immediate benefit is for tools that work on running processes,
> such as strace, gdb or qemu-user. If they encounter a known command,
> they can correctly display the arguments etc.
> 

We can update these tools, and there will be different ioctls to get
started with in any case. There are important ioctls that wouldn't be
displayed correctly off the bat; work would need to be done to support the
Gunyah ioctls either way. Whereas a tooling update is temporary, the
coupling of KVM and Gunyah ioctls would be permanent.

> Another benefit is for sharing portions of the VMM that live in
> outside processes like vhost-user based device emulation that
> interacts with irqfd, memfd etc. The more similar the command
> interface is, the easier it gets to keep these tools portable.
> 

Hypervisor interfaces already have different ioctls for equivalent
functionality [1], so VMMs that want to scale to multiple hypervisors
already abstract out ioctl-level interfaces; there wouldn't be any code
reuse even if Gunyah and KVM shared the same ioctl numbers. Between
hypervisors, the best case is design similarity for userspace, which
makes it easier for VMMs to add support for a new hypervisor, and that's
what we are aiming for.

[1]: e.g. compare KVM, acrn, xen for implementing virtual interrupts.
KVM and acrn use independently implemented irqfd interfaces, xen has
totally different implementation called event channels.

Thanks,
Elliot
Trilok Soni Nov. 5, 2022, 4:19 a.m. UTC | #7
On 11/4/2022 3:38 PM, Elliot Berman wrote:
> 
> 
> On 11/4/2022 1:10 AM, Arnd Bergmann wrote:
>> On Fri, Nov 4, 2022, at 01:11, Elliot Berman wrote:
>>> On 11/2/2022 5:24 PM, Greg Kroah-Hartman wrote:
>>>> On Wed, Nov 02, 2022 at 11:45:12AM -0700, Elliot Berman wrote:
>>>>
>>>> Even if you don't support it 1:1, at least for the ones that are the
>>>> same thing, pick the same numbers as that's a nicer thing to do, right?
>>>>
>>>
>>> Does same thing == interpretation of arguments is the same? For
>>> instance, GH_CREATE_VM and KVM_CREATE_VM interpret the arguments
>>> differently. Same for KVM_SET_USER_MEMORY_REGION. The high-level
>>> functionality should be similar for almost all hypervisors since they will
>>> all support creating a VM and probably sharing memory with that VM. The
>>> arguments for that will necessarily look similar, but they will probably
>>> be subtly different because the hypervisors support different features.
>>
>> I think in the ideal case, you should make the arguments and the
>> command codes the same for any command where that is possible. If
>> you come across a command that is shared with KVM but just needs
>> another flag, that would involve coordinating with the KVM maintainers
>> about sharing the definition so the same flag does not get reused
>> in an incompatible way.
>>
> 
> I think the converse also needs to be true; KVM would need to check that
> new flags don't get used in some incompatible way with Gunyah, even if
> one of us is just -EINVAL'ing. I don't think Gunyah and KVM should be
> reliant on the other reviewing shared ioctls.
> 
> The problem is a bit worse because "machine type" is architecture-
> dependent whereas the planned Gunyah flags are architecture-independent.
> KVM within itself re-uses flags between architectures so Gunyah would
> need to reserve some flags from all architectures that KVM supports.

I agree w/ Elliot. We would like to keep Gunyah independent and not rely 
on the existing KVM ioctl space. We should allow new hypervisor driver 
interfaces to be added to the Linux kernel without relying on KVM.

> 
>> For commands that cannot fit into the existing definition, there
>> should be a different command code, using your own namespace and
>> not the 0xAE block that KVM has. It still makes sense to follow
>> the argument structure roughly here, unless there is a technical
>> reason for making it different.
>>
>>> I don't think userspace that supports both KVM and Gunyah will benefit
>>> much from re-using the same numbers since those re-used ioctl calls
>>> still need to sit within the context of a Gunyah VM.
>>
>> One immediate benefit is for tools that work on running processes,
>> such as strace, gdb or qemu-user. If they encounter a known command,
>> they can correctly display the arguments etc.
>>
> 
> We can update these tools and anyway there will be different ioctls to
> get started. There are important ioctls that wouldn't be correctly
> displayed off the bat anyway; work would need to be done to support the
> Gunyah ioctls either way. Whereas tooling update is temporary, the
> coupling of KVM and Gunyah ioctls would be permanent.

Agree, tools can be updated, and that is the easy part as we grow the s/w 
stack around Gunyah in userspace, as we already do w/ CrosVM (Virtual 
Machine Manager); QEMU will be next, followed by rust-vmm. All of them 
can be done without the Gunyah ioctls relying in any way on the KVM 
ioctls. Elliot has also explained very well that we don't want to go to 
the KVM maintainers for any of our additions, and we also don't want them 
to come to us, since there is no interoperability testing. It is best 
that both hypervisors and their Linux interfaces evolve independently.

---Trilok Soni
Elliot Berman Nov. 11, 2022, 5:08 p.m. UTC | #8
On 11/10/2022 10:24 PM, Greg Kroah-Hartman wrote:
> On Thu, Nov 10, 2022 at 04:03:10PM -0800, Elliot Berman wrote:
>>> Agree, tools can be updated, and that is the easy part as we grow the s/w
>>> stack around Gunyah in userspace, as we already do w/ CrosVM (Virtual
>>> Machine Manager); QEMU will be next, followed by rust-vmm. All of them
>>> can be done without the Gunyah ioctls relying in any way on the KVM
>>> ioctls. Elliot has also explained very well that we don't want to go to
>>> the KVM maintainers for any of our additions, and we also don't want them
>>> to come to us, since there is no interoperability testing. It is best
>>> that both hypervisors and their Linux interfaces evolve independently.
>>
>> Are the above explanations reasonable grounds to not re-use KVM ioctl numbers?
> 
> Try getting close at least, where possible please.  As your ioctl
> numbers didn't even start at 0, it's a bit odd...

Ack, will do.

Thanks,
Elliot