mbox series

[v11,00/19] Initial support for multi-process Qemu

Message ID cover.1602784930.git.jag.raman@oracle.com
Headers show
Series Initial support for multi-process Qemu | expand

Message

Jag Raman Oct. 15, 2020, 6:04 p.m. UTC
Hello,

This is the v11 of the patchset. Thank you very much for the
review of the v10 of the series. We are glad to hear the
patchset is getting in a better shape.

We made changes to the following patches in this series.

[PATCH v10 06/19] multi-process: define MPQemuMsg format and transmission functions
[PATCH v10 08/19] multi-process: Associate fd of a PCIDevice with its object
[PATCH v10 10/19] multi-process: introduce proxy object

To touch upon the history of this project, we posted the
Proof Of Concept patches before the BoF session in 2018.
Subsequently, we posted 9 versions on the qemu-devel mailist.
You can find them by following the links below ([1] - [9]).

Following people contributed to the design and
implementation of this project:
Jagannathan Raman <jag.raman@oracle.com>
Elena Ufimtseva <elena.ufimtseva@oracle.com>
John G Johnson <john.g.johnson@oracle.com>
Stefan Hajnoczi <stefanha@redhat.com>
Konrad Wilk <konrad.wilk@oracle.com>
Kanth Ghatraju <kanth.ghatraju@oracle.com>

We would like to thank QEMU community for your feedback in the
design and implementation of this project.

Qemu wiki page:
https://wiki.qemu.org/Features/MultiProcessQEMU

For the full concept writeup about QEMU multi-process, please refer to
docs/devel/qemu-multiprocess.rst. Also see docs/qemu-multiprocess.txt for
usage information.


We welcome all your ideas, concerns, and questions for this patchset.
[POC]: https://www.mail-archive.com/qemu-devel@nongnu.org/msg566538.html
[1]: https://www.mail-archive.com/qemu-devel@nongnu.org/msg602285.html
[2]: https://www.mail-archive.com/qemu-devel@nongnu.org/msg624877.html
[3]: https://www.mail-archive.com/qemu-devel@nongnu.org/msg642000.html
[4]: https://www.mail-archive.com/qemu-devel@nongnu.org/msg655118.html
[5]: https://www.mail-archive.com/qemu-devel@nongnu.org/msg682429.html
[6]: https://www.mail-archive.com/qemu-devel@nongnu.org/msg697484.html
[7]: https://patchew.org/QEMU/cover.1593273671.git.elena.ufimtseva@oracle.com/
[8]: https://www.mail-archive.com/qemu-devel@nongnu.org/msg727007.html
[9]: https://www.mail-archive.com/qemu-devel@nongnu.org/msg734275.html
[10]: https://www.mail-archive.com/qemu-devel@nongnu.org/msg747638.html

Elena Ufimtseva (7):
  multi-process: add qio channel function to transmit
  multi-process: define MPQemuMsg format and transmission functions
  multi-process: introduce proxy object
  multi-process: add proxy communication functions
  multi-process: Forward PCI config space acceses to the remote process
  multi-process: perform device reset in the remote process
  multi-process: add configure and usage information

Jagannathan Raman (11):
  memory: alloc RAM from file at offset
  multi-process: Add config option for multi-process QEMU
  multi-process: setup PCI host bridge for remote device
  multi-process: setup a machine object for remote device process
  multi-process: Initialize message handler in remote device
  multi-process: Associate fd of a PCIDevice with its object
  multi-process: setup memory manager for remote device
  multi-process: PCI BAR read/write handling for proxy & remote
    endpoints
  multi-process: Synchronize remote memory
  multi-process: create IOHUB object to handle irq
  multi-process: Retrieve PCI info from remote process

John G Johnson (1):
  multi-process: add the concept description to
    docs/devel/qemu-multiprocess

 MAINTAINERS                     |  26 ++
 backends/hostmem-memfd.c        |   2 +-
 configure                       |  10 +
 docs/devel/index.rst            |   1 +
 docs/devel/multi-process.rst    | 966 ++++++++++++++++++++++++++++++++++++++++
 docs/multi-process.rst          |  67 +++
 hw/i386/meson.build             |   5 +
 hw/i386/remote-iohub.c          | 123 +++++
 hw/i386/remote-memory.c         |  58 +++
 hw/i386/remote-msg.c            | 241 ++++++++++
 hw/i386/remote-obj.c            | 154 +++++++
 hw/i386/remote.c                |  79 ++++
 hw/misc/ivshmem.c               |   3 +-
 hw/pci-host/meson.build         |   1 +
 hw/pci-host/remote.c            |  75 ++++
 hw/pci/memory-sync.c            | 210 +++++++++
 hw/pci/meson.build              |   3 +
 hw/pci/proxy.c                  | 377 ++++++++++++++++
 include/exec/memory.h           |   2 +
 include/exec/ram_addr.h         |   2 +-
 include/hw/i386/remote-iohub.h  |  42 ++
 include/hw/i386/remote-memory.h |  19 +
 include/hw/i386/remote-obj.h    |  42 ++
 include/hw/i386/remote.h        |  40 ++
 include/hw/pci-host/remote.h    |  31 ++
 include/hw/pci/memory-sync.h    |  27 ++
 include/hw/pci/pci_ids.h        |   3 +
 include/hw/pci/proxy.h          |  53 +++
 include/io/channel.h            |  24 +
 include/io/mpqemu-link.h        |  98 ++++
 include/qemu/mmap-alloc.h       |   3 +-
 io/channel.c                    |  45 ++
 io/meson.build                  |   2 +
 io/mpqemu-link.c                | 303 +++++++++++++
 meson.build                     |   1 +
 scripts/mpqemu-launcher.py      |  49 ++
 softmmu/memory.c                |   3 +-
 softmmu/physmem.c               |  11 +-
 util/mmap-alloc.c               |   7 +-
 util/oslib-posix.c              |   2 +-
 40 files changed, 3197 insertions(+), 13 deletions(-)
 create mode 100644 docs/devel/multi-process.rst
 create mode 100644 docs/multi-process.rst
 create mode 100644 hw/i386/remote-iohub.c
 create mode 100644 hw/i386/remote-memory.c
 create mode 100644 hw/i386/remote-msg.c
 create mode 100644 hw/i386/remote-obj.c
 create mode 100644 hw/i386/remote.c
 create mode 100644 hw/pci-host/remote.c
 create mode 100644 hw/pci/memory-sync.c
 create mode 100644 hw/pci/proxy.c
 create mode 100644 include/hw/i386/remote-iohub.h
 create mode 100644 include/hw/i386/remote-memory.h
 create mode 100644 include/hw/i386/remote-obj.h
 create mode 100644 include/hw/i386/remote.h
 create mode 100644 include/hw/pci-host/remote.h
 create mode 100644 include/hw/pci/memory-sync.h
 create mode 100644 include/hw/pci/proxy.h
 create mode 100644 include/io/mpqemu-link.h
 create mode 100644 io/mpqemu-link.c
 create mode 100755 scripts/mpqemu-launcher.py

Comments

Stefan Hajnoczi Oct. 23, 2020, 1:57 p.m. UTC | #1
On Thu, Oct 15, 2020 at 02:04:53PM -0400, Jagannathan Raman wrote:
> Hello,
> 
> This is the v11 of the patchset. Thank you very much for the
> review of the v10 of the series. We are glad to hear the
> patchset is getting in a better shape.
> 
> We made changes to the following patches in this series.
> 
> [PATCH v10 06/19] multi-process: define MPQemuMsg format and transmission functions
> [PATCH v10 08/19] multi-process: Associate fd of a PCIDevice with its object
> [PATCH v10 10/19] multi-process: introduce proxy object

I have reviewed these patches and posted a comment about a possible
buffer overflow.

Stefan
Philippe Mathieu-Daudé Oct. 23, 2020, 4:36 p.m. UTC | #2
On 10/15/20 8:04 PM, Jagannathan Raman wrote:
> From: Elena Ufimtseva <elena.ufimtseva@oracle.com>
> 
> The entire array of the memory regions and file handlers.

I don't understand the description... Did you forgot the
verb?

> Will be used in the next patch.
> 
> Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
> Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
> Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>   include/io/channel.h | 24 ++++++++++++++++++++++++
>   io/channel.c         | 45 +++++++++++++++++++++++++++++++++++++++++++++
>   2 files changed, 69 insertions(+)
Philippe Mathieu-Daudé Oct. 23, 2020, 4:59 p.m. UTC | #3
On 10/15/20 8:05 PM, Jagannathan Raman wrote:
> From: Elena Ufimtseva <elena.ufimtseva@oracle.com>
> 
> The Proxy Object sends the PCI config space accesses as messages
> to the remote process over the communication channel
> 
> TODO:
> Investigate if the local PCI config writes can be dropped.
> Without the proxy local PCI config space writes for the device,
> the driver in the guest times out on the probing.
> We have tried to only refer to the remote for the PCI config writes,
> but the driver timeout in the guest forced as to left this
> as it is (removing local PCI config only).

I expect this TODO to be lost in git history. Better to
commit it in the code and remove it once resolved IMHO.

> 
> Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
> Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
> Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>   hw/i386/remote-msg.c     | 62 ++++++++++++++++++++++++++++++++++++++++++++++++
>   hw/pci/proxy.c           | 50 ++++++++++++++++++++++++++++++++++++++
>   include/io/mpqemu-link.h |  9 +++++++
>   io/mpqemu-link.c         |  6 +++++
>   4 files changed, 127 insertions(+)

Please have a look at scripts/git.orderfile, it helps
reviewers to avoid scrolling down/up the patches.

> 
> diff --git a/hw/i386/remote-msg.c b/hw/i386/remote-msg.c
> index 6451b77..94937db 100644
> --- a/hw/i386/remote-msg.c
> +++ b/hw/i386/remote-msg.c
> @@ -15,6 +15,12 @@
>   #include "io/mpqemu-link.h"
>   #include "qapi/error.h"
>   #include "sysemu/runstate.h"
> +#include "hw/pci/pci.h"
> +
> +static void process_config_write(QIOChannel *ioc, PCIDevice *dev,
> +                                 MPQemuMsg *msg);
> +static void process_config_read(QIOChannel *ioc, PCIDevice *dev,
> +                                MPQemuMsg *msg);
>   
>   void coroutine_fn mpqemu_remote_msg_loop_co(void *data)
>   {
> @@ -43,6 +49,12 @@ void coroutine_fn mpqemu_remote_msg_loop_co(void *data)
>           }
>   
>           switch (msg.cmd) {
> +        case PCI_CONFIG_WRITE:
> +            process_config_write(com->ioc, pci_dev, &msg);
> +            break;
> +        case PCI_CONFIG_READ:
> +            process_config_read(com->ioc, pci_dev, &msg);
> +            break;
>           default:
>               error_setg(&local_err,
>                          "Unknown command (%d) received for device %s (pid=%d)",
> @@ -60,3 +72,53 @@ void coroutine_fn mpqemu_remote_msg_loop_co(void *data)
>   
>       return;
>   }
> +
> +static void process_config_write(QIOChannel *ioc, PCIDevice *dev,
> +                                 MPQemuMsg *msg)
> +{
> +    ConfDataMsg *conf = (ConfDataMsg *)&msg->data.conf_data;
> +    MPQemuMsg ret = { 0 };
> +    Error *local_err = NULL;
> +
> +    if ((conf->addr + sizeof(conf->val)) > pci_config_size(dev)) {
> +        error_report("Bad address received when writing PCI config, pid %d",
> +                     getpid());
> +        ret.data.u64 = UINT64_MAX;
> +    } else {
> +        pci_default_write_config(dev, conf->addr, conf->val, conf->l);
> +    }
> +
> +    ret.cmd = RET_MSG;
> +    ret.size = sizeof(ret.data.u64);
> +
> +    mpqemu_msg_send(&ret, ioc, &local_err);
> +    if (local_err) {
> +        error_report("Could not send message to proxy from pid %d",
> +                     getpid());
> +    }
> +}
> +
> +static void process_config_read(QIOChannel *ioc, PCIDevice *dev,
> +                                MPQemuMsg *msg)
> +{
> +    ConfDataMsg *conf = (ConfDataMsg *)&msg->data.conf_data;
> +    MPQemuMsg ret = { 0 };
> +    Error *local_err = NULL;
> +
> +    if ((conf->addr + sizeof(conf->val)) > pci_config_size(dev)) {
> +        error_report("Bad address received when reading PCI config, pid %d",
> +                     getpid());
> +        ret.data.u64 = UINT64_MAX;
> +    } else {
> +        ret.data.u64 = pci_default_read_config(dev, conf->addr, conf->l);

Isn't it endianess issue here? It might makes sense to
declare ConfDataMsg and PCI_CONFIG_READ/PCI_CONFIG_WRITE
packets in little endian.

> +    }
> +
> +    ret.cmd = RET_MSG;
> +    ret.size = sizeof(ret.data.u64);
> +
> +    mpqemu_msg_send(&ret, ioc, &local_err);
> +    if (local_err) {
> +        error_report("Could not send message to proxy from pid %d",
> +                     getpid());
> +    }
> +}
> diff --git a/hw/pci/proxy.c b/hw/pci/proxy.c
> index 7a12f23..27714fe 100644
> --- a/hw/pci/proxy.c
> +++ b/hw/pci/proxy.c
> @@ -16,6 +16,8 @@
>   #include "hw/qdev-properties.h"
>   #include "monitor/monitor.h"
>   #include "migration/blocker.h"
> +#include "io/mpqemu-link.h"
> +#include "qemu/error-report.h"
>   
>   static void proxy_set_socket(PCIProxyDev *pdev, int fd, Error **errp)
>   {
> @@ -69,6 +71,51 @@ static void pci_proxy_dev_exit(PCIDevice *pdev)
>       error_free(dev->migration_blocker);
>   }
>   
> +static int config_op_send(PCIProxyDev *pdev, uint32_t addr, uint32_t *val,
> +                          int l, unsigned int op)
> +{
> +    MPQemuMsg msg = { 0 };
> +    uint64_t ret = -EINVAL;
> +    Error *local_err = NULL;
> +
> +    msg.cmd = op;
> +    msg.data.conf_data.addr = addr;
> +    msg.data.conf_data.val = (op == PCI_CONFIG_WRITE) ? *val : 0;
> +    msg.data.conf_data.l = l;
> +    msg.size = sizeof(ConfDataMsg);
> +
> +    ret = mpqemu_msg_send_and_await_reply(&msg, pdev, &local_err);
> +    if (local_err) {
> +        error_report_err(local_err);
> +    }
> +    if (op == PCI_CONFIG_READ) {
> +        *val = (uint32_t)ret;
> +    }
> +
> +    return ret;
> +}
> +
> +static uint32_t pci_proxy_read_config(PCIDevice *d, uint32_t addr, int len)
> +{
> +    uint32_t val;
> +
> +    (void)config_op_send(PCI_PROXY_DEV(d), addr, &val, len, PCI_CONFIG_READ);
> +
> +    return val;
> +}
> +
> +static void pci_proxy_write_config(PCIDevice *d, uint32_t addr, uint32_t val,
> +                                   int l)
> +{
> +    /*
> +     * Some of the functions access the copy of the remote device
> +     * PCI config space, therefore maintain it updated.
> +     */
> +    pci_default_write_config(d, addr, val, l);
> +
> +    (void)config_op_send(PCI_PROXY_DEV(d), addr, &val, l, PCI_CONFIG_WRITE);
> +}
> +
>   static void pci_proxy_dev_class_init(ObjectClass *klass, void *data)
>   {
>       DeviceClass *dc = DEVICE_CLASS(klass);
> @@ -76,6 +123,9 @@ static void pci_proxy_dev_class_init(ObjectClass *klass, void *data)
>   
>       k->realize = pci_proxy_dev_realize;
>       k->exit = pci_proxy_dev_exit;
> +    k->config_read = pci_proxy_read_config;
> +    k->config_write = pci_proxy_write_config;
> +
>       device_class_set_props(dc, proxy_properties);
>   }
>   
> diff --git a/include/io/mpqemu-link.h b/include/io/mpqemu-link.h
> index 4bea5da..459d345 100644
> --- a/include/io/mpqemu-link.h
> +++ b/include/io/mpqemu-link.h
> @@ -34,6 +34,8 @@ typedef enum {
>       MPQEMU_CMD_INIT,
>       SYNC_SYSMEM,
>       RET_MSG,
> +    PCI_CONFIG_WRITE,
> +    PCI_CONFIG_READ,
>       MPQEMU_CMD_MAX,
>   } MPQemuCmd;
>   
> @@ -43,6 +45,12 @@ typedef struct {
>       off_t offsets[REMOTE_MAX_FDS];
>   } SyncSysmemMsg;
>   
> +typedef struct {
> +    uint32_t addr;
> +    uint32_t val;
> +    int l;
> +} ConfDataMsg;

Maybe name this PciConfDataMsg?

> +
>   /**
>    * MPQemuMsg:
>    * @cmd: The remote command
> @@ -60,6 +68,7 @@ typedef struct {
>   
>       union {
>           uint64_t u64;
> +        ConfDataMsg conf_data;
>           SyncSysmemMsg sync_sysmem;
>       } data;
>   
> diff --git a/io/mpqemu-link.c b/io/mpqemu-link.c
> index 1339cc7..5ef82da 100644
> --- a/io/mpqemu-link.c
> +++ b/io/mpqemu-link.c
> @@ -278,6 +278,12 @@ bool mpqemu_msg_valid(MPQemuMsg *msg)
>               return false;
>           }
>           break;
> +    case PCI_CONFIG_WRITE:
> +    case PCI_CONFIG_READ:
> +        if (msg->size != sizeof(ConfDataMsg)) {
> +            return false;
> +        }
> +        break;
>       default:
>           break;
>       }
>
Philippe Mathieu-Daudé Oct. 23, 2020, 5:20 p.m. UTC | #4
On 10/23/20 6:59 PM, Philippe Mathieu-Daudé wrote:
> On 10/15/20 8:05 PM, Jagannathan Raman wrote:

>> From: Elena Ufimtseva <elena.ufimtseva@oracle.com>

>>

>> The Proxy Object sends the PCI config space accesses as messages

>> to the remote process over the communication channel

...

>> +static void process_config_read(QIOChannel *ioc, PCIDevice *dev,

>> +                                MPQemuMsg *msg)

>> +{

>> +    ConfDataMsg *conf = (ConfDataMsg *)&msg->data.conf_data;

>> +    MPQemuMsg ret = { 0 };

>> +    Error *local_err = NULL;

>> +

>> +    if ((conf->addr + sizeof(conf->val)) > pci_config_size(dev)) {

>> +        error_report("Bad address received when reading PCI config, 

>> pid %d",

>> +                     getpid());

>> +        ret.data.u64 = UINT64_MAX;

>> +    } else {

>> +        ret.data.u64 = pci_default_read_config(dev, conf->addr, 

>> conf->l);

> 

> Isn't it endianess issue here? It might makes sense to

> declare ConfDataMsg and PCI_CONFIG_READ/PCI_CONFIG_WRITE

> packets in little endian.


Bah Stefan told me to look at the specs which are included in
the latest patch (docs/multi-process.rst). As I process patches
of this series in order I haven't looked at it yet. Still,
why not put it first in your series?

Regards,

Phil.