Message ID: cover.1547420060.git.crobinso@redhat.com
Series: RFC: qemu: virtio-{non-}transitional support
On Sun, 2019-01-13 at 18:12 -0500, Cole Robinson wrote:
[...]
> The main RFC bits here are:
>
> * The disk approach. danpb and I briefly discussed on IRC adding
>   new bus= values vs a new model= attribute. We decided model=
>   is the lesser of two evils, since bus= handling in apps is
>   tied with target= generation, so adding new virtio-X bus=
>   values will cause more work for apps. These patches add
>   a <disk model=X/> attribute

This sounds fairly reasonable, but I reserve the right to change my
mind after looking at the code and thinking about it a bit more :)

> * The XML and naming. Previous discussions seemed to favor adding
>   new model-style values rather than a 'transitional' attribute
>   or similar. So these patches add model='virtio-transitional'
>   and model='virtio-non-transitional'

Yeah, that's what I recall the consensus being.

> * The PCI address handling. I just mapped virtio-non-transitional
>   to imply plain PCI addressing. I think that's all we need but
>   I'm not positive so I'd appreciate a review of that approach.

I don't think that's right. Let me fish up a message I wrote a while
ago summing up interactions between VirtIO devices and PCI (Express)
slots:

http://lists.nongnu.org/archive/html/qemu-devel/2018-11/msg03133.html

Basically VirtIO 0.9 requires IO space to be available, and 1.0 did
away with that requirement because PCI Express, unlike conventional
PCI, allows devices *not* to have IO space.

So transitional devices, which must work with both 0.9 and 1.0, can
depend on IO space being available and as such will only work when
plugged into conventional PCI slots, whereas non-transitional
devices don't need IO space and can thus be plugged into either
conventional PCI or PCI Express slots.

Ultimately, then, transitional (rather than non-transitional)
devices are the ones that must be forced into conventional PCI
slots.
--
Andrea Bolognani / Red Hat / Virtualization

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list
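For concreteness, the `<disk model=X/>` attribute proposed in the cover letter quoted above might look like this in domain XML. This is a sketch based only on the naming discussed in this thread; the driver, source path, and target details are illustrative assumptions, not output from the actual patches:

```xml
<!-- Sketch only: model= values per this thread; other attributes illustrative -->
<disk type='file' device='disk' model='virtio-non-transitional'>
  <driver name='qemu' type='qcow2'/>
  <source file='/var/lib/libvirt/images/guest.qcow2'/>
  <target dev='vda' bus='virtio'/>
</disk>
```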
On 01/15/2019 08:30 AM, Andrea Bolognani wrote:
> On Sun, 2019-01-13 at 18:12 -0500, Cole Robinson wrote:
> [...]
>> The main RFC bits here are: ...
>
>> * The PCI address handling. I just mapped virtio-non-transitional
>>   to imply plain PCI addressing. I think that's all we need but
>>   I'm not positive so I'd appreciate a review of that approach.
>
> I don't think that's right. Let me fish up a message I wrote a while
> ago summing up interactions between VirtIO devices and PCI (Express)
> slots:
>
> http://lists.nongnu.org/archive/html/qemu-devel/2018-11/msg03133.html
>
> Basically VirtIO 0.9 requires IO space to be available, and 1.0 did
> away with that requirement because PCI Express, unlike conventional
> PCI, allows devices *not* to have IO space.
>
> So transitional devices, which must work with both 0.9 and 1.0, can
> depend on IO space being available and as such will only work when
> plugged into conventional PCI slots, whereas non-transitional
> devices don't need IO space and can thus be plugged into either
> conventional PCI or PCI Express slots.
>
> Ultimately, then, transitional (rather than non-transitional)
> devices are the ones that must be forced into conventional PCI
> slots.

Okay, thanks for the correction; so that sounds like for
NON_TRANSITIONAL we should also be forcing pcieFlags. I've made those
changes and updated the branch here:
https://github.com/crobinso/libvirt/tree/virtio

Thanks,
Cole
On Tue, 2019-01-15 at 10:23 -0500, Cole Robinson wrote:
> On 01/15/2019 08:30 AM, Andrea Bolognani wrote:
[...]
> > Basically VirtIO 0.9 requires IO space to be available, and 1.0 did
> > away with that requirement because PCI Express, unlike conventional
> > PCI, allows devices *not* to have IO space.
> >
> > So transitional devices, which must work with both 0.9 and 1.0, can
> > depend on IO space being available and as such will only work when
> > plugged into conventional PCI slots, whereas non-transitional
> > devices don't need IO space and can thus be plugged into either
> > conventional PCI or PCI Express slots.
> >
> > Ultimately, then, transitional (rather than non-transitional)
> > devices are the ones that must be forced into conventional PCI
> > slots.
>
> Okay thanks for the correction, so that sounds like for NON_TRANSITIONAL
> we should also be forcing pcieFlags.

Not really. As explained above, non-transitional VirtIO devices work
when plugged into both conventional PCI and PCI Express slots, so
virtioFlags sounds more appropriate: by using pcieFlags you'd limit
non-transitional devices to q35 guests, while they can work just fine
on i440fx too.

> I've made those changes and updated
> the branch here: https://github.com/crobinso/libvirt/tree/virtio

I'm only gonna cherry-pick commit 1/6 from the GitHub branch to avoid
any ambiguity when there's a divergence between what you posted on the
mailing list and what you pushed on GitHub: more specifically, I'm
gonna be reviewing the former.
On Tue, Jan 15, 2019 at 02:30:14PM +0100, Andrea Bolognani wrote:
> On Sun, 2019-01-13 at 18:12 -0500, Cole Robinson wrote:
> [...]
> > The main RFC bits here are:
> >
> > * The disk approach. danpb and I briefly discussed on IRC adding
> >   new bus= values vs a new model= attribute. We decided model=
> >   is the lesser of two evils, since bus= handling in apps is
> >   tied with target= generation, so adding new virtio-X bus=
> >   values will cause more work for apps. These patches add
> >   a <disk model=X/> attribute
>
> This sounds fairly reasonable, but I reserve the right to change my
> mind after looking at the code and thinking about it a bit more :)
>
> > * The XML and naming. Previous discussions seemed to favor adding
> >   new model-style values rather than a 'transitional' attribute
> >   or similar. So these patches add model='virtio-transitional'
> >   and model='virtio-non-transitional'
>
> Yeah, that's what I recall the consensus being.
>
> > * The PCI address handling. I just mapped virtio-non-transitional
> >   to imply plain PCI addressing. I think that's all we need but
> >   I'm not positive so I'd appreciate a review of that approach.
>
> I don't think that's right. Let me fish up a message I wrote a while
> ago summing up interactions between VirtIO devices and PCI (Express)
> slots:
>
> http://lists.nongnu.org/archive/html/qemu-devel/2018-11/msg03133.html
>
> Basically VirtIO 0.9 requires IO space to be available, and 1.0 did
> away with that requirement because PCI Express, unlike conventional
> PCI, allows devices *not* to have IO space.
>
> So transitional devices, which must work with both 0.9 and 1.0, can
> depend on IO space being available and as such will only work when
> plugged into conventional PCI slots, whereas non-transitional
> devices don't need IO space and can thus be plugged into either
> conventional PCI or PCI Express slots.
>
> Ultimately, then, transitional (rather than non-transitional)
> devices are the ones that must be forced into conventional PCI
> slots.

Yes, the existing devices fail when placed in a PCI Express slot with
certain guest OSes. The -transitional devices are functionally
identical to the existing devices. They serve as a hint to libvirt
that it should never place them in a PCI Express slot.
Non-transitional (aka 1.0) devices work correctly in either slot
type.

Regards,
Daniel

--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
On Sun, Jan 13, 2019 at 06:12:02PM -0500, Cole Robinson wrote:
> This series adds the beginnings of support for virtio-transitional
> and virtio-non-transitional qemu devices.
>
> qemu patches, queued for qemu 4.0.0:
> https://lists.gnu.org/archive/html/qemu-devel/2018-12/msg00923.html
> Previous libvirt discussion around this:
> https://www.redhat.com/archives/libvir-list/2018-August/msg01073.html
>
> Long story short we need to expose these options so apps have a
> usable way to support rhel6 + virtio + q35.
>
> This series only covers exposing the associated device models
> for disk and rng devices to make sure I'm on the right track.
> serial, net, scsi, input-host, balloon, 9p, vsock, vhost-scsi
> still need to be implemented. Those should follow the rng
> example, except vhost-scsi may need to be a different approach.

virDomainNetDef is a mess because it *still* doesn't use an enum
for the device model :-( We really must fix that rather than
blindly allowing arbitrary passthrough of hypervisor-specific
names.

I recall Laine had patches for this some 4/5 years ago, but can't
remember why we never merged them.

> The main RFC bits here are:
>
> * The disk approach. danpb and I briefly discussed on IRC adding
>   new bus= values vs a new model= attribute. We decided model=
>   is the lesser of two evils, since bus= handling in apps is
>   tied with target= generation, so adding new virtio-X bus=
>   values will cause more work for apps. These patches add
>   a <disk model=X/> attribute

Yes, using 'bus' will have a very painful ripple effect on mgmt apps
that we should avoid at all costs.

> * The XML and naming. Previous discussions seemed to favor adding
>   new model-style values rather than a 'transitional' attribute
>   or similar. So these patches add model='virtio-transitional'
>   and model='virtio-non-transitional'

Yep.

> * The PCI address handling. I just mapped virtio-non-transitional
>   to imply plain PCI addressing. I think that's all we need but
>   I'm not positive so I'd appreciate a review of that approach.

This was inverted, as Andrea already clarified.

Regards,
Daniel
On Tue, 2019-01-15 at 17:02 +0000, Daniel P. Berrangé wrote:
> On Tue, Jan 15, 2019 at 02:30:14PM +0100, Andrea Bolognani wrote:
> > Basically VirtIO 0.9 requires IO space to be available, and 1.0 did
> > away with that requirement because PCI Express, unlike conventional
> > PCI, allows devices *not* to have IO space.
> >
> > So transitional devices, which must work with both 0.9 and 1.0, can
> > depend on IO space being available and as such will only work when
> > plugged into conventional PCI slots, whereas non-transitional
> > devices don't need IO space and can thus be plugged into either
> > conventional PCI or PCI Express slots.
> >
> > Ultimately, then, transitional (rather than non-transitional)
> > devices are the ones that must be forced into conventional PCI
> > slots.
>
> Yes, the existing devices fail when placed in a PCI Express slot with
> certain guest OSes. The -transitional devices are functionally
> identical to the existing devices. They serve as a hint to libvirt
> that it should never place them in a PCI Express slot.

Not quite: existing devices (virtio-*-pci) will change their behavior
based on the slot they're plugged into, so they will show up as
non-transitional when connected to a PCI Express slot and as
transitional otherwise.

If that wasn't the case, there wouldn't have been a way to use VirtIO
devices on x86/q35 or aarch64/virt without throwing pcie-to-pci-bridge
into the mix until now, and we would also need to change the behavior
for existing devices, neither of which is true :)

Whether or not the guest OS supports VirtIO 1.0 and can thus drive a
non-transitional device is a separate matter, which adding support for
these new devices to libvirt and libosinfo will address.
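To make the slot rules discussed above concrete: on a q35 guest, keeping a transitional device on a conventional PCI slot means hanging it off a pcie-to-pci-bridge. A hand-written sketch follows; the controller index and PCI address values are illustrative assumptions, not addresses assigned by libvirt:

```xml
<!-- Sketch: transitional device routed through a conventional PCI bridge -->
<controller type='pci' index='1' model='pcie-to-pci-bridge'/>
<rng model='virtio-transitional'>
  <backend model='random'>/dev/urandom</backend>
  <address type='pci' domain='0x0000' bus='0x01' slot='0x01' function='0x0'/>
</rng>
```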
On Tue, 2019-01-15 at 17:06 +0000, Daniel P. Berrangé wrote:
> virDomainNetDef is a mess because it *still* doesn't use an enum
> for the device model :-( We really must fix that rather than
> blindly allowing arbitrary passthrough of hypervisor-specific
> names.
>
> I recall Laine had patches for this some 4/5 years ago, but
> can't remember why we never merged them.

Based on what I can recall from my own, more recent, attempt at
fixing the mess, the main blocker was that in order to keep
accepting all existing configurations you'd basically have to
still store the model as a string and only at a later time
convert it to an enum.

So you'd end up making the code more complicated rather than
simpler which, needless to say, makes the idea way less
attractive :(
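The blocker Andrea describes can be sketched in a few lines of C (hypothetical names, not actual libvirt code): to keep accepting arbitrary existing configs, the parser has to fall back to an "unknown" value, and therefore keep the raw model string around, whenever the name is not in the enum.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch, not libvirt code: an enum of known NIC models
 * plus an UNKNOWN fallback so unrecognized configs are not rejected. */
typedef enum {
    NET_MODEL_UNKNOWN = 0,  /* the raw string would have to be kept around */
    NET_MODEL_VIRTIO,
    NET_MODEL_E1000,
    NET_MODEL_RTL8139,
    NET_MODEL_LAST
} NetModel;

static const char *const net_model_names[NET_MODEL_LAST] = {
    NULL, "virtio", "e1000", "rtl8139"
};

/* Map a model string to the enum; unknown names fall back to
 * NET_MODEL_UNKNOWN instead of failing, which is exactly why the
 * string still has to be stored alongside the enum value. */
NetModel net_model_from_string(const char *name)
{
    for (size_t i = 1; i < NET_MODEL_LAST; i++)
        if (strcmp(name, net_model_names[i]) == 0)
            return (NetModel)i;
    return NET_MODEL_UNKNOWN;
}
```

libvirt's VIR_ENUM_DECL/VIR_ENUM_IMPL machinery provides this kind of string-to-enum mapping, but it has no slot for preserving an arbitrary unknown string, which is the complication being discussed.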
On Wed, Jan 16, 2019 at 12:01:00PM +0100, Andrea Bolognani wrote:
> On Tue, 2019-01-15 at 17:06 +0000, Daniel P. Berrangé wrote:
> > virDomainNetDef is a mess because it *still* doesn't use an enum
> > for the device model :-( We really must fix that rather than
> > blindly allowing arbitrary passthrough of hypervisor-specific
> > names.
> >
> > I recall Laine had patches for this some 4/5 years ago, but
> > can't remember why we never merged them.
>
> Based on what I can recall from my own, more recent, attempt at
> fixing the mess, the main blocker was that in order to keep
> accepting all existing configurations you'd basically have to
> still store the model as a string and only at a later time
> convert it to an enum.

The enum should cover all existing reasonably expected configs. Sure,
I imagine we might miss a few, especially for obscure architectures
or hypervisors, but those would just be bugs to be fixed.

> So you'd end up making the code more complicated rather than
> simpler which, needless to say, makes the idea way less
> attractive :(

The key point of using an enum is to ensure that our esx/qemu/parallels
drivers are all guaranteed to use the same model names, which is a core
benefit that libvirt is supposed to be adding for applications.

Regards,
Daniel
On Wed, Jan 16, 2019 at 10:41:59AM +0100, Andrea Bolognani wrote:
> On Tue, 2019-01-15 at 17:02 +0000, Daniel P. Berrangé wrote:
> > On Tue, Jan 15, 2019 at 02:30:14PM +0100, Andrea Bolognani wrote:
> > > Basically VirtIO 0.9 requires IO space to be available, and 1.0 did
> > > away with that requirement because PCI Express, unlike conventional
> > > PCI, allows devices *not* to have IO space.
> > >
> > > So transitional devices, which must work with both 0.9 and 1.0, can
> > > depend on IO space being available and as such will only work when
> > > plugged into conventional PCI slots, whereas non-transitional
> > > devices don't need IO space and can thus be plugged into either
> > > conventional PCI or PCI Express slots.
> > >
> > > Ultimately, then, transitional (rather than non-transitional)
> > > devices are the ones that must be forced into conventional PCI
> > > slots.
> >
> > Yes, the existing devices fail when placed in a PCI Express slot with
> > certain guest OSes. The -transitional devices are functionally
> > identical to the existing devices. They serve as a hint to libvirt
> > that it should never place them in a PCI Express slot.
>
> Not quite: existing devices (virtio-*-pci) will change their
> behavior based on the slot they're plugged into, so they will show
> up as non-transitional when connected to a PCI Express slot and as
> transitional otherwise.

We're only ever going to plug -transitional devices into conventional
PCI slots, not PCI Express slots. So the magic behaviour of the
existing devices won't come into effect in libvirt's usage when the
XML has requested a transitional device.

Regards,
Daniel
On Wed, 2019-01-16 at 11:13 +0000, Daniel P. Berrangé wrote:
> On Wed, Jan 16, 2019 at 12:01:00PM +0100, Andrea Bolognani wrote:
> > Based on what I can recall from my own, more recent, attempt at
> > fixing the mess, the main blocker was that in order to keep
> > accepting all existing configurations you'd basically have to
> > still store the model as a string and only at a later time
> > convert it to an enum.
>
> The enum should cover all existing reasonably expected configs. Sure,
> I imagine we might miss a few, especially for obscure architectures
> or hypervisors, but those would just be bugs to be fixed.

Until we've fixed said bugs, guests using such models would just
disappear though, wouldn't they? That doesn't sound acceptable.

And defining even the incomplete enum would require someone to
be familiar with all drivers in order to know which models are
commonly used, or at all available.
On Wed, Jan 16, 2019 at 03:07:07PM +0100, Andrea Bolognani wrote:
> On Wed, 2019-01-16 at 11:13 +0000, Daniel P. Berrangé wrote:
> > On Wed, Jan 16, 2019 at 12:01:00PM +0100, Andrea Bolognani wrote:
> > > Based on what I can recall from my own, more recent, attempt at
> > > fixing the mess, the main blocker was that in order to keep
> > > accepting all existing configurations you'd basically have to
> > > still store the model as a string and only at a later time
> > > convert it to an enum.
> >
> > The enum should cover all existing reasonably expected configs. Sure,
> > I imagine we might miss a few, especially for obscure architectures
> > or hypervisors, but those would just be bugs to be fixed.
>
> Until we've fixed said bugs, guests using such models would just
> disappear though, wouldn't they? That doesn't sound acceptable.

We could do a multi-step conversion. First define an enum and report
all known enum values in the domain capabilities. Taint any guest
using a NIC which doesn't match. A year or so later, finally enforce
the enum usage.

> And defining even the incomplete enum would require someone to
> be familiar with all drivers in order to know which models are
> commonly used, or at all available.

It isn't as bad as it seems. VMware has whitelisted names, Hyper-V
doesn't report models at all, Xen is a small finite set. QEMU is the
only hard one given the huge set of system emulators.

Regards,
Daniel
On Wed, 2019-01-16 at 14:19 +0000, Daniel P. Berrangé wrote:
> On Wed, Jan 16, 2019 at 03:07:07PM +0100, Andrea Bolognani wrote:
> > On Wed, 2019-01-16 at 11:13 +0000, Daniel P. Berrangé wrote:
> > > The enum should cover all existing reasonably expected configs. Sure,
> > > I imagine we might miss a few, especially for obscure architectures
> > > or hypervisors, but those would just be bugs to be fixed.
> >
> > Until we've fixed said bugs, guests using such models would just
> > disappear though, wouldn't they? That doesn't sound acceptable.
>
> We could do a multi-step conversion. First define an enum and report
> all known enum values in the domain capabilities. Taint any guest
> using a NIC which doesn't match. A year or so later, finally enforce
> the enum usage.

That will still result in guests disappearing at some point, which
is not something that we're generally okay with. Why would it be
any different this time around?

> > And defining even the incomplete enum would require someone to
> > be familiar with all drivers in order to know which models are
> > commonly used, or at all available.
>
> It isn't as bad as it seems. VMware has whitelisted names, Hyper-V
> doesn't report models at all, Xen is a small finite set. QEMU is
> the only hard one given the huge set of system emulators.

If we're willing to leave some theoretical users of impractical
network devices in the dust, then coming up with a reasonably small
list for QEMU is not too difficult either... But see above :)
On Wed, Jan 16, 2019 at 03:46:12PM +0100, Andrea Bolognani wrote:
> On Wed, 2019-01-16 at 14:19 +0000, Daniel P. Berrangé wrote:
> > On Wed, Jan 16, 2019 at 03:07:07PM +0100, Andrea Bolognani wrote:
> > > On Wed, 2019-01-16 at 11:13 +0000, Daniel P. Berrangé wrote:
> > > > The enum should cover all existing reasonably expected configs. Sure,
> > > > I imagine we might miss a few, especially for obscure architectures
> > > > or hypervisors, but those would just be bugs to be fixed.
> > >
> > > Until we've fixed said bugs, guests using such models would just
> > > disappear though, wouldn't they? That doesn't sound acceptable.
> >
> > We could do a multi-step conversion. First define an enum and report
> > all known enum values in the domain capabilities. Taint any guest
> > using a NIC which doesn't match. A year or so later, finally enforce
> > the enum usage.
>
> That will still result in guests disappearing at some point, which
> is not something that we're generally okay with. Why would it be
> any different this time around?

No change we make is perfectly risk free. We would want to have
reasonable confidence that the initial enum is as good as we can
practically make it. The tainting check & log is a way to identify
whether we made some mistakes - hopefully we won't have. If users do
report it though, we'll be able to fix it. If we get no reports for
a reasonable period of time (minimum 12 months), it is OK to assume
we've not broken anything that we believe has users.

> > > And defining even the incomplete enum would require someone to
> > > be familiar with all drivers in order to know which models are
> > > commonly used, or at all available.
> >
> > It isn't as bad as it seems. VMware has whitelisted names, Hyper-V
> > doesn't report models at all, Xen is a small finite set. QEMU is
> > the only hard one given the huge set of system emulators.
>
> If we're willing to leave some theoretical users of impractical
> network devices in the dust, then coming up with a reasonably small
> list for QEMU is not too difficult either... But see above :)

The QEMU list won't be small - it'll be large, given all archs!

Regards,
Daniel
On Wed, Jan 16, 2019 at 15:36:29 +0000, Daniel Berrange wrote:
> On Wed, Jan 16, 2019 at 03:46:12PM +0100, Andrea Bolognani wrote:
> > On Wed, 2019-01-16 at 14:19 +0000, Daniel P. Berrangé wrote:
> > > On Wed, Jan 16, 2019 at 03:07:07PM +0100, Andrea Bolognani wrote:
> > > > On Wed, 2019-01-16 at 11:13 +0000, Daniel P. Berrangé wrote:
> > > > > The enum should cover all existing reasonably expected configs. Sure,
> > > > > I imagine we might miss a few, especially for obscure architectures
> > > > > or hypervisors, but those would just be bugs to be fixed.
> > > >
> > > > Until we've fixed said bugs, guests using such models would just
> > > > disappear though, wouldn't they? That doesn't sound acceptable.
> > >
> > > We could do a multi-step conversion. First define an enum and report
> > > all known enum values in the domain capabilities. Taint any guest
> > > using a NIC which doesn't match. A year or so later, finally enforce
> > > the enum usage.
> >
> > That will still result in guests disappearing at some point, which
> > is not something that we're generally okay with. Why would it be
> > any different this time around?
>
> No change we make is perfectly risk free. We would want to have
> reasonable confidence that the initial enum is as good as we can
> practically make it. The tainting check & log is a way to identify
> whether we made some mistakes - hopefully we won't have. If users do
> report it though, we'll be able to fix it. If we get no reports for
> a reasonable period of time (minimum 12 months), it is OK to assume
> we've not broken anything that we believe has users.

You can introduce yet another intermediate stage where the validation
machinery will refuse to start and define VMs with invalid names.
That way you get a much stronger level of user warning than taints
provide, with the benefit of not losing guests at that point.