Message ID: 20220606061927.26049-4-nicolinc@nvidia.com
State: New
Series: Simplify vfio_iommu_type1 attach/detach routine
> From: Nicolin Chen
> Sent: Monday, June 6, 2022 2:19 PM
>
> From: Jason Gunthorpe <jgg@nvidia.com>
>
> The KVM mechanism for controlling wbinvd is only triggered during
> kvm_vfio_group_add(), meaning it is a one-shot test done once the devices
> are setup.

It's not one-shot. kvm_vfio_update_coherency() is called in both
group_add() and group_del(). Then the coherency property is checked
dynamically in wbinvd emulation:

  kvm_emulate_wbinvd()
    kvm_emulate_wbinvd_noskip()
      need_emulate_wbinvd()
        kvm_arch_has_noncoherent_dma()

It's also checked when a vcpu is scheduled to a new cpu, for tracking the
dirty cpus which require a cache flush when emulating wbinvd on that vcpu.
See kvm_arch_vcpu_load():

	/* Address WBINVD may be executed by guest */
	if (need_emulate_wbinvd(vcpu)) {
		if (static_call(kvm_x86_has_wbinvd_exit)())
			cpumask_set_cpu(cpu, vcpu->arch.wbinvd_dirty_mask);

In addition, it's also checked when deciding the effective memory type of
an EPT entry. See vmx_get_mt_mask():

	if (!kvm_arch_has_noncoherent_dma(vcpu->kvm))
		return (MTRR_TYPE_WRBACK << VMX_EPT_MT_EPTE_SHIFT) | VMX_EPT_IPAT_BIT;

But I doubt the above can work reliably when the property is changed on
the fly, given those paths are triggered at different points. The guest
may end up in a mixed state where inconsistent coherency is assumed in
different emulation paths.

In reality I don't think such a niche scenario is even tested, given the
only device imposing such a trick is the integrated Intel GPU, which iiuc
no one would try to hotplug/hot-remove to/from a guest.

Given that, I'm fine with the change in this patch. Even more, probably
we really want an explicit one-shot model so KVM can lock down the
property once it starts to consume it; then further adding a new group
which would change the coherency is explicitly rejected, and removing
an existing group leaves it intact.

>
> So, there is no value in trying to push a device that could do enforced
> cache coherency to a dedicated domain vs re-using an existing domain since

"an existing domain (even if it doesn't enforce coherency)", otherwise if
it's already compatible there is no question here.

> KVM won't be able to take advantage of it. This just wastes domain memory.
>
> Simplify this code and eliminate the test. This removes the only logic
> that needed to have a dummy domain attached prior to searching for a
> matching domain and simplifies the next patches.
>
> If someday we want to try and optimize this further the better approach is
> to update the Intel driver so that enforce_cache_coherency() can work on a
> domain that already has IOPTEs and then call the enforce_cache_coherency()
> after detaching a device from a domain to upgrade the whole domain to
> enforced cache coherency mode.
>
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
> ---
>  drivers/vfio/vfio_iommu_type1.c | 4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)
>
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index c13b9290e357..f4e3b423a453 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -2285,9 +2285,7 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
>  	 * testing if they're on the same bus_type.
>  	 */
>  	list_for_each_entry(d, &iommu->domain_list, next) {
> -		if (d->domain->ops == domain->domain->ops &&
> -		    d->enforce_cache_coherency ==
> -			    domain->enforce_cache_coherency) {
> +		if (d->domain->ops == domain->domain->ops) {
>  			iommu_detach_group(domain->domain, group->iommu_group);
>  			if (!iommu_attach_group(d->domain, group->iommu_group)) {
> --
> 2.17.1
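To make the call chain above easier to follow, here is a minimal user-space
C model of the decision it implements: every wbinvd-emulation path funnels
into a single per-VM counter, with nothing per-domain in sight. Only the
names kvm_arch_has_noncoherent_dma() and need_emulate_wbinvd() come from the
thread; the struct layouts and the main() driver are simplified assumptions
for illustration, not the kernel source.

#include <stdbool.h>
#include <stdio.h>

/* Simplified stand-in for kvm->arch.noncoherent_dma_count */
struct kvm {
	int noncoherent_dma_count;
};

struct kvm_vcpu {
	struct kvm *kvm;
};

/* Mirrors kvm_arch_has_noncoherent_dma(): a per-VM check, nothing per-domain */
static bool kvm_arch_has_noncoherent_dma(struct kvm *kvm)
{
	return kvm->noncoherent_dma_count != 0;
}

/* Mirrors need_emulate_wbinvd(): wbinvd emulation keys off the same counter */
static bool need_emulate_wbinvd(struct kvm_vcpu *vcpu)
{
	return kvm_arch_has_noncoherent_dma(vcpu->kvm);
}

int main(void)
{
	struct kvm vm = { .noncoherent_dma_count = 0 };
	struct kvm_vcpu vcpu = { .kvm = &vm };

	printf("coherent only: emulate wbinvd? %d\n", need_emulate_wbinvd(&vcpu));
	vm.noncoherent_dma_count++;	/* a non-coherent group was added */
	printf("after non-coherent add: emulate wbinvd? %d\n", need_emulate_wbinvd(&vcpu));
	return 0;
}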
On Wed, Jun 08, 2022 at 08:28:03AM +0000, Tian, Kevin wrote:
> > From: Nicolin Chen
> > Sent: Monday, June 6, 2022 2:19 PM
> >
> > From: Jason Gunthorpe <jgg@nvidia.com>
> >
> > The KVM mechanism for controlling wbinvd is only triggered during
> > kvm_vfio_group_add(), meaning it is a one-shot test done once the devices
> > are setup.
>
> It's not one-shot. kvm_vfio_update_coherency() is called in both
> group_add() and group_del(). Then the coherency property is checked
> dynamically in wbinvd emulation:

From the perspective of managing the domains that is still one-shot. It
doesn't get updated when individual devices are added/removed to domains.

> Given that, I'm fine with the change in this patch. Even more, probably
> we really want an explicit one-shot model so KVM can lock down the
> property once it starts to consume it; then further adding a new group
> which would change the coherency is explicitly rejected, and removing
> an existing group leaves it intact.

Why? Once wbinvd is enabled it is compatible with all domain
configurations, so just leave it on and ignore everything at that point.
> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Wednesday, June 8, 2022 7:17 PM
>
> On Wed, Jun 08, 2022 at 08:28:03AM +0000, Tian, Kevin wrote:
> > > From: Nicolin Chen
> > > Sent: Monday, June 6, 2022 2:19 PM
> > >
> > > From: Jason Gunthorpe <jgg@nvidia.com>
> > >
> > > The KVM mechanism for controlling wbinvd is only triggered during
> > > kvm_vfio_group_add(), meaning it is a one-shot test done once the devices
> > > are setup.
> >
> > It's not one-shot. kvm_vfio_update_coherency() is called in both
> > group_add() and group_del(). Then the coherency property is checked
> > dynamically in wbinvd emulation:
>
> From the perspective of managing the domains that is still one-shot. It
> doesn't get updated when individual devices are added/removed to domains.

It's unchanged per-domain but dynamic per-vm when multiple domains are
added/removed (i.e. kvm->arch.noncoherent_dma_count). It's the latter
being checked in the kvm.

> > Given that, I'm fine with the change in this patch. Even more, probably
> > we really want an explicit one-shot model so KVM can lock down the
> > property once it starts to consume it; then further adding a new group
> > which would change the coherency is explicitly rejected, and removing
> > an existing group leaves it intact.
>
> Why? Once wbinvd is enabled it is compatible with all domain
> configurations, so just leave it on and ignore everything at that point.

More than that. My point was to make it a static policy, so even if
wbinvd is disabled at the start we want to leave it off and not have it
affected by adding a device which doesn't have coherency. 'wbinvd off'
is not a compatible configuration, hence imo we need a way to reject
adding an incompatible device.

Thanks
Kevin
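A concrete view of Kevin's "unchanged per-domain but dynamic per-vm" point,
again as a self-contained user-space model rather than kernel code: the
VM-wide property is an OR over all attached VFIO groups, and the counter KVM
consumes only moves when that OR flips on a group add/del. The vfio_group
struct, the update_coherency() helper, and the main() driver are assumptions
invented for illustration; kvm_vfio_update_coherency() and
kvm->arch.noncoherent_dma_count are the real names they stand in for.

#include <stdbool.h>
#include <stdio.h>

struct kvm {
	int noncoherent_dma_count;	/* stand-in for kvm->arch.noncoherent_dma_count */
};

struct vfio_group {
	bool enforced_coherent;		/* can the IOMMU enforce snooping for this group? */
};

/* Recompute the VM-wide OR after every group add/del, like kvm_vfio_update_coherency() */
static void update_coherency(struct kvm *kvm, struct vfio_group **groups,
			     int ngroups, bool *vm_noncoherent)
{
	bool noncoherent = false;

	for (int i = 0; i < ngroups; i++)
		if (!groups[i]->enforced_coherent)
			noncoherent = true;

	if (noncoherent != *vm_noncoherent) {
		*vm_noncoherent = noncoherent;
		/* register/unregister_noncoherent_dma() in the real code */
		kvm->noncoherent_dma_count += noncoherent ? 1 : -1;
	}
}

int main(void)
{
	struct kvm vm = { 0 };
	bool vm_noncoherent = false;
	struct vfio_group igd = { .enforced_coherent = false };
	struct vfio_group nic = { .enforced_coherent = true };

	struct vfio_group *both[] = { &nic, &igd };
	update_coherency(&vm, both, 2, &vm_noncoherent);
	printf("with IGD attached: count=%d\n", vm.noncoherent_dma_count);	/* 1 */

	struct vfio_group *only_nic[] = { &nic };
	update_coherency(&vm, only_nic, 1, &vm_noncoherent);
	printf("after IGD removed: count=%d\n", vm.noncoherent_dma_count);	/* 0 */
	return 0;
}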
Hi Kevin,

On Wed, Jun 08, 2022 at 11:48:27PM +0000, Tian, Kevin wrote:
> > > > The KVM mechanism for controlling wbinvd is only triggered during
> > > > kvm_vfio_group_add(), meaning it is a one-shot test done once the devices
> > > > are setup.
> > >
> > > It's not one-shot. kvm_vfio_update_coherency() is called in both
> > > group_add() and group_del(). Then the coherency property is checked
> > > dynamically in wbinvd emulation:
> >
> > From the perspective of managing the domains that is still one-shot. It
> > doesn't get updated when individual devices are added/removed to domains.
>
> It's unchanged per-domain but dynamic per-vm when multiple domains are
> added/removed (i.e. kvm->arch.noncoherent_dma_count). It's the latter
> being checked in the kvm.

I am going to send a v2, yet I am not quite getting the point here.
Meanwhile, Jason is on leave.

What, in your opinion, would be an accurate description here?

Thanks
Nic
> From: Nicolin Chen <nicolinc@nvidia.com>
> Sent: Wednesday, June 15, 2022 4:45 AM
>
> Hi Kevin,
>
> On Wed, Jun 08, 2022 at 11:48:27PM +0000, Tian, Kevin wrote:
> > > > > The KVM mechanism for controlling wbinvd is only triggered during
> > > > > kvm_vfio_group_add(), meaning it is a one-shot test done once the devices
> > > > > are setup.
> > > >
> > > > It's not one-shot. kvm_vfio_update_coherency() is called in both
> > > > group_add() and group_del(). Then the coherency property is checked
> > > > dynamically in wbinvd emulation:
> > >
> > > From the perspective of managing the domains that is still one-shot. It
> > > doesn't get updated when individual devices are added/removed to domains.
> >
> > It's unchanged per-domain but dynamic per-vm when multiple domains are
> > added/removed (i.e. kvm->arch.noncoherent_dma_count). It's the latter
> > being checked in the kvm.
>
> I am going to send a v2, yet I am not quite getting the point here.
> Meanwhile, Jason is on leave.
>
> What, in your opinion, would be an accurate description here?
>

Something like below:
--
The KVM mechanism for controlling wbinvd is based on an OR of the
coherency property of all devices attached to a guest, no matter whether
those devices are attached to a single domain or multiple domains.

So, there is no value in trying to push a device that could do enforced
cache coherency to a dedicated domain vs re-using an existing domain
which is non-coherent, since KVM won't be able to take advantage of it.
This just wastes domain memory.

Simplify this code and eliminate the test. This removes the only logic
that needed to have a dummy domain attached prior to searching for a
matching domain and simplifies the next patches.

It's unclear whether we want to further optimize the Intel driver to
update the domain coherency after a device is detached from it, at
least not before KVM can be verified to handle such dynamics in the
related emulation paths (wbinvd, vcpu load, write_cr0, ept, etc.). In
reality we don't see a usage requiring such optimization, as the only
device which imposes such non-coherency is the Intel GPU, which doesn't
even support hotplug/hot remove.
--

Thanks
Kevin
On Wed, Jun 15, 2022 at 07:35:00AM +0000, Tian, Kevin wrote:
> > From: Nicolin Chen <nicolinc@nvidia.com>
> > Sent: Wednesday, June 15, 2022 4:45 AM
> >
> > Hi Kevin,
> >
> > On Wed, Jun 08, 2022 at 11:48:27PM +0000, Tian, Kevin wrote:
> > > > > > The KVM mechanism for controlling wbinvd is only triggered during
> > > > > > kvm_vfio_group_add(), meaning it is a one-shot test done once the devices
> > > > > > are setup.
> > > > >
> > > > > It's not one-shot. kvm_vfio_update_coherency() is called in both
> > > > > group_add() and group_del(). Then the coherency property is checked
> > > > > dynamically in wbinvd emulation:
> > > >
> > > > From the perspective of managing the domains that is still one-shot. It
> > > > doesn't get updated when individual devices are added/removed to domains.
> > >
> > > It's unchanged per-domain but dynamic per-vm when multiple domains are
> > > added/removed (i.e. kvm->arch.noncoherent_dma_count). It's the latter
> > > being checked in the kvm.
> >
> > I am going to send a v2, yet I am not quite getting the point here.
> > Meanwhile, Jason is on leave.
> >
> > What, in your opinion, would be an accurate description here?
> >
>
> Something like below:
> --
> The KVM mechanism for controlling wbinvd is based on an OR of the
> coherency property of all devices attached to a guest, no matter whether
> those devices are attached to a single domain or multiple domains.
>
> So, there is no value in trying to push a device that could do enforced
> cache coherency to a dedicated domain vs re-using an existing domain
> which is non-coherent, since KVM won't be able to take advantage of it.
> This just wastes domain memory.
>
> Simplify this code and eliminate the test. This removes the only logic
> that needed to have a dummy domain attached prior to searching for a
> matching domain and simplifies the next patches.
>
> It's unclear whether we want to further optimize the Intel driver to
> update the domain coherency after a device is detached from it, at
> least not before KVM can be verified to handle such dynamics in the
> related emulation paths (wbinvd, vcpu load, write_cr0, ept, etc.). In
> reality we don't see a usage requiring such optimization, as the only
> device which imposes such non-coherency is the Intel GPU, which doesn't
> even support hotplug/hot remove.

Thanks! I just updated that and will send v2.
diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index c13b9290e357..f4e3b423a453 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -2285,9 +2285,7 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
 	 * testing if they're on the same bus_type.
 	 */
 	list_for_each_entry(d, &iommu->domain_list, next) {
-		if (d->domain->ops == domain->domain->ops &&
-		    d->enforce_cache_coherency ==
-			    domain->enforce_cache_coherency) {
+		if (d->domain->ops == domain->domain->ops) {
 			iommu_detach_group(domain->domain, group->iommu_group);
 			if (!iommu_attach_group(d->domain, group->iommu_group)) {
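Purely to illustrate the hunk above, a tiny stand-alone model of the
domain-reuse decision before and after the patch. vfio_domain is reduced
here to the two fields the old check compared, and every name below is a
stand-in invented for this sketch rather than the real vfio code.

#include <stdbool.h>
#include <stdio.h>

/* Minimal stand-ins for the structures the hunk above touches */
struct iommu_domain_ops { int dummy; };
struct vfio_domain {
	const struct iommu_domain_ops *ops;
	bool enforce_cache_coherency;
};

/* Old policy: only reuse a domain whose coherency property matches exactly */
static bool reuse_old(const struct vfio_domain *d, const struct vfio_domain *new)
{
	return d->ops == new->ops &&
	       d->enforce_cache_coherency == new->enforce_cache_coherency;
}

/* New policy: reuse any domain driven by the same IOMMU ops */
static bool reuse_new(const struct vfio_domain *d, const struct vfio_domain *new)
{
	return d->ops == new->ops;
}

int main(void)
{
	const struct iommu_domain_ops intel_ops = { 0 };
	struct vfio_domain existing = { .ops = &intel_ops, .enforce_cache_coherency = false };
	struct vfio_domain coherent_dev = { .ops = &intel_ops, .enforce_cache_coherency = true };

	/* The old code allocates a second domain for the mismatch; the new code
	 * just reuses the first, because KVM's wbinvd decision is an OR across
	 * all attached devices anyway. */
	printf("old policy reuses existing domain: %d\n", reuse_old(&existing, &coherent_dev));
	printf("new policy reuses existing domain: %d\n", reuse_new(&existing, &coherent_dev));
	return 0;
}

With the coherency-equality test dropped, a coherency-capable device simply
joins the existing (possibly non-coherent) domain, which matches what KVM's
OR-based tracking can actually make use of.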