Message ID | 0-v4-9e99b76f3518+3a8-smmuv3_nesting_jgg@nvidia.com |
---|---|
Headers | show |
Series | Initial support for SMMUv3 nested translation | expand |
On Wed, Oct 30, 2024 at 09:20:44PM -0300, Jason Gunthorpe wrote: > [This is now based on Nicolin's iommufd patches for vIOMMU and will need > to go through the iommufd tree, please ack] Can't we separate out the SMMUv3 driver changes? They shouldn't depend on Nicolin's work afaict. Will
On Fri, Nov 01, 2024 at 12:18:50PM +0000, Will Deacon wrote: > On Wed, Oct 30, 2024 at 09:20:44PM -0300, Jason Gunthorpe wrote: > > [This is now based on Nicolin's iommufd patches for vIOMMU and will need > > to go through the iommufd tree, please ack] > > Can't we separate out the SMMUv3 driver changes? They shouldn't depend on > Nicolin's work afaict. We can put everything before "iommu/arm-smmu-v3: Support IOMMU_VIOMMU_ALLOC" directly on a rc and share a branch with your tree. That patch and following all depend on Nicolin's work, as they rely on the VIOMMU due to how different ARM is from Intel. How about you take these patches: [PATCH v4 01/12] vfio: Remove VFIO_TYPE1_NESTING_IOMMU [PATCH v4 02/12] ACPICA: IORT: Update for revision E.f [PATCH v4 03/12] ACPI/IORT: Support CANWBS memory access flag [PATCH v4 04/12] iommu/arm-smmu-v3: Report IOMMU_CAP_ENFORCE_CACHE_COHERENCY for CANWBS [PATCH v4 05/12] iommu/arm-smmu-v3: Support IOMMU_GET_HW_INFO via struct arm_smmu_hw_info [PATCH v4 06/12] iommu/arm-smmu-v3: Implement IOMMU_HWPT_ALLOC_NEST_PARENT [PATCH v4 07/12] iommu/arm-smmu-v3: Expose the arm_smmu_attach interface Onto a branch. I'll take these patches after merging your branch and Nicolin's: [PATCH v4 08/12] iommu/arm-smmu-v3: Support IOMMU_VIOMMU_ALLOC [PATCH v4 09/12] iommu/arm-smmu-v3: Support IOMMU_DOMAIN_NESTED [PATCH v4 10/12] iommu/arm-smmu-v3: Use S2FWB for NESTED domains [PATCH v4 11/12] iommu/arm-smmu-v3: Allow ATS for IOMMU_DOMAIN_NESTED [PATCH v4 12/12] iommu/arm-smmu-v3: Support IOMMU_HWPT_INVALIDATE using a VIOMMU object ? I can also probably push most of S2FWB and ATS into the first batch. Please let me know, I would like this to be done this cycle, Nicolin's vIOMMU series are all reviewed now. Jason
On Wed, Oct 30, 2024 at 09:20:49PM -0300, Jason Gunthorpe wrote: > From: Nicolin Chen <nicolinc@nvidia.com> > > For virtualization cases the IDR/IIDR/AIDR values of the actual SMMU > instance need to be available to the VMM so it can construct an > appropriate vSMMUv3 that reflects the correct HW capabilities. > > For userspace page tables these values are required to constrain the valid > values within the CD table and the IOPTEs. > > The kernel does not sanitize these values. If building a VMM then > userspace is required to only forward bits into a VM that it knows it can > implement. Some bits will also require a VMM to detect if appropriate > kernel support is available such as for ATS and BTM. > > Start a new file and kconfig for the advanced iommufd support. This lets > it be compiled out for kernels that are not intended to support > virtualization, and allows distros to leave it disabled until they are > shipping a matching qemu too. > > Tested-by: Nicolin Chen <nicolinc@nvidia.com> > Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> > Reviewed-by: Kevin Tian <kevin.tian@intel.com> > Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> > Reviewed-by: Donald Dutile <ddutile@redhat.com> > Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> --->8 > diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h > index e266dfa6a38d9d..b227ac16333fe1 100644 > --- a/include/uapi/linux/iommufd.h > +++ b/include/uapi/linux/iommufd.h > @@ -488,15 +488,50 @@ struct iommu_hw_info_vtd { > __aligned_u64 ecap_reg; > }; > > +/** > + * struct iommu_hw_info_arm_smmuv3 - ARM SMMUv3 hardware information > + * (IOMMU_HW_INFO_TYPE_ARM_SMMUV3) > + * > + * @flags: Must be set to 0 > + * @__reserved: Must be 0 > + * @idr: Implemented features for ARM SMMU Non-secure programming interface > + * @iidr: Information about the implementation and implementer of ARM SMMU, > + * and architecture version supported > + * @aidr: ARM SMMU architecture version > + * > + * For the details of @idr, @iidr and @aidr, please refer to the chapters > + * from 6.3.1 to 6.3.6 in the SMMUv3 Spec. > + * > + * User space should read the underlying ARM SMMUv3 hardware information for > + * the list of supported features. > + * > + * Note that these values reflect the raw HW capability, without any insight if > + * any required kernel driver support is present. Bits may be set indicating the > + * HW has functionality that is lacking kernel software support, such as BTM. If > + * a VMM is using this information to construct emulated copies of these > + * registers it should only forward bits that it knows it can support. > + * > + * In future, presence of required kernel support will be indicated in flags. What about the case where we _know_ that some functionality is broken in the hardware? For example, we nobble BTM support on MMU 700 thanks to erratum #2812531 yet we'll still cheerfully advertise it in IDR0 here. Similarly, HTTU can be overridden by IORT, so should we update the view that we advertise for that as well? Will
On Mon, Nov 04, 2024 at 11:47:24AM +0000, Will Deacon wrote: > > +/** > > + * struct iommu_hw_info_arm_smmuv3 - ARM SMMUv3 hardware information > > + * (IOMMU_HW_INFO_TYPE_ARM_SMMUV3) > > + * > > + * @flags: Must be set to 0 > > + * @__reserved: Must be 0 > > + * @idr: Implemented features for ARM SMMU Non-secure programming interface > > + * @iidr: Information about the implementation and implementer of ARM SMMU, > > + * and architecture version supported > > + * @aidr: ARM SMMU architecture version > > + * > > + * For the details of @idr, @iidr and @aidr, please refer to the chapters > > + * from 6.3.1 to 6.3.6 in the SMMUv3 Spec. > > + * > > + * User space should read the underlying ARM SMMUv3 hardware information for > > + * the list of supported features. > > + * > > + * Note that these values reflect the raw HW capability, without any insight if > > + * any required kernel driver support is present. Bits may be set indicating the > > + * HW has functionality that is lacking kernel software support, such as BTM. If > > + * a VMM is using this information to construct emulated copies of these > > + * registers it should only forward bits that it knows it can support. > > + * > > + * In future, presence of required kernel support will be indicated in flags. > > What about the case where we _know_ that some functionality is broken in > the hardware? For example, we nobble BTM support on MMU 700 thanks to > erratum #2812531 yet we'll still cheerfully advertise it in IDR0 here. > Similarly, HTTU can be overridden by IORT, so should we update the view > that we advertise for that as well? My knee jerk answer is no, these struct fields should just report the raw HW register. A VMM should not copy these fields directly into a VM. The principle purpose is to give the VMM the same details about the HW as the kernel so it can apply erratas/etc. For instance, if we hide these fields how will the VMM/VM know to apply the various flushing errata? With vCMDQ/etc the VM is directly pushing flushes to HW, it must know the errata. For BTM/HTTU/etc - those all require kernel SW support and per-device permission in the kernel to turn on. For instance requesting a nested vSTE that needs BTM will fail today during attach. Turning on HTTU on the S2 already has an API that will fail if the IORT blocks it. Incrementally dealing with expanding the support is part of the "required kernel support will be indicated in flags." Basically, exposing the information as-is doesn't do any harm. Jason
On Thu, Oct 31, 2024 at 02:21:11PM +0800, Zhangfei Gao wrote: > > +static struct iommu_domain * > > +arm_vsmmu_alloc_domain_nested(struct iommufd_viommu *viommu, u32 flags, > > + const struct iommu_user_data *user_data) > > +{ > > + struct arm_vsmmu *vsmmu = container_of(viommu, struct arm_vsmmu, core); > > + struct arm_smmu_nested_domain *nested_domain; > > + struct iommu_hwpt_arm_smmuv3 arg; > > + int ret; > > + > > + if (flags) > > + return ERR_PTR(-EOPNOTSUPP); > > This check fails when using user page fault, with flags = > IOMMU_HWPT_FAULT_ID_VALID (4) > Strange, the check is not exist in last version? > > iommufd_viommu_alloc_hwpt_nested -> > viommu->ops->alloc_domain_nested(viommu, flags, user_data) -> > arm_vsmmu_alloc_domain_nested It should permit IOMMU_HWPT_FAULT_ID_VALID, I'll add this hunk: --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c @@ -178,12 +178,18 @@ arm_vsmmu_alloc_domain_nested(struct iommufd_viommu *viommu, u32 flags, const struct iommu_user_data *user_data) { struct arm_vsmmu *vsmmu = container_of(viommu, struct arm_vsmmu, core); + const u32 SUPPORTED_FLAGS = IOMMU_HWPT_FAULT_ID_VALID; struct arm_smmu_nested_domain *nested_domain; struct iommu_hwpt_arm_smmuv3 arg; bool enable_ats = false; int ret; - if (flags) + /* + * Faults delivered to the nested domain are faults that originated by + * the S1 in the domain. The core code will match all PASIDs when + * delivering the fault due to user_pasid_table + */ + if (flags & ~SUPPORTED_FLAGS) return ERR_PTR(-EOPNOTSUPP); ret = iommu_copy_struct_from_user(&arg, user_data, Thanks, Jason
Hi Jason, On Fri, Nov 01, 2024 at 10:25:03AM -0300, Jason Gunthorpe wrote: > On Fri, Nov 01, 2024 at 12:18:50PM +0000, Will Deacon wrote: > > On Wed, Oct 30, 2024 at 09:20:44PM -0300, Jason Gunthorpe wrote: > > > [This is now based on Nicolin's iommufd patches for vIOMMU and will need > > > to go through the iommufd tree, please ack] > > > > Can't we separate out the SMMUv3 driver changes? They shouldn't depend on > > Nicolin's work afaict. > > We can put everything before "iommu/arm-smmu-v3: Support > IOMMU_VIOMMU_ALLOC" directly on a rc and share a branch with your tree. > > That patch and following all depend on Nicolin's work, as they rely on > the VIOMMU due to how different ARM is from Intel. > > How about you take these patches: > > [PATCH v4 01/12] vfio: Remove VFIO_TYPE1_NESTING_IOMMU > [PATCH v4 02/12] ACPICA: IORT: Update for revision E.f > [PATCH v4 03/12] ACPI/IORT: Support CANWBS memory access flag > [PATCH v4 04/12] iommu/arm-smmu-v3: Report IOMMU_CAP_ENFORCE_CACHE_COHERENCY for CANWBS > [PATCH v4 05/12] iommu/arm-smmu-v3: Support IOMMU_GET_HW_INFO via struct arm_smmu_hw_info > [PATCH v4 06/12] iommu/arm-smmu-v3: Implement IOMMU_HWPT_ALLOC_NEST_PARENT > [PATCH v4 07/12] iommu/arm-smmu-v3: Expose the arm_smmu_attach interface > > Onto a branch. I've pushed these onto a new branch in the IOMMU tree: iommufd/arm-smmuv3-nested However, please can you give it a day or two to get some exposure in -next before you merge that into iommufd? I'll ping back here later in the week. Cheers, Will
On Tue, Nov 05, 2024 at 04:48:29PM +0000, Will Deacon wrote: > Hi Jason, > > On Fri, Nov 01, 2024 at 10:25:03AM -0300, Jason Gunthorpe wrote: > > On Fri, Nov 01, 2024 at 12:18:50PM +0000, Will Deacon wrote: > > > On Wed, Oct 30, 2024 at 09:20:44PM -0300, Jason Gunthorpe wrote: > > > > [This is now based on Nicolin's iommufd patches for vIOMMU and will need > > > > to go through the iommufd tree, please ack] > > > > > > Can't we separate out the SMMUv3 driver changes? They shouldn't depend on > > > Nicolin's work afaict. > > > > We can put everything before "iommu/arm-smmu-v3: Support > > IOMMU_VIOMMU_ALLOC" directly on a rc and share a branch with your tree. > > > > That patch and following all depend on Nicolin's work, as they rely on > > the VIOMMU due to how different ARM is from Intel. > > > > How about you take these patches: > > > > [PATCH v4 01/12] vfio: Remove VFIO_TYPE1_NESTING_IOMMU > > [PATCH v4 02/12] ACPICA: IORT: Update for revision E.f > > [PATCH v4 03/12] ACPI/IORT: Support CANWBS memory access flag > > [PATCH v4 04/12] iommu/arm-smmu-v3: Report IOMMU_CAP_ENFORCE_CACHE_COHERENCY for CANWBS > > [PATCH v4 05/12] iommu/arm-smmu-v3: Support IOMMU_GET_HW_INFO via struct arm_smmu_hw_info > > [PATCH v4 06/12] iommu/arm-smmu-v3: Implement IOMMU_HWPT_ALLOC_NEST_PARENT > > [PATCH v4 07/12] iommu/arm-smmu-v3: Expose the arm_smmu_attach interface > > > > Onto a branch. > > I've pushed these onto a new branch in the IOMMU tree: > > iommufd/arm-smmuv3-nested > > However, please can you give it a day or two to get some exposure in > -next before you merge that into iommufd? I'll ping back here later in > the week. Thanks, I will hope to put the other parts together on Friday then so it can also get its time in -next, as we are running out of days before the merge window 0-day is still complaining about some kconfig in Nicolin's patches so we need to settle that anyhow. Jason
On Mon, 4 Nov 2024 at 17:19, Jason Gunthorpe <jgg@nvidia.com> wrote: > > On Thu, Oct 31, 2024 at 02:21:11PM +0800, Zhangfei Gao wrote: > > > > +static struct iommu_domain * > > > +arm_vsmmu_alloc_domain_nested(struct iommufd_viommu *viommu, u32 flags, > > > + const struct iommu_user_data *user_data) > > > +{ > > > + struct arm_vsmmu *vsmmu = container_of(viommu, struct arm_vsmmu, core); > > > + struct arm_smmu_nested_domain *nested_domain; > > > + struct iommu_hwpt_arm_smmuv3 arg; > > > + int ret; > > > + > > > + if (flags) > > > + return ERR_PTR(-EOPNOTSUPP); > > > > This check fails when using user page fault, with flags = > > IOMMU_HWPT_FAULT_ID_VALID (4) > > Strange, the check is not exist in last version? > > > > iommufd_viommu_alloc_hwpt_nested -> > > viommu->ops->alloc_domain_nested(viommu, flags, user_data) -> > > arm_vsmmu_alloc_domain_nested > > It should permit IOMMU_HWPT_FAULT_ID_VALID, I'll add this hunk: > > --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c > +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c > @@ -178,12 +178,18 @@ arm_vsmmu_alloc_domain_nested(struct iommufd_viommu *viommu, u32 flags, > const struct iommu_user_data *user_data) > { > struct arm_vsmmu *vsmmu = container_of(viommu, struct arm_vsmmu, core); > + const u32 SUPPORTED_FLAGS = IOMMU_HWPT_FAULT_ID_VALID; > struct arm_smmu_nested_domain *nested_domain; > struct iommu_hwpt_arm_smmuv3 arg; > bool enable_ats = false; > int ret; > > - if (flags) > + /* > + * Faults delivered to the nested domain are faults that originated by > + * the S1 in the domain. The core code will match all PASIDs when > + * delivering the fault due to user_pasid_table > + */ > + if (flags & ~SUPPORTED_FLAGS) > return ERR_PTR(-EOPNOTSUPP); Thanks Jason, this works
On 2024-11-04 12:41 pm, Jason Gunthorpe wrote: > On Mon, Nov 04, 2024 at 11:47:24AM +0000, Will Deacon wrote: >>> +/** >>> + * struct iommu_hw_info_arm_smmuv3 - ARM SMMUv3 hardware information >>> + * (IOMMU_HW_INFO_TYPE_ARM_SMMUV3) >>> + * >>> + * @flags: Must be set to 0 >>> + * @__reserved: Must be 0 >>> + * @idr: Implemented features for ARM SMMU Non-secure programming interface >>> + * @iidr: Information about the implementation and implementer of ARM SMMU, >>> + * and architecture version supported >>> + * @aidr: ARM SMMU architecture version >>> + * >>> + * For the details of @idr, @iidr and @aidr, please refer to the chapters >>> + * from 6.3.1 to 6.3.6 in the SMMUv3 Spec. >>> + * >>> + * User space should read the underlying ARM SMMUv3 hardware information for >>> + * the list of supported features. >>> + * >>> + * Note that these values reflect the raw HW capability, without any insight if >>> + * any required kernel driver support is present. Bits may be set indicating the >>> + * HW has functionality that is lacking kernel software support, such as BTM. If >>> + * a VMM is using this information to construct emulated copies of these >>> + * registers it should only forward bits that it knows it can support. But how *is* a VMM supposed to know what it can support? Are they all expected to grovel the host devicetree/ACPI tables and maintain their own knowledge of implementation errata to understand what's actually usable? >>> + * >>> + * In future, presence of required kernel support will be indicated in flags. >> >> What about the case where we _know_ that some functionality is broken in >> the hardware? For example, we nobble BTM support on MMU 700 thanks to >> erratum #2812531 yet we'll still cheerfully advertise it in IDR0 here. >> Similarly, HTTU can be overridden by IORT, so should we update the view >> that we advertise for that as well? > > My knee jerk answer is no, these struct fields should just report the > raw HW register. A VMM should not copy these fields directly into a > VM. The principle purpose is to give the VMM the same details about the > HW as the kernel so it can apply erratas/etc. > > For instance, if we hide these fields how will the VMM/VM know to > apply the various flushing errata? With vCMDQ/etc the VM is directly > pushing flushes to HW, it must know the errata. That doesn't seem like a valid argument. We obviously can't abstract SMMU_IIDR, that would indeed be an invitation for trouble, but otherwise, if an erratum affects S1 operation under conditions dependent on an optional feature, then not advertising that feature would make the workaround irrelevant anyway, since as far as the VM is concerned it would be wrong to expect a non-existent feature to work in the first place. > For BTM/HTTU/etc - those all require kernel SW support and per-device > permission in the kernel to turn on. For instance requesting a nested > vSTE that needs BTM will fail today during attach. Turning on HTTU on > the S2 already has an API that will fail if the IORT blocks it. What does S2 HTTU have to do with the VM? How the host wants to maintain its S2 tables it its own business. AFAICS, unless the VMM wants to do some fiddly CD shadowing, it's going to be kinda hard to prevent the SMMU seeing a guest CD with CD.HA and/or CD.HD set if the guest expects S1 HTTU to work. I'm not sure what "vSTE that needs BTM" means. Even if the system does support BTM, the only control is the global SMMU_CR2.PTM, and a vSMMU can't usefully emulate changing that either way. Either the host set PTM=0 before enabling the SMMU, so BTM can be advertised and expected to work, or it didn't, in which case there can be no BTM, full stop. > Incrementally dealing with expanding the support is part of the > "required kernel support will be indicated in flags." > > Basically, exposing the information as-is doesn't do any harm. I would say it does. Advertising a feature when we already know it's not usable at all puts a non-trivial and unnecessary burden on the VMM and VM to then have to somehow derive that information from other sources, at the risk of being confused by unexpected behaviour if they don't. We sanitise CPU ID registers for userspace and KVM, so I see no compelling reason for SMMU ID registers to be different. Thanks, Robin.
On Wed, Nov 06, 2024 at 04:37:53PM +0000, Robin Murphy wrote: > On 2024-11-04 12:41 pm, Jason Gunthorpe wrote: > > On Mon, Nov 04, 2024 at 11:47:24AM +0000, Will Deacon wrote: > > > > +/** > > > > + * struct iommu_hw_info_arm_smmuv3 - ARM SMMUv3 hardware information > > > > + * (IOMMU_HW_INFO_TYPE_ARM_SMMUV3) > > > > + * > > > > + * @flags: Must be set to 0 > > > > + * @__reserved: Must be 0 > > > > + * @idr: Implemented features for ARM SMMU Non-secure programming interface > > > > + * @iidr: Information about the implementation and implementer of ARM SMMU, > > > > + * and architecture version supported > > > > + * @aidr: ARM SMMU architecture version > > > > + * > > > > + * For the details of @idr, @iidr and @aidr, please refer to the chapters > > > > + * from 6.3.1 to 6.3.6 in the SMMUv3 Spec. > > > > + * > > > > + * User space should read the underlying ARM SMMUv3 hardware information for > > > > + * the list of supported features. > > > > + * > > > > + * Note that these values reflect the raw HW capability, without any insight if > > > > + * any required kernel driver support is present. Bits may be set indicating the > > > > + * HW has functionality that is lacking kernel software support, such as BTM. If > > > > + * a VMM is using this information to construct emulated copies of these > > > > + * registers it should only forward bits that it knows it can support. > > But how *is* a VMM supposed to know what it can support? I answered a related question to Mostafa with an example: https://lore.kernel.org/linux-iommu/20240903235532.GJ3773488@nvidia.com/ "global" capabilities that are enabled directly from the CD entry would follow the pattern. > Are they all expected to grovel the host devicetree/ACPI tables and > maintain their own knowledge of implementation errata to understand > what's actually usable? No, VMMs are expected to only implement base line features we have working today and not blindly add new features based only HW registers reported here. Each future capability we want to enable at the VMM needs an analysis: 1) Does it require kernel SW changes, ie like BTM? Then it needs a kernel_capabilities bit to say the kernel SW exists 2) Does it require data from ACPI/DT/etc? Then it needs a kernel_capabilities bit 3) Does it need to be "turned on" per VM, ie with a VMS enablement? Then it needs a new request flag in ALLOC_VIOMMU 4) Otherwise it can be read directly from the idr[] array This is why the comment above is so stern that the VMM "should only forward bits that it knows it can support". > S2 tables it its own business. AFAICS, unless the VMM wants to do some > fiddly CD shadowing, it's going to be kinda hard to prevent the SMMU seeing > a guest CD with CD.HA and/or CD.HD set if the guest expects S1 HTTU to work. If the VMM wrongly indicates HTTU support to the VM, because it wrongly inspected those bits in the idr report, then it is just broken. > I would say it does. Advertising a feature when we already know it's not > usable at all puts a non-trivial and unnecessary burden on the VMM and VM to > then have to somehow derive that information from other sources, at the risk > of being confused by unexpected behaviour if they don't. That is not the purpose here, the register report is not to be used as "advertising features". It describes details of the raw HW that the VMM may need to use *some* of the fields. There are quite a few fields that fit #4 today: OAS, VAX, GRAN, BBML, CD2L, etc. Basically we will pass most of the bits and mask a few. If we get the masking wrong and pass something we shouldn't, then we've improved nothing compared to this proposal. I think we are likely to get the masking wrong :) > We sanitise CPU ID registers for userspace and KVM, so I see no compelling > reason for SMMU ID registers to be different. We discussed this already: https://lore.kernel.org/linux-iommu/20240904120103.GB3915968@nvidia.com It is a false comparison, for KVM the kernel is responsible to control the CPU ID registers. Reporting the registers the VM sees to the VMM makes alot of sense. For SMMU the VMM exclusively controls the VM's ID registers. If you still feel strongly about this please let me know by Friday and I will drop the idr[] array from this cycle. We can continue to discuss a solution for the next cycle. Regards, Jason
On 2024-11-06 6:05 pm, Jason Gunthorpe wrote: > On Wed, Nov 06, 2024 at 04:37:53PM +0000, Robin Murphy wrote: >> On 2024-11-04 12:41 pm, Jason Gunthorpe wrote: >>> On Mon, Nov 04, 2024 at 11:47:24AM +0000, Will Deacon wrote: >>>>> +/** >>>>> + * struct iommu_hw_info_arm_smmuv3 - ARM SMMUv3 hardware information >>>>> + * (IOMMU_HW_INFO_TYPE_ARM_SMMUV3) >>>>> + * >>>>> + * @flags: Must be set to 0 >>>>> + * @__reserved: Must be 0 >>>>> + * @idr: Implemented features for ARM SMMU Non-secure programming interface >>>>> + * @iidr: Information about the implementation and implementer of ARM SMMU, >>>>> + * and architecture version supported >>>>> + * @aidr: ARM SMMU architecture version >>>>> + * >>>>> + * For the details of @idr, @iidr and @aidr, please refer to the chapters >>>>> + * from 6.3.1 to 6.3.6 in the SMMUv3 Spec. >>>>> + * >>>>> + * User space should read the underlying ARM SMMUv3 hardware information for >>>>> + * the list of supported features. >>>>> + * >>>>> + * Note that these values reflect the raw HW capability, without any insight if >>>>> + * any required kernel driver support is present. Bits may be set indicating the >>>>> + * HW has functionality that is lacking kernel software support, such as BTM. If >>>>> + * a VMM is using this information to construct emulated copies of these >>>>> + * registers it should only forward bits that it knows it can support. >> >> But how *is* a VMM supposed to know what it can support? > > I answered a related question to Mostafa with an example: > > https://lore.kernel.org/linux-iommu/20240903235532.GJ3773488@nvidia.com/ > > "global" capabilities that are enabled directly from the CD entry > would follow the pattern. > >> Are they all expected to grovel the host devicetree/ACPI tables and >> maintain their own knowledge of implementation errata to understand >> what's actually usable? > > No, VMMs are expected to only implement base line features we have > working today and not blindly add new features based only HW registers > reported here. > > Each future capability we want to enable at the VMM needs an analysis: > > 1) Does it require kernel SW changes, ie like BTM? Then it needs a > kernel_capabilities bit to say the kernel SW exists > 2) Does it require data from ACPI/DT/etc? Then it needs a > kernel_capabilities bit > 3) Does it need to be "turned on" per VM, ie with a VMS enablement? > Then it needs a new request flag in ALLOC_VIOMMU > 4) Otherwise it can be read directly from the idr[] array > > This is why the comment above is so stern that the VMM "should only > forward bits that it knows it can support". So... you're saying this patch is in fact broken, or at least uselessly incomplete, since VMMs aren't allowed to emulate a vSMMU at all without first consulting some other interface which does not exist? Great. >> S2 tables it its own business. AFAICS, unless the VMM wants to do some >> fiddly CD shadowing, it's going to be kinda hard to prevent the SMMU seeing >> a guest CD with CD.HA and/or CD.HD set if the guest expects S1 HTTU to work. > > If the VMM wrongly indicates HTTU support to the VM, because it > wrongly inspected those bits in the idr report, then it is just > broken. What do you mean? We could have a system right now where the hardware is configured with SMMU_IDR0.HTTU=2, but it turned out that atomics were broken in the interconnect so firmware sets the IORT "HTTU override" field is set to 0. We know about that in the kernel, but all a VMM sees is iommu_hw_info_arm_smmuv3.idr[0] indicating HTTU=2. If it is "broken" to take the only information available at face value, assume HTTU is available, and reflect that in a vSMMU interface, then what is the correct thing to do, other than to not dare emulate a vSMMU at all, in fear of a sternly worded comment? >> I would say it does. Advertising a feature when we already know it's not >> usable at all puts a non-trivial and unnecessary burden on the VMM and VM to >> then have to somehow derive that information from other sources, at the risk >> of being confused by unexpected behaviour if they don't. > > That is not the purpose here, the register report is not to be used as > "advertising features". It describes details of the raw HW that the > VMM may need to use *some* of the fields. > > There are quite a few fields that fit #4 today: OAS, VAX, GRAN, BBML, > CD2L, etc. > > Basically we will pass most of the bits and mask a few. If we get the > masking wrong and pass something we shouldn't, then we've improved > nothing compared to this proposal. I think we are likely to get the > masking wrong :) Seriously? A simple inverse of the feature detection the kernel driver already does for its own needs, implemented once in the same place, is hard? Compared to maintaining the exact same information within the driver but in some new different form, and also maintaining it in the UAPI, and having every VMM ever all do the same work to put the two together, and always be up to date with the right UAPI, and never ever let any field slip through as-is, especially not all the ones which were RES0 at time of writing, enforced by a sternly worded comment? Why yes, of course I can see how that's trivially easy and carries no risk whatsoever. >> We sanitise CPU ID registers for userspace and KVM, so I see no compelling >> reason for SMMU ID registers to be different. > > We discussed this already: > > https://lore.kernel.org/linux-iommu/20240904120103.GB3915968@nvidia.com > > It is a false comparison, for KVM the kernel is responsible to control > the CPU ID registers. Reporting the registers the VM sees to the VMM > makes alot of sense. For SMMU the VMM exclusively controls the VM's ID > registers. Pointing out that two things are different is a false comparison because they are different, by virtue of your choice to make them different? Please try making sense. Your tautology still does not offer any reasoning against doing the logical thing and following the same basic pattern: the kernel uses the ID register mechanism itself to advertise the set of features it's able/willing to support, by sanitising the values it offers to the VMM, combining the notions of hardware and kernel support where the distinction is irrelevant anyway. The VMM is then still free to take those values and hide more features, or potentially add any that it is capable of emulating without the kernel's help, and advertise that final set to the VM. Obviously there are significant *implementation* differences, most notably that the latter VMM->VM part doesn't need to involve IOMMUFD at all since MMIO register emulation can stay entirely in userspace, whereas for CPU system registers the final VM-visible values need to be plugged back in to KVM for it to handle the traps. We are all asking you to explain why you think doing the kernel->VMM advertisement naturally and intuitively is somehow bad, and forcing VMMs to instead rely on a more complex, fragile, and crucially non-existent additional interface is better. You should take "We discussed this already" as more of a clue to yourself than to me - if 4 different people have all said the exact same thing in so many words, perhaps there's something in it... And in case I need to spell it out with less sarcasm, "we'll get masking wrong in the kernel" only implies "we'll get kernel_capabilities wrong in the kernel (and elsewhere)", so it's clearly not a useful argument to keep repeating. Besides, as KVM + sysfs + MRS emulation shows, we're pretty experienced at masking ID registers in the kernel. It's not hard to do it right in a robust manner, where particularly with the nature of SMMU features, the only real risk might be forgetting to expose something new once we do actually support it. > If you still feel strongly about this please let me know by Friday and > I will drop the idr[] array from this cycle. We can continue to > discuss a solution for the next cycle. It already can't work as-is, I don't see how making it even more broken would help. IMO it doesn't seem like a good idea to be merging UAPI at all while it's still clearly incomplete and by its own definition unusable. Thanks, Robin.
On Wed, Nov 06, 2024 at 09:05:26PM +0000, Robin Murphy wrote: > On 2024-11-06 6:05 pm, Jason Gunthorpe wrote: > > If you still feel strongly about this please let me know by Friday and > > I will drop the idr[] array from this cycle. We can continue to > > discuss a solution for the next cycle. > > It already can't work as-is, I don't see how making it even more broken > would help. IMO it doesn't seem like a good idea to be merging UAPI at > all while it's still clearly incomplete and by its own definition unusable. Robin, would you please give a clear suggestion for the hw_info? My takeaway is that you would want the unsupported features (per firmware overrides and errata) to be stripped from the reporting IDR array. Alternatively, we could start with some basic nesting features less those advanced ones (HTTU/PRI or so), and then add then later once we're comfortable to advertise. Does this sound okay to you? Thanks Nicolin
On Wed, Nov 06, 2024 at 09:05:26PM +0000, Robin Murphy wrote: > > Each future capability we want to enable at the VMM needs an analysis: > > > > 1) Does it require kernel SW changes, ie like BTM? Then it needs a > > kernel_capabilities bit to say the kernel SW exists > > 2) Does it require data from ACPI/DT/etc? Then it needs a > > kernel_capabilities bit > > 3) Does it need to be "turned on" per VM, ie with a VMS enablement? > > Then it needs a new request flag in ALLOC_VIOMMU > > 4) Otherwise it can be read directly from the idr[] array > > > > This is why the comment above is so stern that the VMM "should only > > forward bits that it knows it can support". > > So... you're saying this patch is in fact broken, or at least uselessly > incomplete, since VMMs aren't allowed to emulate a vSMMU at all without > first consulting some other interface which does not exist? Great. That is too far, there is nothing fundamentally broken here. This does not enable every SMMU feature in the architecture. It implements a basic baseline and no more. The VMMs are only permitted to read bits in that baseline. VMMs that want to go beyond that have to go through the 4 steps above. Fundamentally all we are arguing about here is if bits the VMM shouldn't even read should be forced to zero by the kernel. If the bits are forced to zero then perhaps they could be used as a future SW feature negotiation, and maybe that would be better, or maybe not. See below about BTM for a counter example. I agree the kdoc does not describe what the baseline actually is. > We know about that in the kernel, but all a VMM sees is > iommu_hw_info_arm_smmuv3.idr[0] indicating HTTU=2. If it is "broken" to take > the only information available at face value, assume HTTU is available, and > reflect that in a vSMMU interface, then what is the correct thing to do, > other than to not dare emulate a vSMMU at all, in fear of a sternly worded > comment? These patches support a baseline feature set. They do not support the entire SMMU architecture. They do not support vHTTU. Things are going step by step because this is a huge project. HTTU is not in the baseline, so it is definitely wrong for any VMM working on these patches to advertise support to the VM! The VMM *MUST* ignore such bits in the IDR report. The correct thing is the 4 steps I outlined above. When a VMM author wants to go beyond the baseline they have to determine what kernel and VMM software is required and implement the missing parts. > Your tautology still does not offer any reasoning against doing the logical > thing and following the same basic pattern: the kernel uses the ID register > mechanism itself to advertise the set of features it's able/willing to > support, by sanitising the values it offers to the VMM, combining the > notions of hardware and kernel support where the distinction is irrelevant > anyway. Even with your plan the VMM can not pass the kernel's IDR register into the VM unprotected. It must always sanitize it against the features that the VMM supports. Let's consider vBTM support. This requires kernel support to enable the global BTM register *AND* a new VMM ioctl flow with iommufd to link to KVM's VMID. You are suggesting that the old kernel should wire IDR BTM to 0 and the new kernel will set it to 1. However the VMM cannot just forward this bit to the VM! A old VMM that does not do the KVM flow does NOT support BTM. All VMM's must mask the BTM bit from day 0, or we can never set it to 1 in a future kernel. Which is exactly the same plan as this patch! > You should take "We discussed this already" > as more of a clue to yourself than to me - if 4 different people have all > said the exact same thing in so many words, perhaps there's something in > it... And all seemed to agree it was not a big deal after the discussion. I think Mostafa was driving in a direction that we break up the IDR into explicit fields and thus be explicit about what information the VMM is able to access. This would effectively document and enforce what the baseline is. > And in case I need to spell it out with less sarcasm, "we'll get masking > wrong in the kernel" only implies "we'll get kernel_capabilities wrong in > the kernel (and elsewhere)", Less sarcasm and hyperbole would be nice. > > If you still feel strongly about this please let me know by Friday and > > I will drop the idr[] array from this cycle. We can continue to > > discuss a solution for the next cycle. > > It already can't work as-is, As above, this is not true. > I don't see how making it even more broken > would help. IMO it doesn't seem like a good idea to be merging UAPI at all > while it's still clearly incomplete and by its own definition unusable. I agree that removing the idr would make it definitely unusable. There is quite a bit of essential information in there the VMM needs. However, the uAPI is extensible so adding a new field in the next cycle is not breaking. I'm trying to offer you an olive branch so we can have this discussion with time instead of urgently at the 11th hour and derailing the series. This exact topic has been discussed already two months ago, and is, IMHO, mostly a style choice not an urgent technical matter. If you have another option we can conclude in the next few days then I'm happy to hear it. If we focus only on the baseline functionality Nicolin can come up with a list of fields and we can write it very explicitly in the kdoc that the VMM can only read those fields. I could also reluctantly agree to permanent masking to report only the above kdoc bits in the kernel. Noting if you want this because you don't trust the VMM authors to follow an explicit kdoc, then we likely also cannot ever set some bits to 1. Eg BTM would be permanently 0. Regards, Jason
On Wed, Nov 06, 2024 at 10:35:07PM -0400, Jason Gunthorpe wrote:
> I agree the kdoc does not describe what the baseline actually is.
Nicolin worked on this, here is a more detailed kdoc:
/**
* struct iommu_hw_info_arm_smmuv3 - ARM SMMUv3 hardware information
* (IOMMU_HW_INFO_TYPE_ARM_SMMUV3)
*
* @flags: Must be set to 0
* @__reserved: Must be 0
* @idr: Implemented features for ARM SMMU Non-secure programming interface
* @iidr: Information about the implementation and implementer of ARM SMMU,
* and architecture version supported
* @aidr: ARM SMMU architecture version
*
* For the details of @idr, @iidr and @aidr, please refer to the chapters
* from 6.3.1 to 6.3.6 in the SMMUv3 Spec.
*
* This reports the raw HW capability, and not all bits are meaningful to be
* read by userspace. Only the following fields should be used:
*
* idr[0]: ST_LEVEL, TERM_MODEL, STALL_MODEL, TTENDIAN , CD2L, ASID16, TTF
* idr[1]: SIDSIZE, SSIDSIZE
* idr[3]: BBML, RIL
* idr[5]: VAX, GRAN64K, GRAN16K, GRAN4K
*
* - S1P should be assumed to be true if a NESTED HWPT can be created
* - VFIO/iommufd only support platforms with COHACC, it should be assumed to be
* true.
* - ATS is a per-device property. If the VMM describes any devices as ATS
* capable in ACPI/DT it should set the corresponding idr.
*
* This list may expand in future (eg E0PD, AIE, PBHA, D128, DS etc). It is
* important that VMMs do not read bits outside the list to allow for
* compatibility with future kernels. Several features in the SMMUv3
* architecture are not currently supported by the kernel for nesting: HTTU,
* BTM, MPAM and others.
*/
This focuses on stuff we can actually test and gives a path to test
and confirm a no-code update to future stuff.
The future list (E0PD/etc) reflects things the current kernel doesn't
use. Our naive read of the spec suggests they are probably fine to
just read the raw HW IDR. When someone implements guest kernel
support, does a detailed spec read, and crucially tests them - then we
can update the comment and have immediate support.
HTTU, BTM, and others like that will need additional bits outside the
IDR if someone wishes to enable them.
Jason
Hi guys, On Wed, Nov 06, 2024 at 10:35:06PM -0400, Jason Gunthorpe wrote: > On Wed, Nov 06, 2024 at 09:05:26PM +0000, Robin Murphy wrote: > > You should take "We discussed this already" > > as more of a clue to yourself than to me - if 4 different people have all > > said the exact same thing in so many words, perhaps there's something in > > it... > > And all seemed to agree it was not a big deal after the discussion. > > I think Mostafa was driving in a direction that we break up the IDR > into explicit fields and thus be explicit about what information the > VMM is able to access. This would effectively document and enforce > what the baseline is. As one of the four people mentioned above, I figured I'd chime in with my rationale for queuing this in case it's of any help or interest. Initially, I was reasonably sure that we should be sanitising the ID registers and being selective about what we advertise to userspace. However, after Jason's reply to my comments, mulling it over in my head and having lively conversations with Mostafa at lunchtime, I've come full circle. Is it a great interface? Not at all. It's 8 register values copied from the hardware to userspace. But is it good enough? I think it is. It's also extremely simple (i.e. easy to explain what it does and trivial to implement), which I think is a huge benefit given that the IOMMUFD work around it is still evolving. I'm firmly of the opinion that the VMM is going to need a tonne of help from other sources to expose a virtual IOMMU successfully. For example, anything relating to policy or configuration choices should be driven from userspace rather than the kernel. If we start to expose policy in the id registers for the cases where it happens to fit nicely, I fear that this will backfire and the VMM will end up second-guessing the kernel in cases where it decides it knows best. I'm not sold on the analogy with CPU ID registers as (a) we don't have big/little idiocy to deal with in the SMMU and (b) I don't think SMMU features always need support code in the host, which is quite unlike many CPU features that cannot be exposed safely without hypervisor context-switching support. On top of all that, this interface can be extended if we change our minds. If we decide we need to mask out fields, I think we could add that after the fact. Hell, if we all decide that it's a disaster in a few releases time, we can try something else. Ultimately, Jason is the one maintaining IOMMUFD and he gets to deal with the problems if his UAPI doesn't work out :) So, given where we are in the cycle, I think the pragmatic thing to do is to land this change now and enable the ongoing IOMMUFD work to continue. We still have over two months to resolve any major problems with the interface (and even unplug it entirely from the driver if we really get stuck) but for now I think it's "fine". Will
On Tue, Nov 05, 2024 at 04:48:29PM +0000, Will Deacon wrote: > On Fri, Nov 01, 2024 at 10:25:03AM -0300, Jason Gunthorpe wrote: > > On Fri, Nov 01, 2024 at 12:18:50PM +0000, Will Deacon wrote: > > > On Wed, Oct 30, 2024 at 09:20:44PM -0300, Jason Gunthorpe wrote: > > > > [This is now based on Nicolin's iommufd patches for vIOMMU and will need > > > > to go through the iommufd tree, please ack] > > > > > > Can't we separate out the SMMUv3 driver changes? They shouldn't depend on > > > Nicolin's work afaict. > > > > We can put everything before "iommu/arm-smmu-v3: Support > > IOMMU_VIOMMU_ALLOC" directly on a rc and share a branch with your tree. > > > > That patch and following all depend on Nicolin's work, as they rely on > > the VIOMMU due to how different ARM is from Intel. > > > > How about you take these patches: > > > > [PATCH v4 01/12] vfio: Remove VFIO_TYPE1_NESTING_IOMMU > > [PATCH v4 02/12] ACPICA: IORT: Update for revision E.f > > [PATCH v4 03/12] ACPI/IORT: Support CANWBS memory access flag > > [PATCH v4 04/12] iommu/arm-smmu-v3: Report IOMMU_CAP_ENFORCE_CACHE_COHERENCY for CANWBS > > [PATCH v4 05/12] iommu/arm-smmu-v3: Support IOMMU_GET_HW_INFO via struct arm_smmu_hw_info > > [PATCH v4 06/12] iommu/arm-smmu-v3: Implement IOMMU_HWPT_ALLOC_NEST_PARENT > > [PATCH v4 07/12] iommu/arm-smmu-v3: Expose the arm_smmu_attach interface > > > > Onto a branch. > > I've pushed these onto a new branch in the IOMMU tree: > > iommufd/arm-smmuv3-nested > > However, please can you give it a day or two to get some exposure in > -next before you merge that into iommufd? I'll ping back here later in > the week. Not seen anything go bang in -next, so I think we can consider this branch stable now. Will
On Fri, Nov 08, 2024 at 02:53:22PM +0000, Will Deacon wrote: > So, given where we are in the cycle, I think the pragmatic thing to do > is to land this change now and enable the ongoing IOMMUFD work to > continue. We still have over two months to resolve any major problems > with the interface (and even unplug it entirely from the driver if we > really get stuck) but for now I think it's "fine". Thanks Will, this all eloquently captures my line of reasoning also. I will add that after going through the exercise with Nicolin, to write down all the fields the VMM is allowed to touch, it seems we do have a list of future fields that likely do not require any host support (E0PD, AIE, PBHA, D128, DS). We also have ones that will (BTM) and then the fwspec weirdo of HTTU. Given those ratios I feel pretty good about showing that data to the userspace, expecting that down the road we will "turn it on" for E0PD/etc without any kernel change, and BTM/HTTU will not use idr for discovery. The biggest risk is the VMM's badly muck it up. I now think it was a lazy shortcut to not enumerate all the fields in the kdoc. Now that we've done that we see that the (unreviewed) RFC qemu patches did miss some things and it is being corrected. Nicolin and I had some discussion if we should go ahead and include E0PD/etc in the documentation right now, but we both felt some uncertainty that we really understood those features deeply enough to be confident, and have no way to test. There is also the nanny option of zeroing everything not in the approved list, but I feel like it is already so hard to write a VMM vSMMU3 that is too nannyish. Hoepfully time will not show that my opinion of VMM authors is too high. :\ Jason
On Wed, Oct 30, 2024 at 09:20:44PM -0300, Jason Gunthorpe wrote: > Jason Gunthorpe (7): > iommu/arm-smmu-v3: Support IOMMU_DOMAIN_NESTED > iommu/arm-smmu-v3: Use S2FWB for NESTED domains > iommu/arm-smmu-v3: Allow ATS for IOMMU_DOMAIN_NESTED > > Nicolin Chen (5): > iommu/arm-smmu-v3: Support IOMMU_VIOMMU_ALLOC > iommu/arm-smmu-v3: Support IOMMU_HWPT_INVALIDATE using a VIOMMU object Applied to iommufd for-next along with all the dependencies and the additional hunk Zhangfei pointed out. Thanks, Jason
On Wed, 13 Nov 2024 at 02:29, Jason Gunthorpe <jgg@nvidia.com> wrote: > > On Wed, Oct 30, 2024 at 09:20:44PM -0300, Jason Gunthorpe wrote: > > Jason Gunthorpe (7): > > iommu/arm-smmu-v3: Support IOMMU_DOMAIN_NESTED > > iommu/arm-smmu-v3: Use S2FWB for NESTED domains > > iommu/arm-smmu-v3: Allow ATS for IOMMU_DOMAIN_NESTED > > > > Nicolin Chen (5): > > iommu/arm-smmu-v3: Support IOMMU_VIOMMU_ALLOC > > iommu/arm-smmu-v3: Support IOMMU_HWPT_INVALIDATE using a VIOMMU object > > Applied to iommufd for-next along with all the dependencies and the > additional hunk Zhangfei pointed out. Thanks Jason, I have verified on aarch64 based on your jason/smmuv3_nesting branch https://github.com/Linaro/linux-kernel-uadk/tree/6.12-wip https://github.com/Linaro/qemu/tree/6.12-wip Still need this hack https://github.com/Linaro/linux-kernel-uadk/commit/eaa194d954112cad4da7852e29343e546baf8683 One is adding iommu_dev_enable/disable_feature IOMMU_DEV_FEAT_SVA, which you have patchset before. The other is to temporarily ignore S2FWB or CANWBS. Thanks
On Wed, Nov 13, 2024 at 09:01:41AM +0800, Zhangfei Gao wrote: > On Wed, 13 Nov 2024 at 02:29, Jason Gunthorpe <jgg@nvidia.com> wrote: > > > > On Wed, Oct 30, 2024 at 09:20:44PM -0300, Jason Gunthorpe wrote: > > > Jason Gunthorpe (7): > > > iommu/arm-smmu-v3: Support IOMMU_DOMAIN_NESTED > > > iommu/arm-smmu-v3: Use S2FWB for NESTED domains > > > iommu/arm-smmu-v3: Allow ATS for IOMMU_DOMAIN_NESTED > > > > > > Nicolin Chen (5): > > > iommu/arm-smmu-v3: Support IOMMU_VIOMMU_ALLOC > > > iommu/arm-smmu-v3: Support IOMMU_HWPT_INVALIDATE using a VIOMMU object > > > > Applied to iommufd for-next along with all the dependencies and the > > additional hunk Zhangfei pointed out. > > Thanks Jason, I have verified on aarch64 based on your > jason/smmuv3_nesting branch Great, thanks > https://github.com/Linaro/linux-kernel-uadk/tree/6.12-wip > https://github.com/Linaro/qemu/tree/6.12-wip > > Still need this hack > https://github.com/Linaro/linux-kernel-uadk/commit/eaa194d954112cad4da7852e29343e546baf8683 > > One is adding iommu_dev_enable/disable_feature IOMMU_DEV_FEAT_SVA, > which you have patchset before. Yes, I have a more complete version of that here someplace. Need some help on vt-d but hope to get that done next cycle. > The other is to temporarily ignore S2FWB or CANWBS. Yes, ideally you should do that in the device FW, or perhaps via the Linux ACPI patching? Jason
On 11/13/24 09:23, Jason Gunthorpe wrote: >> https://github.com/Linaro/linux-kernel-uadk/tree/6.12-wip >> https://github.com/Linaro/qemu/tree/6.12-wip >> >> Still need this hack >> https://github.com/Linaro/linux-kernel-uadk/commit/ >> eaa194d954112cad4da7852e29343e546baf8683 >> >> One is adding iommu_dev_enable/disable_feature IOMMU_DEV_FEAT_SVA, >> which you have patchset before. > Yes, I have a more complete version of that here someplace. Need some > help on vt-d but hope to get that done next cycle. Can you please elaborate this a bit more? Are you talking about below change + ret = iommu_dev_enable_feature(idev->dev, IOMMU_DEV_FEAT_SVA); + if (ret) + return ret; in iommufd_fault_iopf_enable()? I have no idea about why SVA is affected when enabling iopf. -- baolu
Hi, Baolu On Wed, 13 Nov 2024 at 10:56, Baolu Lu <baolu.lu@linux.intel.com> wrote: > > On 11/13/24 09:23, Jason Gunthorpe wrote: > >> https://github.com/Linaro/linux-kernel-uadk/tree/6.12-wip > >> https://github.com/Linaro/qemu/tree/6.12-wip > >> > >> Still need this hack > >> https://github.com/Linaro/linux-kernel-uadk/commit/ > >> eaa194d954112cad4da7852e29343e546baf8683 > >> > >> One is adding iommu_dev_enable/disable_feature IOMMU_DEV_FEAT_SVA, > >> which you have patchset before. > > Yes, I have a more complete version of that here someplace. Need some > > help on vt-d but hope to get that done next cycle. > > Can you please elaborate this a bit more? Are you talking about below > change > > + ret = iommu_dev_enable_feature(idev->dev, IOMMU_DEV_FEAT_SVA); > + if (ret) > + return ret; > > in iommufd_fault_iopf_enable()? > > I have no idea about why SVA is affected when enabling iopf. In drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c, iommu_dev_enable_feature(IOMMU_DEV_FEAT_SVA) will real call iopf_queue_add_device, while iommu_dev_enable_feature(IOPF) only set flag. arm_smmu_dev_enable_feature case IOMMU_DEV_FEAT_SVA: arm_smmu_master_enable_sva(master) iopf_queue_add_device(master->smmu->evtq.iopf, dev); Thanks
On Wed, Nov 13, 2024 at 10:55:41AM +0800, Baolu Lu wrote: > On 11/13/24 09:23, Jason Gunthorpe wrote: > > > https://github.com/Linaro/linux-kernel-uadk/tree/6.12-wip > > > https://github.com/Linaro/qemu/tree/6.12-wip > > > > > > Still need this hack > > > https://github.com/Linaro/linux-kernel-uadk/commit/ > > > eaa194d954112cad4da7852e29343e546baf8683 > > > > > > One is adding iommu_dev_enable/disable_feature IOMMU_DEV_FEAT_SVA, > > > which you have patchset before. > > Yes, I have a more complete version of that here someplace. Need some > > help on vt-d but hope to get that done next cycle. > > Can you please elaborate this a bit more? Are you talking about below > change I need your help to remove IOMMU_DEV_FEAT_IOPF from the intel driver. I have a patch series that eliminates it from all the other drivers, and I wrote a patch to remove FEAT_SVA from intel.. > + ret = iommu_dev_enable_feature(idev->dev, IOMMU_DEV_FEAT_SVA); > + if (ret) > + return ret; > > in iommufd_fault_iopf_enable()? > > I have no idea about why SVA is affected when enabling iopf. It is ARM not implementing the API correctly. Only SVA turns on the page fault reporting mechanism. In the new world the page fault reporting should be managed during domain attachment. If the domain is fault capable then faults should be delivered to that domain. That is the correct time to setup the iopf mechanism as well. So I fixed that and now ARM and AMD both have no-op implementations of IOMMU_DEV_FEAT_IOPF and IOMMU_DEV_FEAT_SVA. Thus I'd like to remove it entirely. Jason
On 11/14/24 00:43, Jason Gunthorpe wrote: > On Wed, Nov 13, 2024 at 10:55:41AM +0800, Baolu Lu wrote: >> On 11/13/24 09:23, Jason Gunthorpe wrote: >>>> https://github.com/Linaro/linux-kernel-uadk/tree/6.12-wip >>>> https://github.com/Linaro/qemu/tree/6.12-wip >>>> >>>> Still need this hack >>>> https://github.com/Linaro/linux-kernel-uadk/commit/ >>>> eaa194d954112cad4da7852e29343e546baf8683 >>>> >>>> One is adding iommu_dev_enable/disable_feature IOMMU_DEV_FEAT_SVA, >>>> which you have patchset before. >>> Yes, I have a more complete version of that here someplace. Need some >>> help on vt-d but hope to get that done next cycle. >> >> Can you please elaborate this a bit more? Are you talking about below >> change > > I need your help to remove IOMMU_DEV_FEAT_IOPF from the intel > driver. I have a patch series that eliminates it from all the other > drivers, and I wrote a patch to remove FEAT_SVA from intel.. Yes, sure. Let's make this happen in the next cycle. FEAT_IOPF could be removed. IOPF manipulation can be handled in the domain attachment path. A per-device refcount can be implemented. This count increments with each iopf-capable domain attachment and decrements with each detachment. PCI PRI is enabled for the first iopf-capable domain and disabled when the last one is removed. Probably we can also solve the PF/VF sharing PRI issue. With iopf moved to the domain attach path and hardware capability checks to the SVA domain allocation path, FEAT_SVA becomes essentially a no-op. > >> + ret = iommu_dev_enable_feature(idev->dev, IOMMU_DEV_FEAT_SVA); >> + if (ret) >> + return ret; >> >> in iommufd_fault_iopf_enable()? >> >> I have no idea about why SVA is affected when enabling iopf. > > It is ARM not implementing the API correctly. Only SVA turns on the > page fault reporting mechanism. > > In the new world the page fault reporting should be managed during > domain attachment. If the domain is fault capable then faults should > be delivered to that domain. That is the correct time to setup the > iopf mechanism as well. > > So I fixed that and now ARM and AMD both have no-op implementations of > IOMMU_DEV_FEAT_IOPF and IOMMU_DEV_FEAT_SVA. Thus I'd like to remove it > entirely. Thank you for the explanation. -- baolu
On Thu, Nov 14, 2024 at 08:51:47AM +0800, Baolu Lu wrote: > On 11/14/24 00:43, Jason Gunthorpe wrote: > > On Wed, Nov 13, 2024 at 10:55:41AM +0800, Baolu Lu wrote: > > > On 11/13/24 09:23, Jason Gunthorpe wrote: > > > > > https://github.com/Linaro/linux-kernel-uadk/tree/6.12-wip > > > > > https://github.com/Linaro/qemu/tree/6.12-wip > > > > > > > > > > Still need this hack > > > > > https://github.com/Linaro/linux-kernel-uadk/commit/ > > > > > eaa194d954112cad4da7852e29343e546baf8683 > > > > > > > > > > One is adding iommu_dev_enable/disable_feature IOMMU_DEV_FEAT_SVA, > > > > > which you have patchset before. > > > > Yes, I have a more complete version of that here someplace. Need some > > > > help on vt-d but hope to get that done next cycle. > > > > > > Can you please elaborate this a bit more? Are you talking about below > > > change > > > > I need your help to remove IOMMU_DEV_FEAT_IOPF from the intel > > driver. I have a patch series that eliminates it from all the other > > drivers, and I wrote a patch to remove FEAT_SVA from intel.. > > Yes, sure. Let's make this happen in the next cycle. > > FEAT_IOPF could be removed. IOPF manipulation can be handled in the > domain attachment path. A per-device refcount can be implemented. This > count increments with each iopf-capable domain attachment and decrements > with each detachment. PCI PRI is enabled for the first iopf-capable > domain and disabled when the last one is removed. Probably we can also > solve the PF/VF sharing PRI issue. Here is what I have so far, if you send me a patch for vt-d to move FEAT_IOPF into attach as you describe above (see what I did to arm for example), then I can send it next cycle https://github.com/jgunthorpe/linux/commits/iommu_no_feat/ Thanks, Jason
On Fri, Nov 15, 2024 at 01:55:22PM -0400, Jason Gunthorpe wrote: > > > I need your help to remove IOMMU_DEV_FEAT_IOPF from the intel > > > driver. I have a patch series that eliminates it from all the other > > > drivers, and I wrote a patch to remove FEAT_SVA from intel.. > > > > Yes, sure. Let's make this happen in the next cycle. > > > > FEAT_IOPF could be removed. IOPF manipulation can be handled in the > > domain attachment path. A per-device refcount can be implemented. This > > count increments with each iopf-capable domain attachment and decrements > > with each detachment. PCI PRI is enabled for the first iopf-capable > > domain and disabled when the last one is removed. Probably we can also > > solve the PF/VF sharing PRI issue. > > Here is what I have so far, if you send me a patch for vt-d to move > FEAT_IOPF into attach as you describe above (see what I did to arm for > example), then I can send it next cycle > > https://github.com/jgunthorpe/linux/commits/iommu_no_feat/ Hey Baolu, a reminder on this, lets try for it next cycle? Thanks, Jason