Message ID: cover.1746757630.git.nicolinc@nvidia.com
Series: iommufd: Add vIOMMU infrastructure (Part-4 HW QUEUE)
On Thu, May 08, 2025 at 08:02:27PM -0700, Nicolin Chen wrote:
> An IOMMU driver that allocated a vIOMMU may want to revert the allocation,
> if it encounters an internal error after the allocation. So, there needs a
> destroy helper for drivers to use. For instance:
>
> static my_viommu_alloc()
> {
> 	...
> 	my_viommu = iommufd_viommu_alloc(viomm, struct my_viommu, core);
> 	...
> 	ret = init_my_viommu();
> 	if (ret) {
> 		/* Need to destroy the my_viommu->core */
> 		return ERR_PTR(ret);
> 	}
> 	return &my_viommu->core;
> }
>
> Move iommufd_object_abort() to the driver.c file and the public header, to
> introduce common iommufd_struct_destroy() helper that will abort all kinds
> of driver structures, not confined to iommufd_viommu but also the new ones
> being added in the future.
>
> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
> ---
>  drivers/iommu/iommufd/iommufd_private.h |  1 -
>  include/linux/iommufd.h                 | 16 ++++++++++++++++
>  drivers/iommu/iommufd/driver.c          | 14 ++++++++++++++
>  drivers/iommu/iommufd/main.c            | 13 -------------
>  4 files changed, 30 insertions(+), 14 deletions(-)

One idea that struck me when I was looking at this was to copy the
technique from rdma. When an object is allocated we keep track of it in
the struct iommufd_ucmd. Then when the command is over the core code
either aborts or finalizes the objects in the iommufd_ucmd.

This way you don't have to expose abort and related to drivers.

Something like this:

diff --git a/drivers/iommu/iommufd/driver.c b/drivers/iommu/iommufd/driver.c
index 2d2695e2562d02..289dd19eec90f1 100644
--- a/drivers/iommu/iommufd/driver.c
+++ b/drivers/iommu/iommufd/driver.c
@@ -36,6 +36,16 @@ struct iommufd_object *_iommufd_object_alloc(struct iommufd_ctx *ictx,
 }
 EXPORT_SYMBOL_NS_GPL(_iommufd_object_alloc, "IOMMUFD");
 
+struct iommufd_object *_iommufd_object_alloc_ucmd(struct iommufd_ucmd *ucmd,
+						  size_t size,
+						  enum iommufd_object_type type)
+{
+	if (ucmd->alloced_obj)
+		return ERR_PTR(-EBUSY);
+	ucmd->alloced_obj = _iommufd_object_alloc(ucmd->ictx, size, type);
+	return ucmd->alloced_obj;
+}
+
 /* Undo _iommufd_object_alloc() if iommufd_object_finalize() was not called */
 void iommufd_object_abort(struct iommufd_ctx *ictx, struct iommufd_object *obj)
 {
diff --git a/drivers/iommu/iommufd/eventq.c b/drivers/iommu/iommufd/eventq.c
index f39cf079734769..f109948a81ac8c 100644
--- a/drivers/iommu/iommufd/eventq.c
+++ b/drivers/iommu/iommufd/eventq.c
@@ -473,7 +473,7 @@ int iommufd_fault_alloc(struct iommufd_ucmd *ucmd)
 	if (cmd->flags)
 		return -EOPNOTSUPP;
 
-	fault = __iommufd_object_alloc(ucmd->ictx, fault, IOMMUFD_OBJ_FAULT,
+	fault = __iommufd_object_alloc_ucmd(ucmd, fault, IOMMUFD_OBJ_FAULT,
				       common.obj);
 	if (IS_ERR(fault))
 		return PTR_ERR(fault);
@@ -483,10 +483,8 @@ int iommufd_fault_alloc(struct iommufd_ucmd *ucmd)
 
 	fdno = iommufd_eventq_init(&fault->common, "[iommufd-pgfault]",
 				   ucmd->ictx, &iommufd_fault_fops);
-	if (fdno < 0) {
-		rc = fdno;
-		goto out_abort;
-	}
+	if (fdno < 0)
+		return fdno;
 
 	cmd->out_fault_id = fault->common.obj.id;
 	cmd->out_fault_fd = fdno;
@@ -494,7 +492,6 @@ int iommufd_fault_alloc(struct iommufd_ucmd *ucmd)
 	rc = iommufd_ucmd_respond(ucmd, sizeof(*cmd));
 	if (rc)
 		goto out_put_fdno;
-	iommufd_object_finalize(ucmd->ictx, &fault->common.obj);
 
 	fd_install(fdno, fault->common.filep);
 
@@ -502,9 +499,6 @@ int iommufd_fault_alloc(struct iommufd_ucmd *ucmd)
 out_put_fdno:
 	put_unused_fd(fdno);
 	fput(fault->common.filep);
-out_abort:
-	iommufd_object_abort_and_destroy(ucmd->ictx, &fault->common.obj);
-
 	return rc;
 }
diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
index c87326d7ecfccb..94cafae86d271c 100644
--- a/drivers/iommu/iommufd/iommufd_private.h
+++ b/drivers/iommu/iommufd/iommufd_private.h
@@ -152,6 +152,7 @@ struct iommufd_ucmd {
 	void __user *ubuffer;
 	u32 user_size;
 	void *cmd;
+	struct iommufd_object *alloced_obj;
 };
 
 int iommufd_vfio_ioctl(struct iommufd_ctx *ictx, unsigned int cmd,
@@ -258,6 +259,15 @@ iommufd_object_put_and_try_destroy(struct iommufd_ctx *ictx,
 #define iommufd_object_alloc(ictx, ptr, type) \
 	__iommufd_object_alloc(ictx, ptr, type, obj)
 
+#define __iommufd_object_alloc_uctx(uctx, ptr, type, obj)                     \
+	container_of(_iommufd_object_alloc_ucmd(                              \
+			     uctx,                                            \
+			     sizeof(*(ptr)) + BUILD_BUG_ON_ZERO(              \
+						      offsetof(typeof(*(ptr)), \
+							       obj) != 0),    \
+			     type),                                           \
+		     typeof(*(ptr)), obj)
+
 /*
  * The IO Address Space (IOAS) pagetable is a virtual page table backed by the
  * io_pagetable object. It is a user controlled mapping of IOVA -> PFNs. The
diff --git a/drivers/iommu/iommufd/main.c b/drivers/iommu/iommufd/main.c
index 1d7f3584aea0f7..0bc9c02f6bfc4f 100644
--- a/drivers/iommu/iommufd/main.c
+++ b/drivers/iommu/iommufd/main.c
@@ -408,6 +408,14 @@ static long iommufd_fops_ioctl(struct file *filp, unsigned int cmd,
 	if (ret)
 		return ret;
 	ret = op->execute(&ucmd);
+
+	if (ucmd.alloced_obj) {
+		if (ret)
+			iommufd_object_abort_and_destroy(ictx,
+							 ucmd.alloced_obj);
+		else
+			iommufd_object_finalize(ictx, ucmd.alloced_obj);
+	}
 	return ret;
 }
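The driver-side payoff of this approach is that error paths collapse to plain returns. Below is a minimal sketch, assuming a hypothetical iommufd_viommu_alloc_ucmd() wrapper built on the _iommufd_object_alloc_ucmd() helper above; my_viommu and init_my_viommu() are just the illustrative names from the quoted commit message, not real API:

#include <linux/err.h>
#include <linux/iommufd.h>

/*
 * Illustrative only: iommufd_viommu_alloc_ucmd(), my_viommu and
 * init_my_viommu() are hypothetical. Because the core records the new
 * object in ucmd->alloced_obj, a failing driver path simply returns an
 * error; iommufd_fops_ioctl() then aborts the object, or finalizes it
 * when op->execute() succeeds.
 */
static struct iommufd_viommu *my_viommu_alloc(struct iommufd_ucmd *ucmd)
{
	struct my_viommu *my_viommu;
	int ret;

	my_viommu = iommufd_viommu_alloc_ucmd(ucmd, struct my_viommu, core);
	if (IS_ERR(my_viommu))
		return ERR_CAST(my_viommu);

	ret = init_my_viommu(my_viommu);
	if (ret)
		return ERR_PTR(ret);	/* core aborts &my_viommu->core */

	return &my_viommu->core;
}

With this shape, the driver never calls abort or finalize directly, which is the point of keeping those helpers out of the driver-facing header.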
On Thu, May 08, 2025 at 08:02:37PM -0700, Nicolin Chen wrote:
> With the introduction of the new object and its infrastructure, update the
> doc to reflect that.
>
> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
> ---
>  Documentation/userspace-api/iommufd.rst | 12 ++++++++++++
>  1 file changed, 12 insertions(+)

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>

Jason
> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Friday, May 16, 2025 12:47 AM
>
> On Thu, May 08, 2025 at 08:02:35PM -0700, Nicolin Chen wrote:
> > +	/* vma->vm_pgoff carries an index to an mtree entry (immap) */
> > +	immap = mtree_load(&ictx->mt_mmap, vma->vm_pgoff);
> > +	if (!immap)
> > +		return -ENXIO;
> > +	if (length >> PAGE_SHIFT != immap->num_pfns)
> > +		return -ENXIO;
>
> This needs to validate that vm_pgoff is at the start of the immap or
> num_pfns is the wrong thing to validate length against.

vm_pgoff is the index into mtree. If it's wrong mtree_load() will fail
already?
On Fri, May 16, 2025 at 10:29:56AM -0300, Jason Gunthorpe wrote:
> On Fri, May 16, 2025 at 04:08:25AM +0000, Tian, Kevin wrote:
> > > From: Jason Gunthorpe <jgg@nvidia.com>
> > > Sent: Friday, May 16, 2025 12:47 AM
> > >
> > > On Thu, May 08, 2025 at 08:02:35PM -0700, Nicolin Chen wrote:
> > > > +	/* vma->vm_pgoff carries an index to an mtree entry (immap) */
> > > > +	immap = mtree_load(&ictx->mt_mmap, vma->vm_pgoff);
> > > > +	if (!immap)
> > > > +		return -ENXIO;
> > > > +	if (length >> PAGE_SHIFT != immap->num_pfns)
> > > > +		return -ENXIO;
> > >
> > > This needs to validate that vm_pgoff is at the start of the immap or
> > > num_pfns is the wrong thing to validate length against.
> >
> > vm_pgoff is the index into mtree. If it's wrong mtree_load() will
> > fail already?
>
> I'm not sure? I thought mtree_load will return any range that
> intersects with the given index?
>
> Otherwise what is the point of having a range based datastructure?

Yea, I can confirm that providing a vm_pgoff that's in the range (though
not the startp) could get immap too.

I am adding negative test coverage for the vm_pgoff/length input for the
following ifs:

+	/* vma->vm_pgoff carries an index to an mtree entry (immap) */
+	immap = mtree_load(&ictx->mt_mmap, vma->vm_pgoff);
+	if (!immap)
+		return -ENXIO;
+	/* Validate the vm_pgoff and length against the registered region */
+	if (vma->vm_pgoff != immap->startp)
+		return -ENXIO;
+	if (length != immap->num_pfns << PAGE_SHIFT)
+		return -ENXIO;

Thanks
Nicolin
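For context, the range-based lookup behaviour Jason describes can be seen with a small self-contained sketch (demo code, not part of the series): mtree_load() returns the stored entry for any index that falls inside a stored range, so a length check alone cannot tell whether vm_pgoff points at the start of the registered region.

#include <linux/maple_tree.h>
#include <linux/gfp.h>
#include <linux/bug.h>

static DEFINE_MTREE(demo_mt);

/* Demonstrates that a mid-range index still resolves to the stored entry. */
static void mtree_range_lookup_demo(void)
{
	static int region;	/* stands in for the registered immap entry */

	/* Register pgoff range [100, 131], i.e. a 32-page region */
	if (mtree_store_range(&demo_mt, 100, 131, &region, GFP_KERNEL))
		return;

	WARN_ON(mtree_load(&demo_mt, 100) != &region);	/* start of range */
	WARN_ON(mtree_load(&demo_mt, 115) != &region);	/* middle also hits */
	WARN_ON(mtree_load(&demo_mt, 132) != NULL);	/* outside misses */

	mtree_destroy(&demo_mt);
}

This is why the updated checks above compare vm_pgoff against immap->startp in addition to validating the length.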
Kevin, Nicolin,

On 5/16/2025 8:29 AM, Tian, Kevin wrote:
>> From: Nicolin Chen <nicolinc@nvidia.com>
>> Sent: Friday, May 16, 2025 10:30 AM
>>
>> On Thu, May 15, 2025 at 05:58:41AM +0000, Tian, Kevin wrote:
>>>> From: Nicolin Chen <nicolinc@nvidia.com>
>>>> Sent: Friday, May 9, 2025 11:03 AM
>>>>
>>>> Add IOMMUFD_OBJ_HW_QUEUE with an iommufd_hw_queue structure,
>>>> representing a HW-accelerated queue type of IOMMU's physical queue
>>>> that can be passed through to a user space VM for direct hardware
>>>> control, such as:
>>>>  - NVIDIA's Virtual Command Queue
>>>>  - AMD vIOMMU's Command Buffer, Event Log Buffer, and PPR Log Buffer
>>>>
>>>> Introduce an allocator iommufd_hw_queue_alloc(). And add a pair of
>>>> viommu ops for iommufd to forward user space ioctls to IOMMU drivers.
>>>>
>>>> Given that the first user of this HW QUEUE (tegra241-cmdqv) will need
>>>> to ensure the queue memory to be physically contiguous, add a flag
>>>> property in iommufd_viommu_ops and
>>>> IOMMUFD_VIOMMU_FLAG_HW_QUEUE_READS_PA to allow driver to flag it so
>>>> that the core will validate the physical pages of a given guest queue.
>>>
>>> 'READS' is confusing here. What about xxx_CONTIG_PAS?
>>
>> Combining Jason's first comments here:
>> https://lore.kernel.org/linux-iommu/20250515160620.GJ382960@nvidia.com/
>>
>> So, pinning should be optional too. And I think there would be
>> unlikely a case where HW needs contiguous physical pages while
>> not requiring to pin the pages, right?

AMD IOMMU needs contiguous GPA space for buffers (like the command
buffer), not contiguous physical addresses.

>>
>> So, we need a flag that could indicate to do both tests. Yet,
>> "xxx_CONTIG_PAS" doesn't sound very fitting, compared to this
>> "IOMMUFD_VIOMMU_FLAG_HW_QUEUE_READS_PA".
>>
>> Perhaps, we should just add some comments to clarify a bit. Or
>> do you have some better naming?
>>
>
> let's wait until that open is closed, i.e. whether we still let the core
> manage it and whether AMD requires pinning even when IOVA is used.

I think we may still want to pin those buffer addresses.

-Vasant