Message ID | 6348cc7a72ce9f2ac0e9caf9737e70177a01eb74.1724776335.git.nicolinc@nvidia.com |
---|---|
State | New |
Headers | show |
Series | iommufd: Add VIOMMU infrastructure (Part-1) | expand |
On Tue, Aug 27, 2024 at 09:59:43AM -0700, Nicolin Chen wrote: > Introduce a pair of new ioctls to set/unset a per-viommu virtual device id > that should be linked to a physical device id via an idev pointer. Given some of the other discussions around CC I suspect we should rename these to 'create/destroy virtual device' with an eye that eventually they would be extended like other ops with per-CC platform data. ie this would be the interface to tell the CC trusted world that a secure device is being added to a VM with some additional flags.. Right now it only conveys the vRID parameter of the virtual device being created. A following question is if these objects should have their own IDs in the iommufd space too, and then unset is not unset but just a normal destroy object. If so then the thing you put in the ids xarray would also just be a normal object struct. This is probably worth doing if this is going to grow more CC stuff later. Jason
> From: Nicolin Chen <nicolinc@nvidia.com> > Sent: Friday, September 6, 2024 4:15 AM > > On Thu, Sep 05, 2024 at 02:43:26PM -0300, Jason Gunthorpe wrote: > > On Thu, Sep 05, 2024 at 10:37:45AM -0700, Nicolin Chen wrote: > > > That being said, if we have a clear picture that in the long term > > > we would extend it to hold more information, I think it could be > > > a smart move. > > > > > > Perhaps virtual device can have its own "attach" to vIOMMU? Or > > > would you still prefer attaching via proxy hwpt_nested? > > > > I was thinking just creating it against a vIOMMU is an effective > > "attach" and the virtual device is permanently tied to the vIOMMU at > > creation time. > > Ah, right! The create is per-viommu, so it's being attached. > presumably we also need check compatibility between the idev which the virtual device is created against and the stage-2 pgtable, as a normal attach required?
On Wed, Sep 11, 2024 at 06:19:10AM +0000, Tian, Kevin wrote: > > From: Nicolin Chen <nicolinc@nvidia.com> > > Sent: Friday, September 6, 2024 4:15 AM > > > > On Thu, Sep 05, 2024 at 02:43:26PM -0300, Jason Gunthorpe wrote: > > > On Thu, Sep 05, 2024 at 10:37:45AM -0700, Nicolin Chen wrote: > > > > That being said, if we have a clear picture that in the long term > > > > we would extend it to hold more information, I think it could be > > > > a smart move. > > > > > > > > Perhaps virtual device can have its own "attach" to vIOMMU? Or > > > > would you still prefer attaching via proxy hwpt_nested? > > > > > > I was thinking just creating it against a vIOMMU is an effective > > > "attach" and the virtual device is permanently tied to the vIOMMU at > > > creation time. > > > > Ah, right! The create is per-viommu, so it's being attached. > > > > presumably we also need check compatibility between the idev > which the virtual device is created against and the stage-2 pgtable, > as a normal attach required? If that's required, it can be a part of "create virtual device", where idev and viommu (holding s2 hwpt) would be all available? Thanks Nicolin
> From: Nicolin Chen <nicolinc@nvidia.com> > Sent: Wednesday, September 11, 2024 3:12 PM > > On Wed, Sep 11, 2024 at 06:19:10AM +0000, Tian, Kevin wrote: > > > From: Nicolin Chen <nicolinc@nvidia.com> > > > Sent: Friday, September 6, 2024 4:15 AM > > > > > > On Thu, Sep 05, 2024 at 02:43:26PM -0300, Jason Gunthorpe wrote: > > > > On Thu, Sep 05, 2024 at 10:37:45AM -0700, Nicolin Chen wrote: > > > > > That being said, if we have a clear picture that in the long term > > > > > we would extend it to hold more information, I think it could be > > > > > a smart move. > > > > > > > > > > Perhaps virtual device can have its own "attach" to vIOMMU? Or > > > > > would you still prefer attaching via proxy hwpt_nested? > > > > > > > > I was thinking just creating it against a vIOMMU is an effective > > > > "attach" and the virtual device is permanently tied to the vIOMMU at > > > > creation time. > > > > > > Ah, right! The create is per-viommu, so it's being attached. > > > > > > > presumably we also need check compatibility between the idev > > which the virtual device is created against and the stage-2 pgtable, > > as a normal attach required? > > If that's required, it can be a part of "create virtual device", > where idev and viommu (holding s2 hwpt) would be all available? > yes
On Wed, Sep 11, 2024 at 07:18:52AM +0000, Tian, Kevin wrote: > > From: Nicolin Chen <nicolinc@nvidia.com> > > Sent: Wednesday, September 11, 2024 3:12 PM > > > > On Wed, Sep 11, 2024 at 06:19:10AM +0000, Tian, Kevin wrote: > > > > From: Nicolin Chen <nicolinc@nvidia.com> > > > > Sent: Friday, September 6, 2024 4:15 AM > > > > > > > > On Thu, Sep 05, 2024 at 02:43:26PM -0300, Jason Gunthorpe wrote: > > > > > On Thu, Sep 05, 2024 at 10:37:45AM -0700, Nicolin Chen wrote: > > > > > > That being said, if we have a clear picture that in the long term > > > > > > we would extend it to hold more information, I think it could be > > > > > > a smart move. > > > > > > > > > > > > Perhaps virtual device can have its own "attach" to vIOMMU? Or > > > > > > would you still prefer attaching via proxy hwpt_nested? > > > > > > > > > > I was thinking just creating it against a vIOMMU is an effective > > > > > "attach" and the virtual device is permanently tied to the vIOMMU at > > > > > creation time. > > > > > > > > Ah, right! The create is per-viommu, so it's being attached. > > > > > > > > > > presumably we also need check compatibility between the idev > > > which the virtual device is created against and the stage-2 pgtable, > > > as a normal attach required? > > > > If that's required, it can be a part of "create virtual device", > > where idev and viommu (holding s2 hwpt) would be all available? > > > > yes Oh, I misread your question actually. I think it's about a matching validation between dev->iommu->iommu_dev and vIOMMU->iommu_dev. Thanks Nicolin
Hi Nicolin, On Tue, Aug 27, 2024 at 09:59:43AM -0700, Nicolin Chen wrote: > Introduce a pair of new ioctls to set/unset a per-viommu virtual device id > that should be linked to a physical device id via an idev pointer. > > Continue the support IOMMU_VIOMMU_TYPE_DEFAULT for a core-managed viommu. > Provide a lookup function for drivers to load device pointer by a virtual > device id. > > Add a rw_semaphore protection around the vdev_id list. Any future ioctl > handlers that potentially access the list must grab the lock too. > > Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> > --- > drivers/iommu/iommufd/device.c | 12 +++ > drivers/iommu/iommufd/iommufd_private.h | 21 ++++ > drivers/iommu/iommufd/main.c | 6 ++ > drivers/iommu/iommufd/viommu.c | 121 ++++++++++++++++++++++++ > include/uapi/linux/iommufd.h | 40 ++++++++ > 5 files changed, 200 insertions(+) > > diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c > index 5fd3dd420290..3ad759971b32 100644 > --- a/drivers/iommu/iommufd/device.c > +++ b/drivers/iommu/iommufd/device.c > @@ -136,6 +136,18 @@ void iommufd_device_destroy(struct iommufd_object *obj) > struct iommufd_device *idev = > container_of(obj, struct iommufd_device, obj); > > + /* Unlocked since there should be no race in a destroy() */ > + if (idev->vdev_id) { > + struct iommufd_vdev_id *vdev_id = idev->vdev_id; > + struct iommufd_viommu *viommu = vdev_id->viommu; > + struct iommufd_vdev_id *old; > + > + old = xa_cmpxchg(&viommu->vdev_ids, vdev_id->id, vdev_id, NULL, > + GFP_KERNEL); > + WARN_ON(old != vdev_id); > + kfree(vdev_id); > + idev->vdev_id = NULL; > + } > iommu_device_release_dma_owner(idev->dev); > iommufd_put_group(idev->igroup); > if (!iommufd_selftest_is_mock_dev(idev->dev)) > diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h > index 1f2a1c133b9a..2c6e168c5300 100644 > --- a/drivers/iommu/iommufd/iommufd_private.h > +++ b/drivers/iommu/iommufd/iommufd_private.h > @@ -416,6 +416,7 @@ struct iommufd_device { > struct iommufd_object obj; > struct iommufd_ctx *ictx; > struct iommufd_group *igroup; > + struct iommufd_vdev_id *vdev_id; > struct list_head group_item; > /* always the physical device */ > struct device *dev; > @@ -533,11 +534,31 @@ struct iommufd_viommu { > struct iommufd_ctx *ictx; > struct iommufd_hwpt_paging *hwpt; > > + /* The locking order is vdev_ids_rwsem -> igroup::lock */ > + struct rw_semaphore vdev_ids_rwsem; > + struct xarray vdev_ids; > + > unsigned int type; > }; > > +struct iommufd_vdev_id { > + struct iommufd_viommu *viommu; > + struct iommufd_device *idev; > + u64 id; > +}; > + > +static inline struct iommufd_viommu * > +iommufd_get_viommu(struct iommufd_ucmd *ucmd, u32 id) > +{ > + return container_of(iommufd_get_object(ucmd->ictx, id, > + IOMMUFD_OBJ_VIOMMU), > + struct iommufd_viommu, obj); > +} > + > int iommufd_viommu_alloc_ioctl(struct iommufd_ucmd *ucmd); > void iommufd_viommu_destroy(struct iommufd_object *obj); > +int iommufd_viommu_set_vdev_id(struct iommufd_ucmd *ucmd); > +int iommufd_viommu_unset_vdev_id(struct iommufd_ucmd *ucmd); > > #ifdef CONFIG_IOMMUFD_TEST > int iommufd_test(struct iommufd_ucmd *ucmd); > diff --git a/drivers/iommu/iommufd/main.c b/drivers/iommu/iommufd/main.c > index 288ee51b6829..199ad90fa36b 100644 > --- a/drivers/iommu/iommufd/main.c > +++ b/drivers/iommu/iommufd/main.c > @@ -334,6 +334,8 @@ union ucmd_buffer { > struct iommu_option option; > struct iommu_vfio_ioas vfio_ioas; > struct iommu_viommu_alloc viommu; > + struct iommu_viommu_set_vdev_id set_vdev_id; > + struct iommu_viommu_unset_vdev_id unset_vdev_id; > #ifdef CONFIG_IOMMUFD_TEST > struct iommu_test_cmd test; > #endif > @@ -387,6 +389,10 @@ static const struct iommufd_ioctl_op iommufd_ioctl_ops[] = { > __reserved), > IOCTL_OP(IOMMU_VIOMMU_ALLOC, iommufd_viommu_alloc_ioctl, > struct iommu_viommu_alloc, out_viommu_id), > + IOCTL_OP(IOMMU_VIOMMU_SET_VDEV_ID, iommufd_viommu_set_vdev_id, > + struct iommu_viommu_set_vdev_id, vdev_id), > + IOCTL_OP(IOMMU_VIOMMU_UNSET_VDEV_ID, iommufd_viommu_unset_vdev_id, > + struct iommu_viommu_unset_vdev_id, vdev_id), > #ifdef CONFIG_IOMMUFD_TEST > IOCTL_OP(IOMMU_TEST_CMD, iommufd_test, struct iommu_test_cmd, last), > #endif > diff --git a/drivers/iommu/iommufd/viommu.c b/drivers/iommu/iommufd/viommu.c > index 200653a4bf57..8ffcd72b16b8 100644 > --- a/drivers/iommu/iommufd/viommu.c > +++ b/drivers/iommu/iommufd/viommu.c > @@ -8,6 +8,15 @@ void iommufd_viommu_destroy(struct iommufd_object *obj) > { > struct iommufd_viommu *viommu = > container_of(obj, struct iommufd_viommu, obj); > + struct iommufd_vdev_id *vdev_id; > + unsigned long index; > + > + xa_for_each(&viommu->vdev_ids, index, vdev_id) { > + /* Unlocked since there should be no race in a destroy() */ > + vdev_id->idev->vdev_id = NULL; > + kfree(vdev_id); > + } > + xa_destroy(&viommu->vdev_ids); > > refcount_dec(&viommu->hwpt->common.obj.users); > } > @@ -53,6 +62,9 @@ int iommufd_viommu_alloc_ioctl(struct iommufd_ucmd *ucmd) > viommu->ictx = ucmd->ictx; > viommu->hwpt = hwpt_paging; > > + xa_init(&viommu->vdev_ids); > + init_rwsem(&viommu->vdev_ids_rwsem); > + > refcount_inc(&viommu->hwpt->common.obj.users); > > cmd->out_viommu_id = viommu->obj.id; > @@ -70,3 +82,112 @@ int iommufd_viommu_alloc_ioctl(struct iommufd_ucmd *ucmd) > iommufd_put_object(ucmd->ictx, &idev->obj); > return rc; > } > + > +int iommufd_viommu_set_vdev_id(struct iommufd_ucmd *ucmd) > +{ > + struct iommu_viommu_set_vdev_id *cmd = ucmd->cmd; > + struct iommufd_vdev_id *vdev_id, *curr; > + struct iommufd_viommu *viommu; > + struct iommufd_device *idev; > + int rc = 0; > + > + if (cmd->vdev_id > ULONG_MAX) > + return -EINVAL; > + > + viommu = iommufd_get_viommu(ucmd, cmd->viommu_id); > + if (IS_ERR(viommu)) > + return PTR_ERR(viommu); > + > + idev = iommufd_get_device(ucmd, cmd->dev_id); > + if (IS_ERR(idev)) { > + rc = PTR_ERR(idev); > + goto out_put_viommu; > + } > + > + down_write(&viommu->vdev_ids_rwsem); > + mutex_lock(&idev->igroup->lock); > + if (idev->vdev_id) { > + rc = -EEXIST; > + goto out_unlock_igroup; > + } > + > + vdev_id = kzalloc(sizeof(*vdev_id), GFP_KERNEL); > + if (!vdev_id) { > + rc = -ENOMEM; > + goto out_unlock_igroup; > + } > + > + vdev_id->idev = idev; > + vdev_id->viommu = viommu; > + vdev_id->id = cmd->vdev_id; My understanding of IOMMUFD is very little, but AFAICT, that means that it’s assumed that each device can only have one stream ID(RID)? As I can see in patch 17 in arm_smmu_convert_viommu_vdev_id(), it converts the virtual ID to a physical one using master->streams[0].id. Is that correct or am I missing something? As I am looking at similar problem for paravirtual IOMMU with pKVM, where the UAPI would be something similar to: GET_NUM_END_POINTS(dev) => nr_sids SET_END_POINT_VSID(dev, sid_index, vsid) Similar to what VFIO does with IRQs. As a device can have many SIDs. Thanks, Mostafa > + > + curr = xa_cmpxchg(&viommu->vdev_ids, cmd->vdev_id, NULL, vdev_id, > + GFP_KERNEL); > + if (curr) { > + rc = xa_err(curr) ? : -EBUSY; > + goto out_free; > + } > + > + idev->vdev_id = vdev_id; > + goto out_unlock_igroup; > + > +out_free: > + kfree(vdev_id); > +out_unlock_igroup: > + mutex_unlock(&idev->igroup->lock); > + up_write(&viommu->vdev_ids_rwsem); > + iommufd_put_object(ucmd->ictx, &idev->obj); > +out_put_viommu: > + iommufd_put_object(ucmd->ictx, &viommu->obj); > + return rc; > +} > + > +int iommufd_viommu_unset_vdev_id(struct iommufd_ucmd *ucmd) > +{ > + struct iommu_viommu_unset_vdev_id *cmd = ucmd->cmd; > + struct iommufd_viommu *viommu; > + struct iommufd_vdev_id *old; > + struct iommufd_device *idev; > + int rc = 0; > + > + if (cmd->vdev_id > ULONG_MAX) > + return -EINVAL; > + > + viommu = iommufd_get_viommu(ucmd, cmd->viommu_id); > + if (IS_ERR(viommu)) > + return PTR_ERR(viommu); > + > + idev = iommufd_get_device(ucmd, cmd->dev_id); > + if (IS_ERR(idev)) { > + rc = PTR_ERR(idev); > + goto out_put_viommu; > + } > + > + down_write(&viommu->vdev_ids_rwsem); > + mutex_lock(&idev->igroup->lock); > + if (!idev->vdev_id) { > + rc = -ENOENT; > + goto out_unlock_igroup; > + } > + if (idev->vdev_id->id != cmd->vdev_id) { > + rc = -EINVAL; > + goto out_unlock_igroup; > + } > + > + old = xa_cmpxchg(&viommu->vdev_ids, idev->vdev_id->id, > + idev->vdev_id, NULL, GFP_KERNEL); > + if (xa_is_err(old)) { > + rc = xa_err(old); > + goto out_unlock_igroup; > + } > + kfree(old); > + idev->vdev_id = NULL; > + > +out_unlock_igroup: > + mutex_unlock(&idev->igroup->lock); > + up_write(&viommu->vdev_ids_rwsem); > + iommufd_put_object(ucmd->ictx, &idev->obj); > +out_put_viommu: > + iommufd_put_object(ucmd->ictx, &viommu->obj); > + return rc; > +} > diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h > index 51ce6a019c34..1816e89c922d 100644 > --- a/include/uapi/linux/iommufd.h > +++ b/include/uapi/linux/iommufd.h > @@ -52,6 +52,8 @@ enum { > IOMMUFD_CMD_HWPT_INVALIDATE = 0x8d, > IOMMUFD_CMD_FAULT_QUEUE_ALLOC = 0x8e, > IOMMUFD_CMD_VIOMMU_ALLOC = 0x8f, > + IOMMUFD_CMD_VIOMMU_SET_VDEV_ID = 0x90, > + IOMMUFD_CMD_VIOMMU_UNSET_VDEV_ID = 0x91, > }; > > /** > @@ -882,4 +884,42 @@ struct iommu_viommu_alloc { > __u32 out_viommu_id; > }; > #define IOMMU_VIOMMU_ALLOC _IO(IOMMUFD_TYPE, IOMMUFD_CMD_VIOMMU_ALLOC) > + > +/** > + * struct iommu_viommu_set_vdev_id - ioctl(IOMMU_VIOMMU_SET_VDEV_ID) > + * @size: sizeof(struct iommu_viommu_set_vdev_id) > + * @viommu_id: viommu ID to associate with the device to store its virtual ID > + * @dev_id: device ID to set its virtual ID > + * @__reserved: Must be 0 > + * @vdev_id: Virtual device ID > + * > + * Set a viommu-specific virtual ID of a device > + */ > +struct iommu_viommu_set_vdev_id { > + __u32 size; > + __u32 viommu_id; > + __u32 dev_id; > + __u32 __reserved; > + __aligned_u64 vdev_id; > +}; > +#define IOMMU_VIOMMU_SET_VDEV_ID _IO(IOMMUFD_TYPE, IOMMUFD_CMD_VIOMMU_SET_VDEV_ID) > + > +/** > + * struct iommu_viommu_unset_vdev_id - ioctl(IOMMU_VIOMMU_UNSET_VDEV_ID) > + * @size: sizeof(struct iommu_viommu_unset_vdev_id) > + * @viommu_id: viommu ID associated with the device to delete its virtual ID > + * @dev_id: device ID to unset its virtual ID > + * @__reserved: Must be 0 > + * @vdev_id: Virtual device ID (for verification) > + * > + * Unset a viommu-specific virtual ID of a device > + */ > +struct iommu_viommu_unset_vdev_id { > + __u32 size; > + __u32 viommu_id; > + __u32 dev_id; > + __u32 __reserved; > + __aligned_u64 vdev_id; > +}; > +#define IOMMU_VIOMMU_UNSET_VDEV_ID _IO(IOMMUFD_TYPE, IOMMUFD_CMD_VIOMMU_UNSET_VDEV_ID) > #endif > -- > 2.43.0 >
On Fri, Sep 27, 2024 at 01:50:52PM +0000, Mostafa Saleh wrote: > My understanding of IOMMUFD is very little, but AFAICT, that means that > it’s assumed that each device can only have one stream ID(RID)? > > As I can see in patch 17 in arm_smmu_convert_viommu_vdev_id(), it > converts the virtual ID to a physical one using master->streams[0].id. > > Is that correct or am I missing something? > > As I am looking at similar problem for paravirtual IOMMU with pKVM, where > the UAPI would be something similar to: > > GET_NUM_END_POINTS(dev) => nr_sids > > SET_END_POINT_VSID(dev, sid_index, vsid) > > Similar to what VFIO does with IRQs. > > As a device can have many SIDs. We don't support multi SID through this interface, at least in this version. To do multi-sid you have to inform the VM of all the different pSIDs the device has and then setup the vSID/pSID translation to map them all to the HW invalidation logic. Which is alot more steps, and we have no use case right now. Multi-sid is also not something I expect to see in any modern PCI device, and this is VFIO PCI... Jason
On Fri, Sep 27, 2024 at 11:01:41AM -0300, Jason Gunthorpe wrote: > On Fri, Sep 27, 2024 at 01:50:52PM +0000, Mostafa Saleh wrote: > > > My understanding of IOMMUFD is very little, but AFAICT, that means that > > it’s assumed that each device can only have one stream ID(RID)? > > > > As I can see in patch 17 in arm_smmu_convert_viommu_vdev_id(), it > > converts the virtual ID to a physical one using master->streams[0].id. > > > > Is that correct or am I missing something? > > > > As I am looking at similar problem for paravirtual IOMMU with pKVM, where > > the UAPI would be something similar to: > > > > GET_NUM_END_POINTS(dev) => nr_sids > > > > SET_END_POINT_VSID(dev, sid_index, vsid) > > > > Similar to what VFIO does with IRQs. > > > > As a device can have many SIDs. > > We don't support multi SID through this interface, at least in this > version. > > To do multi-sid you have to inform the VM of all the different pSIDs > the device has and then setup the vSID/pSID translation to map them > all to the HW invalidation logic. Why would the VM need to know the pSID? The way I view this is quite close to how irq works, the VM only views the GSI which is the virtualized number. The VMM then would need to configure vSID->pSID translation, also without knowing the actual pSID, just how many SIDs are there per-device; very similar to how it configures IRQs through VFIO_DEVICE_GET_INFO/VFIO_DEVICE_SET_IRQS. And as long as we only allow 1:1 vSID to pSID mapping, I guess it would be easy to implement. > > Which is alot more steps, and we have no use case right now. Multi-sid > is also not something I expect to see in any modern PCI device, and > this is VFIO PCI... > Ah, I thought IOMMUFD would be used instead of VFIO_TYPE1*, which should cover platform devices (VFIO-platform) or am I missing something? And multi-SIDs is common in platform devices and this would be quite restricting, and I was hoping to support the pKVM vIOMMU through IOMMUFD interface. If possible, can the UAPI be designed with this in mind, even if not implemented now? Thanks, Mostafa > Jason
On Fri, Sep 27, 2024 at 02:22:20PM +0000, Mostafa Saleh wrote: > > We don't support multi SID through this interface, at least in this > > version. > > > > To do multi-sid you have to inform the VM of all the different pSIDs > > the device has and then setup the vSID/pSID translation to map them > > all to the HW invalidation logic. > > Why would the VM need to know the pSID? It doesn't need to know the pSID exactly, but it needs to know all the pSIDs that exist and have them be labeled with vSIDs. With cmdq direct assignment the VM has to issue an ATS invalidation for each and every physical device using its vSID. There is no HW path to handle a 1:N v/p SID relationship. > Ah, I thought IOMMUFD would be used instead of VFIO_TYPE1*, which should cover > platform devices (VFIO-platform) or am I missing something? It does work with platform, but AFAIK nobody is interested in that so it hasn't been any focus. Enabling multi-sid nesting, sub stream ids, etc is some additional work I think. But what I mean is the really hard use case for the vSID/pSID mapping is ATS invalidation and you won't use ATS invalidation on platform so multi-sid likely doesn't matter. STE/CD invalidation could possibly be pushed down through the per-domain ioctl and replicated to all domain attachments. We don't have code in the series to do that, but it could work from a uAPI perspective. > If possible, can the UAPI be designed with this in mind, even if not > implemented now? It is reasonable to ask. I think things are extensible enough. I'd imagine we can add a flag 'secondary ID' and then a new field 'secondary ID index' to the vdev operations when someone wants to take this on. Jason
On Fri, Sep 27, 2024 at 3:58 PM Jason Gunthorpe <jgg@nvidia.com> wrote: > > On Fri, Sep 27, 2024 at 02:22:20PM +0000, Mostafa Saleh wrote: > > > > We don't support multi SID through this interface, at least in this > > > version. > > > > > > To do multi-sid you have to inform the VM of all the different pSIDs > > > the device has and then setup the vSID/pSID translation to map them > > > all to the HW invalidation logic. > > > > Why would the VM need to know the pSID? > > It doesn't need to know the pSID exactly, but it needs to know all the > pSIDs that exist and have them be labeled with vSIDs. > > With cmdq direct assignment the VM has to issue an ATS invalidation > for each and every physical device using its vSID. There is no HW path > to handle a 1:N v/p SID relationship. > I see, that's for the cmdq assignment. > > Ah, I thought IOMMUFD would be used instead of VFIO_TYPE1*, which should cover > > platform devices (VFIO-platform) or am I missing something? > > It does work with platform, but AFAIK nobody is interested in that so > it hasn't been any focus. Enabling multi-sid nesting, sub stream ids, > etc is some additional work I think. > > But what I mean is the really hard use case for the vSID/pSID mapping > is ATS invalidation and you won't use ATS invalidation on platform so > multi-sid likely doesn't matter. > > STE/CD invalidation could possibly be pushed down through the > per-domain ioctl and replicated to all domain attachments. We don't > have code in the series to do that, but it could work from a uAPI > perspective. > > > If possible, can the UAPI be designed with this in mind, even if not > > implemented now? > > It is reasonable to ask. I think things are extensible enough. I'd > imagine we can add a flag 'secondary ID' and then a new field > 'secondary ID index' to the vdev operations when someone wants to take > this on. > Makes sense, I can take this when I start doing the pKVM work with IOMMUFD, in case it wasn't supported by then. Thanks, Mostafa > Jason
diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c index 5fd3dd420290..3ad759971b32 100644 --- a/drivers/iommu/iommufd/device.c +++ b/drivers/iommu/iommufd/device.c @@ -136,6 +136,18 @@ void iommufd_device_destroy(struct iommufd_object *obj) struct iommufd_device *idev = container_of(obj, struct iommufd_device, obj); + /* Unlocked since there should be no race in a destroy() */ + if (idev->vdev_id) { + struct iommufd_vdev_id *vdev_id = idev->vdev_id; + struct iommufd_viommu *viommu = vdev_id->viommu; + struct iommufd_vdev_id *old; + + old = xa_cmpxchg(&viommu->vdev_ids, vdev_id->id, vdev_id, NULL, + GFP_KERNEL); + WARN_ON(old != vdev_id); + kfree(vdev_id); + idev->vdev_id = NULL; + } iommu_device_release_dma_owner(idev->dev); iommufd_put_group(idev->igroup); if (!iommufd_selftest_is_mock_dev(idev->dev)) diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h index 1f2a1c133b9a..2c6e168c5300 100644 --- a/drivers/iommu/iommufd/iommufd_private.h +++ b/drivers/iommu/iommufd/iommufd_private.h @@ -416,6 +416,7 @@ struct iommufd_device { struct iommufd_object obj; struct iommufd_ctx *ictx; struct iommufd_group *igroup; + struct iommufd_vdev_id *vdev_id; struct list_head group_item; /* always the physical device */ struct device *dev; @@ -533,11 +534,31 @@ struct iommufd_viommu { struct iommufd_ctx *ictx; struct iommufd_hwpt_paging *hwpt; + /* The locking order is vdev_ids_rwsem -> igroup::lock */ + struct rw_semaphore vdev_ids_rwsem; + struct xarray vdev_ids; + unsigned int type; }; +struct iommufd_vdev_id { + struct iommufd_viommu *viommu; + struct iommufd_device *idev; + u64 id; +}; + +static inline struct iommufd_viommu * +iommufd_get_viommu(struct iommufd_ucmd *ucmd, u32 id) +{ + return container_of(iommufd_get_object(ucmd->ictx, id, + IOMMUFD_OBJ_VIOMMU), + struct iommufd_viommu, obj); +} + int iommufd_viommu_alloc_ioctl(struct iommufd_ucmd *ucmd); void iommufd_viommu_destroy(struct iommufd_object *obj); +int iommufd_viommu_set_vdev_id(struct iommufd_ucmd *ucmd); +int iommufd_viommu_unset_vdev_id(struct iommufd_ucmd *ucmd); #ifdef CONFIG_IOMMUFD_TEST int iommufd_test(struct iommufd_ucmd *ucmd); diff --git a/drivers/iommu/iommufd/main.c b/drivers/iommu/iommufd/main.c index 288ee51b6829..199ad90fa36b 100644 --- a/drivers/iommu/iommufd/main.c +++ b/drivers/iommu/iommufd/main.c @@ -334,6 +334,8 @@ union ucmd_buffer { struct iommu_option option; struct iommu_vfio_ioas vfio_ioas; struct iommu_viommu_alloc viommu; + struct iommu_viommu_set_vdev_id set_vdev_id; + struct iommu_viommu_unset_vdev_id unset_vdev_id; #ifdef CONFIG_IOMMUFD_TEST struct iommu_test_cmd test; #endif @@ -387,6 +389,10 @@ static const struct iommufd_ioctl_op iommufd_ioctl_ops[] = { __reserved), IOCTL_OP(IOMMU_VIOMMU_ALLOC, iommufd_viommu_alloc_ioctl, struct iommu_viommu_alloc, out_viommu_id), + IOCTL_OP(IOMMU_VIOMMU_SET_VDEV_ID, iommufd_viommu_set_vdev_id, + struct iommu_viommu_set_vdev_id, vdev_id), + IOCTL_OP(IOMMU_VIOMMU_UNSET_VDEV_ID, iommufd_viommu_unset_vdev_id, + struct iommu_viommu_unset_vdev_id, vdev_id), #ifdef CONFIG_IOMMUFD_TEST IOCTL_OP(IOMMU_TEST_CMD, iommufd_test, struct iommu_test_cmd, last), #endif diff --git a/drivers/iommu/iommufd/viommu.c b/drivers/iommu/iommufd/viommu.c index 200653a4bf57..8ffcd72b16b8 100644 --- a/drivers/iommu/iommufd/viommu.c +++ b/drivers/iommu/iommufd/viommu.c @@ -8,6 +8,15 @@ void iommufd_viommu_destroy(struct iommufd_object *obj) { struct iommufd_viommu *viommu = container_of(obj, struct iommufd_viommu, obj); + struct iommufd_vdev_id *vdev_id; + unsigned long index; + + xa_for_each(&viommu->vdev_ids, index, vdev_id) { + /* Unlocked since there should be no race in a destroy() */ + vdev_id->idev->vdev_id = NULL; + kfree(vdev_id); + } + xa_destroy(&viommu->vdev_ids); refcount_dec(&viommu->hwpt->common.obj.users); } @@ -53,6 +62,9 @@ int iommufd_viommu_alloc_ioctl(struct iommufd_ucmd *ucmd) viommu->ictx = ucmd->ictx; viommu->hwpt = hwpt_paging; + xa_init(&viommu->vdev_ids); + init_rwsem(&viommu->vdev_ids_rwsem); + refcount_inc(&viommu->hwpt->common.obj.users); cmd->out_viommu_id = viommu->obj.id; @@ -70,3 +82,112 @@ int iommufd_viommu_alloc_ioctl(struct iommufd_ucmd *ucmd) iommufd_put_object(ucmd->ictx, &idev->obj); return rc; } + +int iommufd_viommu_set_vdev_id(struct iommufd_ucmd *ucmd) +{ + struct iommu_viommu_set_vdev_id *cmd = ucmd->cmd; + struct iommufd_vdev_id *vdev_id, *curr; + struct iommufd_viommu *viommu; + struct iommufd_device *idev; + int rc = 0; + + if (cmd->vdev_id > ULONG_MAX) + return -EINVAL; + + viommu = iommufd_get_viommu(ucmd, cmd->viommu_id); + if (IS_ERR(viommu)) + return PTR_ERR(viommu); + + idev = iommufd_get_device(ucmd, cmd->dev_id); + if (IS_ERR(idev)) { + rc = PTR_ERR(idev); + goto out_put_viommu; + } + + down_write(&viommu->vdev_ids_rwsem); + mutex_lock(&idev->igroup->lock); + if (idev->vdev_id) { + rc = -EEXIST; + goto out_unlock_igroup; + } + + vdev_id = kzalloc(sizeof(*vdev_id), GFP_KERNEL); + if (!vdev_id) { + rc = -ENOMEM; + goto out_unlock_igroup; + } + + vdev_id->idev = idev; + vdev_id->viommu = viommu; + vdev_id->id = cmd->vdev_id; + + curr = xa_cmpxchg(&viommu->vdev_ids, cmd->vdev_id, NULL, vdev_id, + GFP_KERNEL); + if (curr) { + rc = xa_err(curr) ? : -EBUSY; + goto out_free; + } + + idev->vdev_id = vdev_id; + goto out_unlock_igroup; + +out_free: + kfree(vdev_id); +out_unlock_igroup: + mutex_unlock(&idev->igroup->lock); + up_write(&viommu->vdev_ids_rwsem); + iommufd_put_object(ucmd->ictx, &idev->obj); +out_put_viommu: + iommufd_put_object(ucmd->ictx, &viommu->obj); + return rc; +} + +int iommufd_viommu_unset_vdev_id(struct iommufd_ucmd *ucmd) +{ + struct iommu_viommu_unset_vdev_id *cmd = ucmd->cmd; + struct iommufd_viommu *viommu; + struct iommufd_vdev_id *old; + struct iommufd_device *idev; + int rc = 0; + + if (cmd->vdev_id > ULONG_MAX) + return -EINVAL; + + viommu = iommufd_get_viommu(ucmd, cmd->viommu_id); + if (IS_ERR(viommu)) + return PTR_ERR(viommu); + + idev = iommufd_get_device(ucmd, cmd->dev_id); + if (IS_ERR(idev)) { + rc = PTR_ERR(idev); + goto out_put_viommu; + } + + down_write(&viommu->vdev_ids_rwsem); + mutex_lock(&idev->igroup->lock); + if (!idev->vdev_id) { + rc = -ENOENT; + goto out_unlock_igroup; + } + if (idev->vdev_id->id != cmd->vdev_id) { + rc = -EINVAL; + goto out_unlock_igroup; + } + + old = xa_cmpxchg(&viommu->vdev_ids, idev->vdev_id->id, + idev->vdev_id, NULL, GFP_KERNEL); + if (xa_is_err(old)) { + rc = xa_err(old); + goto out_unlock_igroup; + } + kfree(old); + idev->vdev_id = NULL; + +out_unlock_igroup: + mutex_unlock(&idev->igroup->lock); + up_write(&viommu->vdev_ids_rwsem); + iommufd_put_object(ucmd->ictx, &idev->obj); +out_put_viommu: + iommufd_put_object(ucmd->ictx, &viommu->obj); + return rc; +} diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h index 51ce6a019c34..1816e89c922d 100644 --- a/include/uapi/linux/iommufd.h +++ b/include/uapi/linux/iommufd.h @@ -52,6 +52,8 @@ enum { IOMMUFD_CMD_HWPT_INVALIDATE = 0x8d, IOMMUFD_CMD_FAULT_QUEUE_ALLOC = 0x8e, IOMMUFD_CMD_VIOMMU_ALLOC = 0x8f, + IOMMUFD_CMD_VIOMMU_SET_VDEV_ID = 0x90, + IOMMUFD_CMD_VIOMMU_UNSET_VDEV_ID = 0x91, }; /** @@ -882,4 +884,42 @@ struct iommu_viommu_alloc { __u32 out_viommu_id; }; #define IOMMU_VIOMMU_ALLOC _IO(IOMMUFD_TYPE, IOMMUFD_CMD_VIOMMU_ALLOC) + +/** + * struct iommu_viommu_set_vdev_id - ioctl(IOMMU_VIOMMU_SET_VDEV_ID) + * @size: sizeof(struct iommu_viommu_set_vdev_id) + * @viommu_id: viommu ID to associate with the device to store its virtual ID + * @dev_id: device ID to set its virtual ID + * @__reserved: Must be 0 + * @vdev_id: Virtual device ID + * + * Set a viommu-specific virtual ID of a device + */ +struct iommu_viommu_set_vdev_id { + __u32 size; + __u32 viommu_id; + __u32 dev_id; + __u32 __reserved; + __aligned_u64 vdev_id; +}; +#define IOMMU_VIOMMU_SET_VDEV_ID _IO(IOMMUFD_TYPE, IOMMUFD_CMD_VIOMMU_SET_VDEV_ID) + +/** + * struct iommu_viommu_unset_vdev_id - ioctl(IOMMU_VIOMMU_UNSET_VDEV_ID) + * @size: sizeof(struct iommu_viommu_unset_vdev_id) + * @viommu_id: viommu ID associated with the device to delete its virtual ID + * @dev_id: device ID to unset its virtual ID + * @__reserved: Must be 0 + * @vdev_id: Virtual device ID (for verification) + * + * Unset a viommu-specific virtual ID of a device + */ +struct iommu_viommu_unset_vdev_id { + __u32 size; + __u32 viommu_id; + __u32 dev_id; + __u32 __reserved; + __aligned_u64 vdev_id; +}; +#define IOMMU_VIOMMU_UNSET_VDEV_ID _IO(IOMMUFD_TYPE, IOMMUFD_CMD_VIOMMU_UNSET_VDEV_ID) #endif
Introduce a pair of new ioctls to set/unset a per-viommu virtual device id that should be linked to a physical device id via an idev pointer. Continue the support IOMMU_VIOMMU_TYPE_DEFAULT for a core-managed viommu. Provide a lookup function for drivers to load device pointer by a virtual device id. Add a rw_semaphore protection around the vdev_id list. Any future ioctl handlers that potentially access the list must grab the lock too. Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> --- drivers/iommu/iommufd/device.c | 12 +++ drivers/iommu/iommufd/iommufd_private.h | 21 ++++ drivers/iommu/iommufd/main.c | 6 ++ drivers/iommu/iommufd/viommu.c | 121 ++++++++++++++++++++++++ include/uapi/linux/iommufd.h | 40 ++++++++ 5 files changed, 200 insertions(+)