diff mbox series

[v1,15/16] iommu/arm-smmu-v3: Add viommu cache invalidation support

Message ID 729dfd0808f85d88fd3ef8bcea0168cc1d2c0d59.1723061378.git.nicolinc@nvidia.com
State New
Series iommufd: Add VIOMMU infrastructure (Part-1)

Commit Message

Nicolin Chen Aug. 7, 2024, 8:10 p.m. UTC
Add an arm_smmu_viommu_cache_invalidate() function for user space to issue
cache invalidation commands via viommu.

The viommu invalidation takes the same native 128-bit command format as
the hwpt invalidation. Thus, reuse the same driver data structure, but
widen it to accept CMDQ_OP_ATC_INV and CMDQ_OP_CFGI_CD{_ALL}.

Scan the commands against the supported list and fix up the VMIDs and SIDs.

Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 54 +++++++++++++++++++--
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h |  1 +
 include/uapi/linux/iommufd.h                | 20 ++++++++
 3 files changed, 70 insertions(+), 5 deletions(-)
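The VMID fixup the commit message describes can be modeled in plain userspace C. This is a sketch only: the opcode and VMID field positions below are simplified stand-ins for the driver's CMDQ_0_OP/CMDQ_TLBI_0_VMID definitions, and OP_TLBI_NH_VA is an illustrative opcode value, not the real SMMUv3 encoding.

```c
#include <stdint.h>

#define CMDQ_0_OP        ((uint64_t)0xff)          /* cmd[0] bits [7:0]: opcode */
#define CMDQ_TLBI_0_VMID ((uint64_t)0xffff << 32)  /* cmd[0] bits [47:32]: VMID */

enum { OP_TLBI_NH_VA = 0x12, OP_UNSUPPORTED = 0x7f };

struct inv_cmd { uint64_t cmd[2]; };

/* Enforce the host-owned VMID on a guest-provided TLBI command, or
 * reject unsupported opcodes the way the patch returns -EIO. */
static int fixup_cmd(struct inv_cmd *c, uint16_t host_vmid)
{
	switch (c->cmd[0] & CMDQ_0_OP) {
	case OP_TLBI_NH_VA:
		c->cmd[0] &= ~CMDQ_TLBI_0_VMID;       /* drop the guest's VMID */
		c->cmd[0] |= (uint64_t)host_vmid << 32; /* force the host's VMID */
		return 0;
	default:
		return -1;
	}
}
```

The point of the scan is that the guest never gets to pick the VMID: whatever it wrote into those bits is overwritten before the command reaches the real command queue.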

Comments

Jason Gunthorpe Aug. 15, 2024, 11:36 p.m. UTC | #1
On Wed, Aug 07, 2024 at 01:10:56PM -0700, Nicolin Chen wrote:
> Add an arm_smmu_viommu_cache_invalidate() function for user space to issue
> cache invalidation commands via viommu.
> 
> The viommu invalidation takes the same native 128-bit command format as
> the hwpt invalidation. Thus, reuse the same driver data structure, but
> widen it to accept CMDQ_OP_ATC_INV and CMDQ_OP_CFGI_CD{_ALL}.
> 
> Scan the commands against the supported list and fix up the VMIDs and SIDs.
> 
> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
> ---
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 54 +++++++++++++++++++--
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h |  1 +
>  include/uapi/linux/iommufd.h                | 20 ++++++++
>  3 files changed, 70 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> index ec76377d505c..be4f849f1a48 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> @@ -3219,15 +3219,32 @@ static void arm_smmu_domain_nested_free(struct iommu_domain *domain)
>  	kfree(container_of(domain, struct arm_smmu_nested_domain, domain));
>  }
>  
> +static int arm_smmu_convert_viommu_vdev_id(struct iommufd_viommu *viommu,
> +					   u32 vdev_id, u32 *sid)
> +{
> +	struct arm_smmu_master *master;
> +	struct device *dev;
> +
> +	dev = iommufd_viommu_find_device(viommu, vdev_id);
> +	if (!dev)
> +		return -EIO;
> +	master = dev_iommu_priv_get(dev);
> +
> +	if (sid)
> +		*sid = master->streams[0].id;

See, this is the thing that needs to be locked, right?

xa_lock()
dev = xa_load()
master = dev_iommu_priv_get(dev);
*sid = master->streams[0].id;
xa_unlock()

Then no risk of dev going away under us.
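That pattern can be sketched in userspace C, with a mutex and a fixed-size table standing in for the xarray (all names here are illustrative, not the kernel API). The key property is that the lookup and the dereference happen inside one critical section:

```c
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

struct vdev { uint32_t vdev_id; uint32_t sid; int present; };

static pthread_mutex_t vdev_lock = PTHREAD_MUTEX_INITIALIZER;
static struct vdev vdev_table[4];  /* stand-in for the viommu's vdev_id xarray */

static int vdev_register(uint32_t vdev_id, uint32_t sid)
{
	int ret = -1;

	pthread_mutex_lock(&vdev_lock);
	for (size_t i = 0; i < 4; i++) {
		if (!vdev_table[i].present) {
			vdev_table[i] = (struct vdev){ vdev_id, sid, 1 };
			ret = 0;
			break;
		}
	}
	pthread_mutex_unlock(&vdev_lock);
	return ret;
}

/* Look up the entry and read its SID under one lock, mirroring
 * xa_lock(); dev = xa_load(); *sid = master->streams[0].id; xa_unlock();
 * so the entry cannot be torn down between the two steps. */
static int vdev_id_to_sid(uint32_t vdev_id, uint32_t *sid)
{
	int ret = -1;

	pthread_mutex_lock(&vdev_lock);
	for (size_t i = 0; i < 4; i++) {
		if (vdev_table[i].present && vdev_table[i].vdev_id == vdev_id) {
			*sid = vdev_table[i].sid;
			ret = 0;
			break;
		}
	}
	pthread_mutex_unlock(&vdev_lock);
	return ret;
}
```

Dropping the lock between the lookup and the dereference would reintroduce exactly the use-after-free window described above.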

> @@ -3249,6 +3266,19 @@ arm_smmu_convert_user_cmd(struct arm_smmu_domain *s2_parent,
>  		cmd->cmd[0] &= ~CMDQ_TLBI_0_VMID;
>  		cmd->cmd[0] |= FIELD_PREP(CMDQ_TLBI_0_VMID, vmid);
>  		break;
> +	case CMDQ_OP_ATC_INV:
> +	case CMDQ_OP_CFGI_CD:
> +	case CMDQ_OP_CFGI_CD_ALL:

Oh, I didn't catch on that CD was needing this too.. :\

That makes the other op much more useless than I expected. I really
wanted to break these two series apart.

Maybe we need to drop the hwpt invalidation from the other series and
aim to merge this all together through the iommufd tree.

Jason
Jason Gunthorpe Aug. 19, 2024, 5:36 p.m. UTC | #2
On Thu, Aug 15, 2024 at 05:50:06PM -0700, Nicolin Chen wrote:

> Though only the driver would know whether it would eventually access
> the vdev_id list, I'd like to keep things in the way of having a
> core-managed VIOMMU object (IOMMU_VIOMMU_TYPE_DEFAULT), so the
> viommu invalidation handler could have a lock at its top level to
> protect any potential access to the vdev_id list.

It is a bit tortured to keep the xarray hidden. It would be better to
find a way to expose the right struct to the driver.

> > > @@ -3249,6 +3266,19 @@ arm_smmu_convert_user_cmd(struct arm_smmu_domain *s2_parent,
> > >  		cmd->cmd[0] &= ~CMDQ_TLBI_0_VMID;
> > >  		cmd->cmd[0] |= FIELD_PREP(CMDQ_TLBI_0_VMID, vmid);
> > >  		break;
> > > +	case CMDQ_OP_ATC_INV:
> > > +	case CMDQ_OP_CFGI_CD:
> > > +	case CMDQ_OP_CFGI_CD_ALL:
> > 
> > Oh, I didn't catch on that CD was needing this too.. :\
> 
> Well, viommu cache has a very wide range :)
> 
> > That makes the other op much more useless than I expected. I really
> > wanted to break these two series apart.
> 
> HWPT invalidate and VIOMMU invalidate are somewhat duplicated in
> both concept and implementation for SMMUv3. It's not a problem to
> have both, but practically I can't think of a reason why the VMM wouldn't
> simply stick to the wider VIOMMU invalidate uAPI alone.
> 
> > Maybe we need to drop the hwpt invalidation from the other series and
> 
> Yea, the hwpt invalidate is just one patch in your series, it's
> easy to move if we want to.

> > aim to merge this all together through the iommufd tree.
> 
> I have been hoping for that, as you can see those driver patches
> are included here :)

Well, this series has to go through iommufd of course

I was hoping Will could take the nesting enablement and we'd do the
viommu in the next window.

But nesting enablement without viommu is a lot less useful than I had
thought :(

So maybe Will acks the nesting patches and we take the bunch together.

Jason
Nicolin Chen Aug. 19, 2024, 6:19 p.m. UTC | #3
On Mon, Aug 19, 2024 at 02:36:15PM -0300, Jason Gunthorpe wrote:
> On Thu, Aug 15, 2024 at 05:50:06PM -0700, Nicolin Chen wrote:
> 
> > Though only the driver would know whether it would eventually access
> > the vdev_id list, I'd like to keep things in the way of having a
> > core-managed VIOMMU object (IOMMU_VIOMMU_TYPE_DEFAULT), so the
> > viommu invalidation handler could have a lock at its top level to
> > protect any potential access to the vdev_id list.
> 
> It is a bit tortured to keep the xarray hidden. It would be better to
> find a way to expose the right struct to the driver.

Yes. I will try adding set/unset_vdev_id to the default viommu
ops.

> > > > @@ -3249,6 +3266,19 @@ arm_smmu_convert_user_cmd(struct arm_smmu_domain *s2_parent,
> > > >  		cmd->cmd[0] &= ~CMDQ_TLBI_0_VMID;
> > > >  		cmd->cmd[0] |= FIELD_PREP(CMDQ_TLBI_0_VMID, vmid);
> > > >  		break;
> > > > +	case CMDQ_OP_ATC_INV:
> > > > +	case CMDQ_OP_CFGI_CD:
> > > > +	case CMDQ_OP_CFGI_CD_ALL:
> > > 
> > > Oh, I didn't catch on that CD was needing this too.. :\
> > 
> > Well, viommu cache has a very wide range :)
> > 
> > > That makes the other op much more useless than I expected. I really
> > > wanted to break these two series apart.
> > 
> > HWPT invalidate and VIOMMU invalidate are somewhat duplicated in
> > both concept and implementation for SMMUv3. It's not a problem to
> > have both, but practically I can't think of a reason why the VMM wouldn't
> > simply stick to the wider VIOMMU invalidate uAPI alone.
> > 
> > > Maybe we need to drop the hwpt invalidation from the other series and
> > 
> > Yea, the hwpt invalidate is just one patch in your series, it's
> > easy to move if we want to.
> 
> > > aim to merge this all together through the iommufd tree.
> > 
> > I have been hoping for that, as you can see those driver patches
> > are included here :)
> 
> Well, this series has to go through iommufd of course
> 
> I was hoping Will could take the nesting enablement and we'd do the
> viommu in the next window.
> 
> But nesting enablement without viommu is a lot less useful than I had
> thought :(

Actually, without viommu, the hwpt cache invalidate alone could
still support non-SVA case?

Though we still have the blocker at the msi mapping... It still
requires a solution, even for viommu series.

Thanks
Nicolin
Jason Gunthorpe Aug. 19, 2024, 6:28 p.m. UTC | #4
On Mon, Aug 19, 2024 at 11:19:39AM -0700, Nicolin Chen wrote:

> > But nesting enablement without viommu is a lot less useful than I had
> > thought :(
> 
> Actually, without viommu, the hwpt cache invalidate alone could
> still support non-SVA case?

That is what I thought, but doesn't the guest still have to invalidate
the CD table entry # 0?

> Though we still have the blocker at the msi mapping... It still
> requires a solution, even for viommu series.

Yes, small steps. The point of this step was to get the nested paging
only (without msi, pri, etc, etc)

Jason
Nicolin Chen Aug. 19, 2024, 6:38 p.m. UTC | #5
On Mon, Aug 19, 2024 at 03:28:11PM -0300, Jason Gunthorpe wrote:
> On Mon, Aug 19, 2024 at 11:19:39AM -0700, Nicolin Chen wrote:
> 
> > > But nesting enablement without viommu is a lot less useful than I had
> > > thought :(
> > 
> > Actually, without viommu, the hwpt cache invalidate alone could
> > still support non-SVA case?
> 
> That is what I thought, but doesn't the guest still have to invalidate
> the CD table entry # 0?

I recall it doesn't. The CD cache invalidation is required in the
viommu invalidation for an SVA case where we need a PASID number
to specify CD to the substream. But the CD to the default stream
is only changed during a vSTE setup, and the host knows the PASID
number (=0)?

> > Though we still have the blocker at the msi mapping... It still
> > requires a solution, even for viommu series.
> 
> Yes, small steps. The point of this step was to get the nested paging
> only (without msi, pri, etc, etc)

Ack.

Thanks
Nicolin
Jason Gunthorpe Aug. 19, 2024, 6:47 p.m. UTC | #6
On Mon, Aug 19, 2024 at 11:38:22AM -0700, Nicolin Chen wrote:
> On Mon, Aug 19, 2024 at 03:28:11PM -0300, Jason Gunthorpe wrote:
> > On Mon, Aug 19, 2024 at 11:19:39AM -0700, Nicolin Chen wrote:
> > 
> > > > But nesting enablement without viommu is a lot less useful than I had
> > > > thought :(
> > > 
> > > Actually, without viommu, the hwpt cache invalidate alone could
> > > still support non-SVA case?
> > 
> > That is what I thought, but doesn't the guest still have to invalidate
> > the CD table entry # 0?
> 
> I recall it doesn't. The CD cache invalidation is required in the
> viommu invalidation for an SVA case where we need a PASID number
> to specify CD to the substream. But the CD to the default stream
> is only changed during a vSTE setup, and the host knows the PASID
> number (=0)?

I think that would subtly assume certain things about how the driver
does ordering, i.e. that CD table entry 0 is set up with the S1
before we load it into the STE.

Yes, the Linux driver does that now, but I don't think anyone should
rely on that..

Jason
Nicolin Chen Aug. 19, 2024, 6:54 p.m. UTC | #7
On Mon, Aug 19, 2024 at 03:47:16PM -0300, Jason Gunthorpe wrote:
> On Mon, Aug 19, 2024 at 11:38:22AM -0700, Nicolin Chen wrote:
> > On Mon, Aug 19, 2024 at 03:28:11PM -0300, Jason Gunthorpe wrote:
> > > On Mon, Aug 19, 2024 at 11:19:39AM -0700, Nicolin Chen wrote:
> > > 
> > > > > But nesting enablement without viommu is a lot less useful than I had
> > > > > thought :(
> > > > 
> > > > Actually, without viommu, the hwpt cache invalidate alone could
> > > > still support non-SVA case?
> > > 
> > > That is what I thought, but doesn't the guest still have to invalidate
> > > the CD table entry # 0?
> > 
> > I recall it doesn't. The CD cache invalidation is required in the
> > viommu invalidation for an SVA case where we need a PASID number
> > to specify CD to the substream. But the CD to the default stream
> > is only changed during a vSTE setup, and the host knows the PASID
> > number (=0)?
> 
> I think that would subtly assume certain things about how the driver
> does ordering, i.e. that CD table entry 0 is set up with the S1
> before we load it into the STE.
> 
> Yes, the Linux driver does that now, but I don't think anyone should
> rely on that..

Oh that's true...

Thanks
Nicolin
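A hypothetical round trip for one CFGI_CD command summarizes the thread: the VMM encodes its virtual SID, and the host swaps in the physical SID before queueing, as the CMDQ_OP_CFGI_CD case in this patch does. The masks and the 0x5 opcode below restate the driver's CMDQ_0_OP/CMDQ_CFGI_0_SID/CMDQ_OP_CFGI_CD definitions from memory and should be treated as illustrative; the vdev_id-to-SID resolution itself is elided.

```c
#include <stdint.h>

#define CMDQ_0_OP       ((uint64_t)0xff)       /* cmd[0] bits [7:0]: opcode */
#define CMDQ_CFGI_0_SID (~(uint64_t)0 << 32)   /* cmd[0] bits [63:32]: StreamID */
#define CMDQ_OP_CFGI_CD 0x5

struct inv_cmd { uint64_t cmd[2]; };

/* VMM side: build a CFGI_CD carrying the guest's virtual SID. */
static struct inv_cmd build_cfgi_cd(uint32_t vsid)
{
	struct inv_cmd c = {
		.cmd = { CMDQ_OP_CFGI_CD | ((uint64_t)vsid << 32), 0 },
	};
	return c;
}

/* Host side: replace the virtual SID with the physical one that the
 * vdev_id lookup resolved, leaving the opcode untouched. */
static int host_fix_sid(struct inv_cmd *c, uint32_t sid)
{
	if ((c->cmd[0] & CMDQ_0_OP) != CMDQ_OP_CFGI_CD)
		return -1;
	c->cmd[0] &= ~CMDQ_CFGI_0_SID;
	c->cmd[0] |= (uint64_t)sid << 32;
	return 0;
}
```

The guest's SID namespace therefore never leaks into the hardware queue: only SIDs the host resolved through the vdev_id table are ever emitted.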

Patch

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index ec76377d505c..be4f849f1a48 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -3219,15 +3219,32 @@  static void arm_smmu_domain_nested_free(struct iommu_domain *domain)
 	kfree(container_of(domain, struct arm_smmu_nested_domain, domain));
 }
 
+static int arm_smmu_convert_viommu_vdev_id(struct iommufd_viommu *viommu,
+					   u32 vdev_id, u32 *sid)
+{
+	struct arm_smmu_master *master;
+	struct device *dev;
+
+	dev = iommufd_viommu_find_device(viommu, vdev_id);
+	if (!dev)
+		return -EIO;
+	master = dev_iommu_priv_get(dev);
+
+	if (sid)
+		*sid = master->streams[0].id;
+	return 0;
+}
+
 /*
  * Convert, in place, the raw invalidation command into an internal format that
  * can be passed to arm_smmu_cmdq_issue_cmdlist(). Internally commands are
  * stored in CPU endian.
  *
- * Enforce the VMID on the command.
+ * Enforce the VMID or the SID on the command.
  */
 static int
 arm_smmu_convert_user_cmd(struct arm_smmu_domain *s2_parent,
+			  struct iommufd_viommu *viommu,
 			  struct iommu_hwpt_arm_smmuv3_invalidate *cmd)
 {
 	u16 vmid = s2_parent->s2_cfg.vmid;
@@ -3249,6 +3266,19 @@  arm_smmu_convert_user_cmd(struct arm_smmu_domain *s2_parent,
 		cmd->cmd[0] &= ~CMDQ_TLBI_0_VMID;
 		cmd->cmd[0] |= FIELD_PREP(CMDQ_TLBI_0_VMID, vmid);
 		break;
+	case CMDQ_OP_ATC_INV:
+	case CMDQ_OP_CFGI_CD:
+	case CMDQ_OP_CFGI_CD_ALL:
+		if (viommu) {
+			u32 sid, vsid = FIELD_GET(CMDQ_CFGI_0_SID, cmd->cmd[0]);
+
+			if (arm_smmu_convert_viommu_vdev_id(viommu, vsid, &sid))
+				return -EIO;
+			cmd->cmd[0] &= ~CMDQ_CFGI_0_SID;
+			cmd->cmd[0] |= FIELD_PREP(CMDQ_CFGI_0_SID, sid);
+			break;
+		}
+		fallthrough;
 	default:
 		return -EIO;
 	}
@@ -3256,8 +3286,11 @@  arm_smmu_convert_user_cmd(struct arm_smmu_domain *s2_parent,
 }
 
 static int __arm_smmu_cache_invalidate_user(struct arm_smmu_domain *s2_parent,
+					    struct iommufd_viommu *viommu,
 					    struct iommu_user_data_array *array)
 {
+	unsigned int type = viommu ? IOMMU_VIOMMU_INVALIDATE_DATA_ARM_SMMUV3 :
+				     IOMMU_HWPT_INVALIDATE_DATA_ARM_SMMUV3;
 	struct arm_smmu_device *smmu = s2_parent->smmu;
 	struct iommu_hwpt_arm_smmuv3_invalidate *last_batch;
 	struct iommu_hwpt_arm_smmuv3_invalidate *cmds;
@@ -3273,14 +3306,13 @@  static int __arm_smmu_cache_invalidate_user(struct arm_smmu_domain *s2_parent,
 
 	static_assert(sizeof(*cmds) == 2 * sizeof(u64));
 	ret = iommu_copy_struct_from_full_user_array(
-		cmds, sizeof(*cmds), array,
-		IOMMU_HWPT_INVALIDATE_DATA_ARM_SMMUV3);
+		cmds, sizeof(*cmds), array, type);
 	if (ret)
 		goto out;
 
 	last_batch = cmds;
 	while (cur != end) {
-		ret = arm_smmu_convert_user_cmd(s2_parent, cur);
+		ret = arm_smmu_convert_user_cmd(s2_parent, viommu, cur);
 		if (ret)
 			goto out;
 
@@ -3310,7 +3342,7 @@  static int arm_smmu_cache_invalidate_user(struct iommu_domain *domain,
 		container_of(domain, struct arm_smmu_nested_domain, domain);
 
 	return __arm_smmu_cache_invalidate_user(
-			nested_domain->s2_parent, array);
+			nested_domain->s2_parent, NULL, array);
 }
 
 static struct iommu_domain *
@@ -3812,6 +3844,15 @@  static int arm_smmu_def_domain_type(struct device *dev)
 	return 0;
 }
 
+static int arm_smmu_viommu_cache_invalidate(struct iommufd_viommu *viommu,
+					    struct iommu_user_data_array *array)
+{
+	struct iommu_domain *domain = iommufd_viommu_to_parent_domain(viommu);
+
+	return __arm_smmu_cache_invalidate_user(
+			to_smmu_domain(domain), viommu, array);
+}
+
 static struct iommu_ops arm_smmu_ops = {
 	.identity_domain	= &arm_smmu_identity_domain,
 	.blocked_domain		= &arm_smmu_blocked_domain,
@@ -3842,6 +3883,9 @@  static struct iommu_ops arm_smmu_ops = {
 		.iotlb_sync		= arm_smmu_iotlb_sync,
 		.iova_to_phys		= arm_smmu_iova_to_phys,
 		.free			= arm_smmu_domain_free_paging,
+		.default_viommu_ops = &(const struct iommufd_viommu_ops) {
+			.cache_invalidate = arm_smmu_viommu_cache_invalidate,
+		}
 	}
 };
 
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index 3f7442f0167e..a3fb08e0a195 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -10,6 +10,7 @@ 
 
 #include <linux/bitfield.h>
 #include <linux/iommu.h>
+#include <linux/iommufd.h>
 #include <linux/kernel.h>
 #include <linux/mmzone.h>
 #include <linux/sizes.h>
diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h
index 998b3f2cd2b5..416b9a18e6bb 100644
--- a/include/uapi/linux/iommufd.h
+++ b/include/uapi/linux/iommufd.h
@@ -956,6 +956,26 @@  enum iommu_viommu_invalidate_data_type {
 	IOMMU_VIOMMU_INVALIDATE_DATA_ARM_SMMUV3,
 };
 
+/**
+ * struct iommu_viommu_arm_smmuv3_invalidate - ARM SMMUv3 cache invalidation
+ *         (IOMMU_VIOMMU_INVALIDATE_DATA_ARM_SMMUV3)
+ * @cmd: 128-bit cache invalidation command that runs in SMMU CMDQ.
+ *       Must be little-endian.
+ *
+ * Supported command list:
+ *     CMDQ_OP_TLBI_NSNH_ALL
+ *     CMDQ_OP_TLBI_NH_VA
+ *     CMDQ_OP_TLBI_NH_VAA
+ *     CMDQ_OP_TLBI_NH_ALL
+ *     CMDQ_OP_TLBI_NH_ASID
+ *     CMDQ_OP_ATC_INV
+ *     CMDQ_OP_CFGI_CD
+ *     CMDQ_OP_CFGI_CD_ALL
+ *
+ * -EIO will be returned if the command is not supported.
+ */
+#define iommu_viommu_arm_smmuv3_invalidate iommu_hwpt_arm_smmuv3_invalidate
+
 /**
  * struct iommu_viommu_invalidate - ioctl(IOMMU_VIOMMU_INVALIDATE)
  * @size: sizeof(struct iommu_viommu_invalidate)