mbox series

[v8,0/5] iommufd: Add iommu hardware info reporting

Message ID 20230816121349.104436-1-yi.l.liu@intel.com
Headers show
Series iommufd: Add iommu hardware info reporting | expand

Message

Yi Liu Aug. 16, 2023, 12:13 p.m. UTC
iommufd gives userspace the capability to manipulate iommu subsytem.
e.g. DMA map/unmap etc. In the near future, it will support iommu nested
translation. Different platform vendors have different implementations for
the nested translation. For example, Intel VT-d supports using guest I/O
page table as the stage-1 translation table. This requires guest I/O page
table be compatible with hardware IOMMU. So before set up nested translation,
userspace needs to know the hardware iommu information to understand the
nested translation requirements.

This series reports the iommu hardware information for a given device
which has been bound to iommufd. It is preparation work for userspace to
allocate hwpt for given device. Like the nested translation support[1].

This series introduces an iommu op to report the iommu hardware info,
and an ioctl IOMMU_GET_HW_INFO is added to report such hardware info to
user. enum iommu_hw_info_type is defined to differentiate the iommu hardware
info reported to user hence user can decode them. This series adds the
framework for iommu hw info reporting, and adds the vtd implementation. The
complete code is available in [1].

[1] https://github.com/yiliu1765/iommufd/tree/iommufd_hw_info-v8

Change log:

v8:
 - Updated the uAPI by allowing a 0 value at the input @data_len
 - Changed to always report the kernel supported data length instead of the
   length that kernel filled in the user space buffer
 - Updated uAPI doc accordingly
 - Add one more selftest for 0 value @data_len and also check the output @data_len
   with the size kernel supports
 - Fix the usage of clear_user()
 - Rebase on top of Jason's for-next branch (base: 65aaca1 iommufd: Remove iommufd_ref_to_users())
 - Include the vtd hw_info implementation from vtd nesting series
   https://lore.kernel.org/r/20230724111335.107427-12-yi.l.liu@intel.com

v7: https://lore.kernel.org/linux-iommu/20230811071501.4126-1-yi.l.liu@intel.com/
 - Use clear_user() (Jason)
 - Add fail_nth for hw_ifo (Jason)

v6: https://lore.kernel.org/linux-iommu/20230808153510.4170-1-yi.l.liu@intel.com/
 - Add Jingqi's comment on patch 02
 - Add Baolu's r-b to patch 03
 - Address Jason's comment on patch 03

v5: https://lore.kernel.org/linux-iommu/20230803143144.200945-1-yi.l.liu@intel.com/
 - Return hw_info_type in the .hw_info op, hence drop hw_info_type field in iommu_ops (Kevin)
 - Add Jason's r-b for patch 01
 - Address coding style comments from Jason and Kevin w.r.t. patch 02, 03 and 04

v4: https://lore.kernel.org/linux-iommu/20230724105936.107042-1-yi.l.liu@intel.com/
 - Rename ioctl to IOMMU_GET_HW_INFO and structure to iommu_hw_info
 - Move the iommufd_get_hw_info handler to main.c
 - Place iommu_hw_info prior to iommu_hwpt_alloc
 - Update the function namings accordingly
 - Update uapi kdocs

v3: https://lore.kernel.org/linux-iommu/20230511143024.19542-1-yi.l.liu@intel.com/#t
 - Add r-b from Baolu
 - Rename IOMMU_HW_INFO_TYPE_DEFAULT to be IOMMU_HW_INFO_TYPE_NONE to
   better suit what it means
 - Let IOMMU_DEVICE_GET_HW_INFO succeed even the underlying iommu driver
   does not have driver-specific data to report per below remark.
   https://lore.kernel.org/kvm/ZAcwJSK%2F9UVI9LXu@nvidia.com/

v2: https://lore.kernel.org/linux-iommu/20230309075358.571567-1-yi.l.liu@intel.com/
 - Drop patch 05 of v1 as it is already covered by other series
 - Rename the capability info to be iommu hardware info

v1: https://lore.kernel.org/linux-iommu/20230209041642.9346-1-yi.l.liu@intel.com/

Regards,
	Yi Liu

Lu Baolu (1):
  iommu: Add new iommu op to get iommu hardware information

Nicolin Chen (1):
  iommufd/selftest: Add coverage for IOMMU_GET_HW_INFO ioctl

Yi Liu (3):
  iommu: Move dev_iommu_ops() to private header
  iommufd: Add IOMMU_GET_HW_INFO
  iommu/vt-d: Implement hw_info for iommu capability query

 drivers/iommu/intel/iommu.c                   | 19 +++++
 drivers/iommu/iommu-priv.h                    | 11 +++
 drivers/iommu/iommufd/device.c                | 73 ++++++++++++++++++
 drivers/iommu/iommufd/iommufd_private.h       |  1 +
 drivers/iommu/iommufd/iommufd_test.h          |  9 +++
 drivers/iommu/iommufd/main.c                  |  3 +
 drivers/iommu/iommufd/selftest.c              | 16 ++++
 include/linux/iommu.h                         | 20 +++--
 include/uapi/linux/iommufd.h                  | 74 +++++++++++++++++++
 tools/testing/selftests/iommu/iommufd.c       | 30 +++++++-
 .../selftests/iommu/iommufd_fail_nth.c        |  4 +
 tools/testing/selftests/iommu/iommufd_utils.h | 56 ++++++++++++++
 12 files changed, 304 insertions(+), 12 deletions(-)

Comments

Nicolin Chen Aug. 17, 2023, 9:07 p.m. UTC | #1
Looks like Yi's latest code has not addressed these comments.

On Thu, Aug 17, 2023 at 07:31:42AM +0000, Tian, Kevin wrote:
 
> > From: Liu, Yi L <yi.l.liu@intel.com>
> > Sent: Wednesday, August 16, 2023 8:14 PM
> >
> > Under nested IOMMU translation, userspace owns the stage-1 translation
> > table (e.g. the stage-1 page table of Intel VT-d or the context table of
> > ARM SMMUv3, and etc.). Stage-1 translation tables are vendor specific, and
> > need to be compatible with the underlying IOMMU hardware. Hence,
> > userspace
> > should know the IOMMU hardware capability before creating and
> > configuring
> > the stage-1 translation table to kernel.
> >
> > This adds IOMMU_GET_HW_INFO ioctl to query the IOMMU hardware
> > information
> > (a.k.a capability) for a given device. The returned data is vendor
> > specific, userspace needs to decode it with the structure by the output
> > @out_data_type field.
> 
> "The format of the returned data is vendor specific and must be decoded
> according to @out_data_type field".

Ack.

> > +int iommufd_get_hw_info(struct iommufd_ucmd *ucmd)
> > +{
> > +     struct iommu_hw_info *cmd = ucmd->cmd;
> > +     void __user *user_ptr = u64_to_user_ptr(cmd->data_uptr);
> > +     const struct iommu_ops *ops;
> > +     struct iommufd_device *idev;
> > +     unsigned int data_len;
> > +     unsigned int copy_len;
> > +     void *data = NULL;
[..]
> > +     } else {
> > +             cmd->out_data_type = IOMMU_HW_INFO_TYPE_NONE;
> > +             data_len = 0;
> > +             data = NULL;
> 
> data is already initialized as NULL.

Will drop.

> > +
> > +     /*
> > +      * We return the length the kernel supports so userspace may know
> > what
> > +      * the kernel capability is. It could be larger than the input buffer.
> > +      */
> > +     cmd->data_len = data_len;
> > +
> > +     rc = iommufd_ucmd_respond(ucmd, sizeof(*cmd));
> > +out:
> 
> out_free:
> 
> > +     kfree(data);
> > +err_put:
> 
> out_put: (since this is also used in the success path)

Ack for both.

> > + * To capture an iommu type specific hardware information data,
> > @data_uptr and
> > + * its length @data_len must be provided. Trailing bytes will be zeroed if the
> > + * user buffer is larger than the data that kernel has. Otherwise, kernel only
> > + * fills the buffer using the given length in @data_len. If the ioctl succeeds,
> > + * @data_len will be updated to the length that kernel actually supports,
> > + * @out_data_type will be filled to decode the data filled in the buffer
> > + * pointed by @data_uptr. Input @data_len == zero is allowed, no
> > information
> > + * data will be filled to user, but user space could get the
> > iommu_hw_info_type
> > + * filled in @out_data_type and the iommu hardware information data
> > length
> > + * supported by kernel filled in @data_len.
> 
> I'd just keep "Input @data_len == zero is allowed" and remove all the
> trailing words which just duplicate with the former context.

Will do.

> Reviewed-by: Kevin Tian <kevin.tian@intel.com>

Adding this.

Thanks
Nic
Nicolin Chen Aug. 17, 2023, 9:55 p.m. UTC | #2
On Thu, Aug 17, 2023 at 07:24:44AM +0000, Tian, Kevin wrote:
> 
> > From: Liu, Yi L <yi.l.liu@intel.com>
> > Sent: Wednesday, August 16, 2023 8:14 PM
> >
> > Different IOMMU hardware would have different hardware information. So
> > the
> > information reported differs as well. To let the external user understand
> > the difference. enum iommu_hw_info_type is defined. For the iommu
> 
> s/difference. enum/difference, enum/
> 
> > + * @hw_info: IOMMU hardware information. The type of the returned data
> > is
> > + *           marked by the output type of this op. Type is one of
> > + *           enum iommu_hw_info_type defined in
> > include/uapi/linux/iommufd.h.
> > + *           The drivers that support this op should define a unique type
> > + *           in include/uapi/linux/iommufd.h. The data buffer returned by this
> > + *           op is allocated in the IOMMU driver and the caller should free it
> > + *           after use. Return the data buffer if success, or ERR_PTR on
> > + *           failure.
> 
> simplified as:
> 
>  @hw_info: report iommu hardware information. The data buffer returned by
>            this op is allocated in the iommu driver and freed by the caller
>            after use. The information type is one of enum iommu_hw_info_type
>            defined in include/uapi/linux/iommufd.h.
> 
> Reviewed-by: Kevin Tian <kevin.tian@intel.com>

Done. Thanks!
Yi Liu Aug. 18, 2023, 12:04 a.m. UTC | #3
> From: Nicolin Chen <nicolinc@nvidia.com>
> Sent: Friday, August 18, 2023 5:08 AM
> 
> Looks like Yi's latest code has not addressed these comments.

Yeah. Not yet. In progress to incorporate them. 😊

> 
> On Thu, Aug 17, 2023 at 07:31:42AM +0000, Tian, Kevin wrote:
> 
> > > From: Liu, Yi L <yi.l.liu@intel.com>
> > > Sent: Wednesday, August 16, 2023 8:14 PM
> > >
> > > Under nested IOMMU translation, userspace owns the stage-1 translation
> > > table (e.g. the stage-1 page table of Intel VT-d or the context table of
> > > ARM SMMUv3, and etc.). Stage-1 translation tables are vendor specific, and
> > > need to be compatible with the underlying IOMMU hardware. Hence,
> > > userspace
> > > should know the IOMMU hardware capability before creating and
> > > configuring
> > > the stage-1 translation table to kernel.
> > >
> > > This adds IOMMU_GET_HW_INFO ioctl to query the IOMMU hardware
> > > information
> > > (a.k.a capability) for a given device. The returned data is vendor
> > > specific, userspace needs to decode it with the structure by the output
> > > @out_data_type field.
> >
> > "The format of the returned data is vendor specific and must be decoded
> > according to @out_data_type field".
> 
> Ack.
> 
> > > +int iommufd_get_hw_info(struct iommufd_ucmd *ucmd)
> > > +{
> > > +     struct iommu_hw_info *cmd = ucmd->cmd;
> > > +     void __user *user_ptr = u64_to_user_ptr(cmd->data_uptr);
> > > +     const struct iommu_ops *ops;
> > > +     struct iommufd_device *idev;
> > > +     unsigned int data_len;
> > > +     unsigned int copy_len;
> > > +     void *data = NULL;
> [..]
> > > +     } else {
> > > +             cmd->out_data_type = IOMMU_HW_INFO_TYPE_NONE;
> > > +             data_len = 0;
> > > +             data = NULL;
> >
> > data is already initialized as NULL.

Probably we can set data_len = 0 and the out_data_type to _NONE is
the top as well. Any preference?

> 
> Will drop.
> 
> > > +
> > > +     /*
> > > +      * We return the length the kernel supports so userspace may know
> > > what
> > > +      * the kernel capability is. It could be larger than the input buffer.
> > > +      */
> > > +     cmd->data_len = data_len;
> > > +
> > > +     rc = iommufd_ucmd_respond(ucmd, sizeof(*cmd));
> > > +out:
> >
> > out_free:
> >
> > > +     kfree(data);
> > > +err_put:
> >
> > out_put: (since this is also used in the success path)
> 
> Ack for both.
> 
> > > + * To capture an iommu type specific hardware information data,
> > > @data_uptr and
> > > + * its length @data_len must be provided. Trailing bytes will be zeroed if the
> > > + * user buffer is larger than the data that kernel has. Otherwise, kernel only
> > > + * fills the buffer using the given length in @data_len. If the ioctl succeeds,
> > > + * @data_len will be updated to the length that kernel actually supports,
> > > + * @out_data_type will be filled to decode the data filled in the buffer
> > > + * pointed by @data_uptr. Input @data_len == zero is allowed, no
> > > information
> > > + * data will be filled to user, but user space could get the
> > > iommu_hw_info_type
> > > + * filled in @out_data_type and the iommu hardware information data
> > > length
> > > + * supported by kernel filled in @data_len.
> >
> > I'd just keep "Input @data_len == zero is allowed" and remove all the
> > trailing words which just duplicate with the former context.
> 
> Will do.
> 
> > Reviewed-by: Kevin Tian <kevin.tian@intel.com>
> 
> Adding this.
> 
> Thanks
> Nic
Jason Gunthorpe Aug. 18, 2023, 12:54 a.m. UTC | #4
On Fri, Aug 18, 2023 at 12:04:29AM +0000, Liu, Yi L wrote:
> > > > +int iommufd_get_hw_info(struct iommufd_ucmd *ucmd)
> > > > +{
> > > > +     struct iommu_hw_info *cmd = ucmd->cmd;
> > > > +     void __user *user_ptr = u64_to_user_ptr(cmd->data_uptr);
> > > > +     const struct iommu_ops *ops;
> > > > +     struct iommufd_device *idev;
> > > > +     unsigned int data_len;
> > > > +     unsigned int copy_len;
> > > > +     void *data = NULL;
> > [..]
> > > > +     } else {
> > > > +             cmd->out_data_type = IOMMU_HW_INFO_TYPE_NONE;
> > > > +             data_len = 0;
> > > > +             data = NULL;
> > >
> > > data is already initialized as NULL.
> 
> Probably we can set data_len = 0 and the out_data_type to _NONE is
> the top as well. Any preference?

I think it is clear to remove the variable initialization so this
branch is more explicit

Jason
Nicolin Chen Aug. 18, 2023, 1:30 a.m. UTC | #5
On Thu, Aug 17, 2023 at 05:21:43PM -0700, Nicolin Chen wrote:
> On Thu, Aug 17, 2023 at 05:08:34PM -0700, Nicolin Chen wrote:
> > On Fri, Aug 18, 2023 at 12:04:29AM +0000, Liu, Yi L wrote:
> > 
> > > > From: Nicolin Chen <nicolinc@nvidia.com>
> > > > Sent: Friday, August 18, 2023 5:08 AM
> > > >
> > > > Looks like Yi's latest code has not addressed these comments.
> > > 
> > > Yeah. Not yet. In progress to incorporate them. 😊
> > 
> > I fixed them all in my local tree. I'm finalizing with alloc_user.
> > Let me send my branch after that, and you can edit upon :)
> 
> https://github.com/nicolinc/iommufd/tree/wip/iommufd_alloc_user-08172023-nic
> Attached two sets of git-diff for the updates that I made to the
> two series. Note that I didn't make any change to the VT-d patch.

>  	} else {
>  		cmd->out_data_type = IOMMU_HW_INFO_TYPE_NONE;
>  		data_len = 0;
> -		data = NULL;

Reverted this one and removed its initialization in my remote
branch, following Jason's latest email.

Thanks
Nic