diff mbox series

iommu: Avoid more races around device probe

Message ID 16f433658661d7cadfea51e7c65da95826112a2b.1700071477.git.robin.murphy@arm.com
State Accepted
Commit a2e7e59a94269484a83386972ca07c22fd188854
Headers show
Series iommu: Avoid more races around device probe | expand

Commit Message

Robin Murphy Nov. 15, 2023, 6:25 p.m. UTC
It turns out there are more subtle races beyond just the main part of
__iommu_probe_device() itself running in parallel - the dev_iommu_free()
on the way out of an unsuccessful probe can still manage to trip up
concurrent accesses to a device's fwspec. Thus, extend the scope of
iommu_probe_device_lock() to also serialise fwspec creation and initial
retrieval.

Reported-by: Zhenhua Huang <quic_zhenhuah@quicinc.com>
Link: https://lore.kernel.org/linux-iommu/e2e20e1c-6450-4ac5-9804-b0000acdf7de@quicinc.com/
Fixes: 01657bc14a39 ("iommu: Avoid races around device probe")
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
---

This is my idea of a viable fix, since it does not need a 700-line
diffstat to make the code do what it was already *trying* to do anyway.
This stuff should fundamentally not be hanging off driver probe in the
first place, so I'd rather get on with removing the underlying
brokenness than waste time and effort polishing it any further.

 drivers/acpi/scan.c      |  7 ++++++-
 drivers/iommu/iommu.c    | 20 ++++++++++----------
 drivers/iommu/of_iommu.c | 12 +++++++++---
 include/linux/iommu.h    |  1 +
 4 files changed, 26 insertions(+), 14 deletions(-)

Comments

Jason Gunthorpe Nov. 16, 2023, 5:05 p.m. UTC | #1
On Wed, Nov 15, 2023 at 06:25:44PM +0000, Robin Murphy wrote:
> It turns out there are more subtle races beyond just the main part of
> __iommu_probe_device() itself running in parallel - the dev_iommu_free()
> on the way out of an unsuccessful probe can still manage to trip up
> concurrent accesses to a device's fwspec. Thus, extend the scope of
> iommu_probe_device_lock() to also serialise fwspec creation and initial
> retrieval.
> 
> Reported-by: Zhenhua Huang <quic_zhenhuah@quicinc.com>
> Link: https://lore.kernel.org/linux-iommu/e2e20e1c-6450-4ac5-9804-b0000acdf7de@quicinc.com/
> Fixes: 01657bc14a39 ("iommu: Avoid races around device probe")
> Signed-off-by: Robin Murphy <robin.murphy@arm.com>
> ---
> 
> This is my idea of a viable fix, since it does not need a 700-line
> diffstat to make the code do what it was already *trying* to do anyway.
> This stuff should fundamentally not be hanging off driver probe in the
> first place, so I'd rather get on with removing the underlying
> brokenness than waste time and effort polishing it any further.

I'm fine with this as some hacky backport, but I don't want to see
this cross-layer leakage left in the next merge window.

ie we should still do my other series on top of and reverting this.

I've poked at moving parts of it under probe and I think we can do
substantial amounts in about two more series and a tidy a bunch of
other stuff too.

Jason
André Draszik Nov. 20, 2023, 11:26 a.m. UTC | #2
Hi Robin,

On Wed, 2023-11-15 at 18:25 +0000, Robin Murphy wrote:
> It turns out there are more subtle races beyond just the main part of
> __iommu_probe_device() itself running in parallel - the
> dev_iommu_free()
> on the way out of an unsuccessful probe can still manage to trip up
> concurrent accesses to a device's fwspec. Thus, extend the scope of
> iommu_probe_device_lock() to also serialise fwspec creation and
> initial
> retrieval.
> 
> Reported-by: Zhenhua Huang <quic_zhenhuah@quicinc.com>
> Link:
> https://lore.kernel.org/linux-iommu/e2e20e1c-6450-4ac5-9804-b0000acdf7de@quicinc.com/
> Fixes: 01657bc14a39 ("iommu: Avoid races around device probe")
> Signed-off-by: Robin Murphy <robin.murphy@arm.com>

Reviewed-by: André Draszik <andre.draszik@linaro.org>
Tested-by: André Draszik <andre.draszik@linaro.org> (using continuous
boot test loop)

I like that this is easily back-portable to 6.1, thanks for this patch
:-)

Cheers,
André


> ---
> 
> This is my idea of a viable fix, since it does not need a 700-line
> diffstat to make the code do what it was already *trying* to do
> anyway.
> This stuff should fundamentally not be hanging off driver probe in
> the
> first place, so I'd rather get on with removing the underlying
> brokenness than waste time and effort polishing it any further.
> 
>  drivers/acpi/scan.c      |  7 ++++++-
>  drivers/iommu/iommu.c    | 20 ++++++++++----------
>  drivers/iommu/of_iommu.c | 12 +++++++++---
>  include/linux/iommu.h    |  1 +
>  4 files changed, 26 insertions(+), 14 deletions(-)
> 
> diff --git a/drivers/acpi/scan.c b/drivers/acpi/scan.c
> index fa5dd71a80fa..02bb2cce423f 100644
> --- a/drivers/acpi/scan.c
> +++ b/drivers/acpi/scan.c
> @@ -1568,17 +1568,22 @@ static const struct iommu_ops
> *acpi_iommu_configure_id(struct device *dev,
>  	int err;
>  	const struct iommu_ops *ops;
>  
> +	/* Serialise to make dev->iommu stable under our potential
> fwspec */
> +	mutex_lock(&iommu_probe_device_lock);
>  	/*
>  	 * If we already translated the fwspec there is nothing left
> to do,
>  	 * return the iommu_ops.
>  	 */
>  	ops = acpi_iommu_fwspec_ops(dev);
> -	if (ops)
> +	if (ops) {
> +		mutex_unlock(&iommu_probe_device_lock);
>  		return ops;
> +	}
>  
>  	err = iort_iommu_configure_id(dev, id_in);
>  	if (err && err != -EPROBE_DEFER)
>  		err = viot_iommu_configure(dev);
> +	mutex_unlock(&iommu_probe_device_lock);
>  
>  	/*
>  	 * If we have reason to believe the IOMMU driver missed the
> initial
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index f17a1113f3d6..e0c962648dde 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -485,11 +485,12 @@ static void iommu_deinit_device(struct device
> *dev)
>  	dev_iommu_free(dev);
>  }
>  
> +DEFINE_MUTEX(iommu_probe_device_lock);
> +
>  static int __iommu_probe_device(struct device *dev, struct list_head
> *group_list)
>  {
>  	const struct iommu_ops *ops = dev->bus->iommu_ops;
>  	struct iommu_group *group;
> -	static DEFINE_MUTEX(iommu_probe_device_lock);
>  	struct group_device *gdev;
>  	int ret;
>  
> @@ -502,17 +503,15 @@ static int __iommu_probe_device(struct device
> *dev, struct list_head *group_list
>  	 * probably be able to use device_lock() here to minimise
> the scope,
>  	 * but for now enforcing a simple global ordering is fine.
>  	 */
> -	mutex_lock(&iommu_probe_device_lock);
> +	lockdep_assert_held(&iommu_probe_device_lock);
>  
>  	/* Device is probed already if in a group */
> -	if (dev->iommu_group) {
> -		ret = 0;
> -		goto out_unlock;
> -	}
> +	if (dev->iommu_group)
> +		return 0;
>  
>  	ret = iommu_init_device(dev, ops);
>  	if (ret)
> -		goto out_unlock;
> +		return ret;
>  
>  	group = dev->iommu_group;
>  	gdev = iommu_group_alloc_device(group, dev);
> @@ -548,7 +547,6 @@ static int __iommu_probe_device(struct device
> *dev, struct list_head *group_list
>  			list_add_tail(&group->entry, group_list);
>  	}
>  	mutex_unlock(&group->mutex);
> -	mutex_unlock(&iommu_probe_device_lock);
>  
>  	if (dev_is_pci(dev))
>  		iommu_dma_set_pci_32bit_workaround(dev);
> @@ -562,8 +560,6 @@ static int __iommu_probe_device(struct device
> *dev, struct list_head *group_list
>  	iommu_deinit_device(dev);
>  	mutex_unlock(&group->mutex);
>  	iommu_group_put(group);
> -out_unlock:
> -	mutex_unlock(&iommu_probe_device_lock);
>  
>  	return ret;
>  }
> @@ -573,7 +569,9 @@ int iommu_probe_device(struct device *dev)
>  	const struct iommu_ops *ops;
>  	int ret;
>  
> +	mutex_lock(&iommu_probe_device_lock);
>  	ret = __iommu_probe_device(dev, NULL);
> +	mutex_unlock(&iommu_probe_device_lock);
>  	if (ret)
>  		return ret;
>  
> @@ -1822,7 +1820,9 @@ static int probe_iommu_group(struct device
> *dev, void *data)
>  	struct list_head *group_list = data;
>  	int ret;
>  
> +	mutex_lock(&iommu_probe_device_lock);
>  	ret = __iommu_probe_device(dev, group_list);
> +	mutex_unlock(&iommu_probe_device_lock);
>  	if (ret == -ENODEV)
>  		ret = 0;
>  
> diff --git a/drivers/iommu/of_iommu.c b/drivers/iommu/of_iommu.c
> index 157b286e36bf..c25b4ae6aeee 100644
> --- a/drivers/iommu/of_iommu.c
> +++ b/drivers/iommu/of_iommu.c
> @@ -112,16 +112,20 @@ const struct iommu_ops
> *of_iommu_configure(struct device *dev,
>  					   const u32 *id)
>  {
>  	const struct iommu_ops *ops = NULL;
> -	struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
> +	struct iommu_fwspec *fwspec;
>  	int err = NO_IOMMU;
>  
>  	if (!master_np)
>  		return NULL;
>  
> +	/* Serialise to make dev->iommu stable under our potential
> fwspec */
> +	mutex_lock(&iommu_probe_device_lock);
> +	fwspec = dev_iommu_fwspec_get(dev);
>  	if (fwspec) {
> -		if (fwspec->ops)
> +		if (fwspec->ops) {
> +			mutex_unlock(&iommu_probe_device_lock);
>  			return fwspec->ops;
> -
> +		}
>  		/* In the deferred case, start again from scratch */
>  		iommu_fwspec_free(dev);
>  	}
> @@ -155,6 +159,8 @@ const struct iommu_ops *of_iommu_configure(struct
> device *dev,
>  		fwspec = dev_iommu_fwspec_get(dev);
>  		ops    = fwspec->ops;
>  	}
> +	mutex_unlock(&iommu_probe_device_lock);
> +
>  	/*
>  	 * If we have reason to believe the IOMMU driver missed the
> initial
>  	 * probe for dev, replay it to get things in order.
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index ec289c1016f5..6291aa7b079b 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -845,6 +845,7 @@ static inline void dev_iommu_priv_set(struct
> device *dev, void *priv)
>  	dev->iommu->priv = priv;
>  }
>  
> +extern struct mutex iommu_probe_device_lock;
>  int iommu_probe_device(struct device *dev);
>  
>  int iommu_dev_enable_feature(struct device *dev, enum
> iommu_dev_features f);
Greg KH Nov. 22, 2023, 2:44 p.m. UTC | #3
On Thu, Nov 16, 2023 at 01:05:58PM -0400, Jason Gunthorpe wrote:
> On Wed, Nov 15, 2023 at 06:25:44PM +0000, Robin Murphy wrote:
> > It turns out there are more subtle races beyond just the main part of
> > __iommu_probe_device() itself running in parallel - the dev_iommu_free()
> > on the way out of an unsuccessful probe can still manage to trip up
> > concurrent accesses to a device's fwspec. Thus, extend the scope of
> > iommu_probe_device_lock() to also serialise fwspec creation and initial
> > retrieval.
> > 
> > Reported-by: Zhenhua Huang <quic_zhenhuah@quicinc.com>
> > Link: https://lore.kernel.org/linux-iommu/e2e20e1c-6450-4ac5-9804-b0000acdf7de@quicinc.com/
> > Fixes: 01657bc14a39 ("iommu: Avoid races around device probe")
> > Signed-off-by: Robin Murphy <robin.murphy@arm.com>
> > ---
> > 
> > This is my idea of a viable fix, since it does not need a 700-line
> > diffstat to make the code do what it was already *trying* to do anyway.
> > This stuff should fundamentally not be hanging off driver probe in the
> > first place, so I'd rather get on with removing the underlying
> > brokenness than waste time and effort polishing it any further.
> 
> I'm fine with this as some hacky backport, but I don't want to see
> this cross-layer leakage left in the next merge window.
> 
> ie we should still do my other series on top of and reverting this.
> 
> I've poked at moving parts of it under probe and I think we can do
> substantial amounts in about two more series and a tidy a bunch of
> other stuff too.

I agree, it's messy and acpi should not need this, BUT at the moment, I
can't see any other way to resolve this simply.

So here's a begrudged ack:

Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

and hopefully the larger series should resolve this correctly?  Can that
be rebased on top of this?

Also, cc: stable on this for whomever applies it?

thanks,

greg k-h
Jason Gunthorpe Nov. 22, 2023, 7:22 p.m. UTC | #4
On Wed, Nov 22, 2023 at 02:44:53PM +0000, Greg KH wrote:
> On Thu, Nov 16, 2023 at 01:05:58PM -0400, Jason Gunthorpe wrote:
> > On Wed, Nov 15, 2023 at 06:25:44PM +0000, Robin Murphy wrote:
> > > It turns out there are more subtle races beyond just the main part of
> > > __iommu_probe_device() itself running in parallel - the dev_iommu_free()
> > > on the way out of an unsuccessful probe can still manage to trip up
> > > concurrent accesses to a device's fwspec. Thus, extend the scope of
> > > iommu_probe_device_lock() to also serialise fwspec creation and initial
> > > retrieval.
> > > 
> > > Reported-by: Zhenhua Huang <quic_zhenhuah@quicinc.com>
> > > Link: https://lore.kernel.org/linux-iommu/e2e20e1c-6450-4ac5-9804-b0000acdf7de@quicinc.com/
> > > Fixes: 01657bc14a39 ("iommu: Avoid races around device probe")
> > > Signed-off-by: Robin Murphy <robin.murphy@arm.com>
> > > ---
> > > 
> > > This is my idea of a viable fix, since it does not need a 700-line
> > > diffstat to make the code do what it was already *trying* to do anyway.
> > > This stuff should fundamentally not be hanging off driver probe in the
> > > first place, so I'd rather get on with removing the underlying
> > > brokenness than waste time and effort polishing it any further.
> > 
> > I'm fine with this as some hacky backport, but I don't want to see
> > this cross-layer leakage left in the next merge window.
> > 
> > ie we should still do my other series on top of and reverting this.
> > 
> > I've poked at moving parts of it under probe and I think we can do
> > substantial amounts in about two more series and a tidy a bunch of
> > other stuff too.
> 
> I agree, it's messy and acpi should not need this, BUT at the moment, I
> can't see any other way to resolve this simply.
> 
> So here's a begrudged ack:
> 
> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> 
> and hopefully the larger series should resolve this correctly?  Can that
> be rebased on top of this?

Yeah, I'm working on something more along the lines of Robin's desire
for a full reorganization.

The existing series has been tested by a few people now. We can decide
which order to put things in maybe next week if I get the new approach
done..

> Also, cc: stable on this for whomever applies it?

Also please update the commit message, the text from here does
describe the race:

https://lore.kernel.org/linux-iommu/11-v2-36a0088ecaa7+22c6e-iommu_fwspec_jgg@nvidia.com/

Jason
Jason Gunthorpe Nov. 24, 2023, 3:35 p.m. UTC | #5
On Wed, Nov 15, 2023 at 06:25:44PM +0000, Robin Murphy wrote:
> It turns out there are more subtle races beyond just the main part of
> __iommu_probe_device() itself running in parallel - the dev_iommu_free()
> on the way out of an unsuccessful probe can still manage to trip up
> concurrent accesses to a device's fwspec. Thus, extend the scope of
> iommu_probe_device_lock() to also serialise fwspec creation and initial
> retrieval.
> 
> Reported-by: Zhenhua Huang <quic_zhenhuah@quicinc.com>
> Link: https://lore.kernel.org/linux-iommu/e2e20e1c-6450-4ac5-9804-b0000acdf7de@quicinc.com/

Apparently this is "Closes:" nowadays

> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index f17a1113f3d6..e0c962648dde 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -485,11 +485,12 @@ static void iommu_deinit_device(struct device *dev)
>  	dev_iommu_free(dev);
>  }
>  
> +DEFINE_MUTEX(iommu_probe_device_lock);
> +

Please put this at the top of the file with the other declarations, I
will rebase the lockdep annotations from the other series

Thanks,
Jason
Joerg Roedel Nov. 27, 2023, 9:57 a.m. UTC | #6
On Thu, Nov 16, 2023 at 01:05:58PM -0400, Jason Gunthorpe wrote:
> I'm fine with this as some hacky backport, but I don't want to see
> this cross-layer leakage left in the next merge window.
> 
> ie we should still do my other series on top of and reverting this.
> 
> I've poked at moving parts of it under probe and I think we can do
> substantial amounts in about two more series and a tidy a bunch of
> other stuff too.

Applied this for v6.7. Before I am going to apply any series on-top I
want to see you and Robin agree on a general direction for this moving
forward.

Regards,

	Joerg
diff mbox series

Patch

diff --git a/drivers/acpi/scan.c b/drivers/acpi/scan.c
index fa5dd71a80fa..02bb2cce423f 100644
--- a/drivers/acpi/scan.c
+++ b/drivers/acpi/scan.c
@@ -1568,17 +1568,22 @@  static const struct iommu_ops *acpi_iommu_configure_id(struct device *dev,
 	int err;
 	const struct iommu_ops *ops;
 
+	/* Serialise to make dev->iommu stable under our potential fwspec */
+	mutex_lock(&iommu_probe_device_lock);
 	/*
 	 * If we already translated the fwspec there is nothing left to do,
 	 * return the iommu_ops.
 	 */
 	ops = acpi_iommu_fwspec_ops(dev);
-	if (ops)
+	if (ops) {
+		mutex_unlock(&iommu_probe_device_lock);
 		return ops;
+	}
 
 	err = iort_iommu_configure_id(dev, id_in);
 	if (err && err != -EPROBE_DEFER)
 		err = viot_iommu_configure(dev);
+	mutex_unlock(&iommu_probe_device_lock);
 
 	/*
 	 * If we have reason to believe the IOMMU driver missed the initial
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index f17a1113f3d6..e0c962648dde 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -485,11 +485,12 @@  static void iommu_deinit_device(struct device *dev)
 	dev_iommu_free(dev);
 }
 
+DEFINE_MUTEX(iommu_probe_device_lock);
+
 static int __iommu_probe_device(struct device *dev, struct list_head *group_list)
 {
 	const struct iommu_ops *ops = dev->bus->iommu_ops;
 	struct iommu_group *group;
-	static DEFINE_MUTEX(iommu_probe_device_lock);
 	struct group_device *gdev;
 	int ret;
 
@@ -502,17 +503,15 @@  static int __iommu_probe_device(struct device *dev, struct list_head *group_list
 	 * probably be able to use device_lock() here to minimise the scope,
 	 * but for now enforcing a simple global ordering is fine.
 	 */
-	mutex_lock(&iommu_probe_device_lock);
+	lockdep_assert_held(&iommu_probe_device_lock);
 
 	/* Device is probed already if in a group */
-	if (dev->iommu_group) {
-		ret = 0;
-		goto out_unlock;
-	}
+	if (dev->iommu_group)
+		return 0;
 
 	ret = iommu_init_device(dev, ops);
 	if (ret)
-		goto out_unlock;
+		return ret;
 
 	group = dev->iommu_group;
 	gdev = iommu_group_alloc_device(group, dev);
@@ -548,7 +547,6 @@  static int __iommu_probe_device(struct device *dev, struct list_head *group_list
 			list_add_tail(&group->entry, group_list);
 	}
 	mutex_unlock(&group->mutex);
-	mutex_unlock(&iommu_probe_device_lock);
 
 	if (dev_is_pci(dev))
 		iommu_dma_set_pci_32bit_workaround(dev);
@@ -562,8 +560,6 @@  static int __iommu_probe_device(struct device *dev, struct list_head *group_list
 	iommu_deinit_device(dev);
 	mutex_unlock(&group->mutex);
 	iommu_group_put(group);
-out_unlock:
-	mutex_unlock(&iommu_probe_device_lock);
 
 	return ret;
 }
@@ -573,7 +569,9 @@  int iommu_probe_device(struct device *dev)
 	const struct iommu_ops *ops;
 	int ret;
 
+	mutex_lock(&iommu_probe_device_lock);
 	ret = __iommu_probe_device(dev, NULL);
+	mutex_unlock(&iommu_probe_device_lock);
 	if (ret)
 		return ret;
 
@@ -1822,7 +1820,9 @@  static int probe_iommu_group(struct device *dev, void *data)
 	struct list_head *group_list = data;
 	int ret;
 
+	mutex_lock(&iommu_probe_device_lock);
 	ret = __iommu_probe_device(dev, group_list);
+	mutex_unlock(&iommu_probe_device_lock);
 	if (ret == -ENODEV)
 		ret = 0;
 
diff --git a/drivers/iommu/of_iommu.c b/drivers/iommu/of_iommu.c
index 157b286e36bf..c25b4ae6aeee 100644
--- a/drivers/iommu/of_iommu.c
+++ b/drivers/iommu/of_iommu.c
@@ -112,16 +112,20 @@  const struct iommu_ops *of_iommu_configure(struct device *dev,
 					   const u32 *id)
 {
 	const struct iommu_ops *ops = NULL;
-	struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
+	struct iommu_fwspec *fwspec;
 	int err = NO_IOMMU;
 
 	if (!master_np)
 		return NULL;
 
+	/* Serialise to make dev->iommu stable under our potential fwspec */
+	mutex_lock(&iommu_probe_device_lock);
+	fwspec = dev_iommu_fwspec_get(dev);
 	if (fwspec) {
-		if (fwspec->ops)
+		if (fwspec->ops) {
+			mutex_unlock(&iommu_probe_device_lock);
 			return fwspec->ops;
-
+		}
 		/* In the deferred case, start again from scratch */
 		iommu_fwspec_free(dev);
 	}
@@ -155,6 +159,8 @@  const struct iommu_ops *of_iommu_configure(struct device *dev,
 		fwspec = dev_iommu_fwspec_get(dev);
 		ops    = fwspec->ops;
 	}
+	mutex_unlock(&iommu_probe_device_lock);
+
 	/*
 	 * If we have reason to believe the IOMMU driver missed the initial
 	 * probe for dev, replay it to get things in order.
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index ec289c1016f5..6291aa7b079b 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -845,6 +845,7 @@  static inline void dev_iommu_priv_set(struct device *dev, void *priv)
 	dev->iommu->priv = priv;
 }
 
+extern struct mutex iommu_probe_device_lock;
 int iommu_probe_device(struct device *dev);
 
 int iommu_dev_enable_feature(struct device *dev, enum iommu_dev_features f);