diff mbox series

[v3,2/2] tpm: in tpm2_del_space check if ops pointer is still valid

Message ID 1612482643-11796-3-git-send-email-LinoSanfilippo@gmx.de
State New
Headers show
Series TPM fixes | expand

Commit Message

Lino Sanfilippo Feb. 4, 2021, 11:50 p.m. UTC
From: Lino Sanfilippo <l.sanfilippo@kunbus.com>

In tpm2_del_space() chip->ops is used for flushing the sessions. However
this function may be called after tpm_chip_unregister() which sets
the chip->ops pointer to NULL.
Avoid a possible NULL pointer dereference by checking if chip->ops is still
valid before accessing it.

Fixes: a3fbfae82b4c ("tpm: take TPM chip power gating out of tpm_transmit()")
Signed-off-by: Lino Sanfilippo <l.sanfilippo@kunbus.com>
---
 drivers/char/tpm/tpm2-space.c | 15 ++++++++++-----
 1 file changed, 10 insertions(+), 5 deletions(-)

Comments

James Bottomley Feb. 5, 2021, 12:34 a.m. UTC | #1
On Fri, 2021-02-05 at 00:50 +0100, Lino Sanfilippo wrote:
> From: Lino Sanfilippo <l.sanfilippo@kunbus.com>

> 

> In tpm2_del_space() chip->ops is used for flushing the sessions.

> However

> this function may be called after tpm_chip_unregister() which sets

> the chip->ops pointer to NULL.

> Avoid a possible NULL pointer dereference by checking if chip->ops is

> still

> valid before accessing it.

> 

> Fixes: a3fbfae82b4c ("tpm: take TPM chip power gating out of

> tpm_transmit()")

> Signed-off-by: Lino Sanfilippo <l.sanfilippo@kunbus.com>

> ---

>  drivers/char/tpm/tpm2-space.c | 15 ++++++++++-----

>  1 file changed, 10 insertions(+), 5 deletions(-)

> 

> diff --git a/drivers/char/tpm/tpm2-space.c b/drivers/char/tpm/tpm2-

> space.c

> index 784b8b3..9a29a40 100644

> --- a/drivers/char/tpm/tpm2-space.c

> +++ b/drivers/char/tpm/tpm2-space.c

> @@ -58,12 +58,17 @@ int tpm2_init_space(struct tpm_space *space,

> unsigned int buf_size)

>  

>  void tpm2_del_space(struct tpm_chip *chip, struct tpm_space *space)

>  {

> -	mutex_lock(&chip->tpm_mutex);

> -	if (!tpm_chip_start(chip)) {

> -		tpm2_flush_sessions(chip, space);

> -		tpm_chip_stop(chip);

> +	down_read(&chip->ops_sem);

> +	if (chip->ops) {

> +		mutex_lock(&chip->tpm_mutex);

> +		if (!tpm_chip_start(chip)) {

> +			tpm2_flush_sessions(chip, space);

> +			tpm_chip_stop(chip);

> +		}

> +		mutex_unlock(&chip->tpm_mutex);

>  	}

> -	mutex_unlock(&chip->tpm_mutex);

> +	up_read(&chip->ops_sem);

> +

>  	kfree(space->context_buf);

>  	kfree(space->session_buf);

>  }



Actually, this still isn't right.  As I said to the last person who
reported this, we should be doing a get/put on the ops, not rolling our
own here:

https://lore.kernel.org/linux-integrity/e7566e1e48f5be9dca034b4bfb67683b5d3cb88f.camel@HansenPartnership.com/

The reporter went silent before we could get this tested, but could you
try, please, because your patch is still hand rolling the ops get/put,
just slightly better than it had been done previously.

James
Jarkko Sakkinen Feb. 5, 2021, 2:18 a.m. UTC | #2
On Thu, Feb 04, 2021 at 04:34:11PM -0800, James Bottomley wrote:
> On Fri, 2021-02-05 at 00:50 +0100, Lino Sanfilippo wrote:

> > From: Lino Sanfilippo <l.sanfilippo@kunbus.com>

> > 

> > In tpm2_del_space() chip->ops is used for flushing the sessions.

> > However

> > this function may be called after tpm_chip_unregister() which sets

> > the chip->ops pointer to NULL.

> > Avoid a possible NULL pointer dereference by checking if chip->ops is

> > still

> > valid before accessing it.

> > 

> > Fixes: a3fbfae82b4c ("tpm: take TPM chip power gating out of

> > tpm_transmit()")

> > Signed-off-by: Lino Sanfilippo <l.sanfilippo@kunbus.com>

> > ---

> >  drivers/char/tpm/tpm2-space.c | 15 ++++++++++-----

> >  1 file changed, 10 insertions(+), 5 deletions(-)

> > 

> > diff --git a/drivers/char/tpm/tpm2-space.c b/drivers/char/tpm/tpm2-

> > space.c

> > index 784b8b3..9a29a40 100644

> > --- a/drivers/char/tpm/tpm2-space.c

> > +++ b/drivers/char/tpm/tpm2-space.c

> > @@ -58,12 +58,17 @@ int tpm2_init_space(struct tpm_space *space,

> > unsigned int buf_size)

> >  

> >  void tpm2_del_space(struct tpm_chip *chip, struct tpm_space *space)

> >  {

> > -	mutex_lock(&chip->tpm_mutex);

> > -	if (!tpm_chip_start(chip)) {

> > -		tpm2_flush_sessions(chip, space);

> > -		tpm_chip_stop(chip);

> > +	down_read(&chip->ops_sem);

> > +	if (chip->ops) {

> > +		mutex_lock(&chip->tpm_mutex);

> > +		if (!tpm_chip_start(chip)) {

> > +			tpm2_flush_sessions(chip, space);

> > +			tpm_chip_stop(chip);

> > +		}

> > +		mutex_unlock(&chip->tpm_mutex);

> >  	}

> > -	mutex_unlock(&chip->tpm_mutex);

> > +	up_read(&chip->ops_sem);

> > +

> >  	kfree(space->context_buf);

> >  	kfree(space->session_buf);

> >  }

> 

> 

> Actually, this still isn't right.  As I said to the last person who

> reported this, we should be doing a get/put on the ops, not rolling our

> own here:

> 

> https://lore.kernel.org/linux-integrity/e7566e1e48f5be9dca034b4bfb67683b5d3cb88f.camel@HansenPartnership.com/

> 

> The reporter went silent before we could get this tested, but could you

> try, please, because your patch is still hand rolling the ops get/put,

> just slightly better than it had been done previously.

> 

> James


Thanks for pointing this out. I'd strongly support Jason's proposal:

https://lore.kernel.org/linux-integrity/20201215175624.GG5487@ziepe.ca/

It's the best long-term way to fix this.

Honestly, I have to admit that this thread leaked from me. It happened
exactly at the time when I was on vacation. Something must have gone wrong
when I pulled emails after that.

/Jarkko
Greg KH Feb. 5, 2021, 6:51 a.m. UTC | #3
On Fri, Feb 05, 2021 at 12:50:43AM +0100, Lino Sanfilippo wrote:
> From: Lino Sanfilippo <l.sanfilippo@kunbus.com>

> 

> In tpm2_del_space() chip->ops is used for flushing the sessions. However

> this function may be called after tpm_chip_unregister() which sets

> the chip->ops pointer to NULL.

> Avoid a possible NULL pointer dereference by checking if chip->ops is still

> valid before accessing it.

> 

> Fixes: a3fbfae82b4c ("tpm: take TPM chip power gating out of tpm_transmit()")

> Signed-off-by: Lino Sanfilippo <l.sanfilippo@kunbus.com>

> ---

>  drivers/char/tpm/tpm2-space.c | 15 ++++++++++-----

>  1 file changed, 10 insertions(+), 5 deletions(-)


<formletter>

This is not the correct way to submit patches for inclusion in the
stable kernel tree.  Please read:
    https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html
for how to do this properly.

</formletter>
Lino Sanfilippo Feb. 5, 2021, 10:30 a.m. UTC | #4
Hi James,

On 05.02.21 01:34, James Bottomley wrote:
> 

> 

> Actually, this still isn't right.  As I said to the last person who

> reported this, we should be doing a get/put on the ops, not rolling our

> own here:

> 

> https://lore.kernel.org/linux-integrity/e7566e1e48f5be9dca034b4bfb67683b5d3cb88f.camel@HansenPartnership.com/

> 


Thanks for pointing this out, I was not aware of this discussion.

> The reporter went silent before we could get this tested, but could you

> try, please, because your patch is still hand rolling the ops get/put,

> just slightly better than it had been done previously.


I tested your patch and it fixes the issue. Your solution seems indeed much cleaner.

FWIW:

Tested-by: Lino Sanfilippo <l.sanfilippo@kunbus.com>


Best Regards,
Lino
James Bottomley Feb. 5, 2021, 4:48 p.m. UTC | #5
On Fri, 2021-02-05 at 04:18 +0200, Jarkko Sakkinen wrote:
> On Thu, Feb 04, 2021 at 04:34:11PM -0800, James Bottomley wrote:

> > On Fri, 2021-02-05 at 00:50 +0100, Lino Sanfilippo wrote:

> > > From: Lino Sanfilippo <l.sanfilippo@kunbus.com>

> > > 

> > > In tpm2_del_space() chip->ops is used for flushing the sessions.

> > > However

> > > this function may be called after tpm_chip_unregister() which

> > > sets

> > > the chip->ops pointer to NULL.

> > > Avoid a possible NULL pointer dereference by checking if chip-

> > > >ops is

> > > still

> > > valid before accessing it.

> > > 

> > > Fixes: a3fbfae82b4c ("tpm: take TPM chip power gating out of

> > > tpm_transmit()")

> > > Signed-off-by: Lino Sanfilippo <l.sanfilippo@kunbus.com>

> > > ---

> > >  drivers/char/tpm/tpm2-space.c | 15 ++++++++++-----

> > >  1 file changed, 10 insertions(+), 5 deletions(-)

> > > 

> > > diff --git a/drivers/char/tpm/tpm2-space.c

> > > b/drivers/char/tpm/tpm2-

> > > space.c

> > > index 784b8b3..9a29a40 100644

> > > --- a/drivers/char/tpm/tpm2-space.c

> > > +++ b/drivers/char/tpm/tpm2-space.c

> > > @@ -58,12 +58,17 @@ int tpm2_init_space(struct tpm_space *space,

> > > unsigned int buf_size)

> > >  

> > >  void tpm2_del_space(struct tpm_chip *chip, struct tpm_space

> > > *space)

> > >  {

> > > -	mutex_lock(&chip->tpm_mutex);

> > > -	if (!tpm_chip_start(chip)) {

> > > -		tpm2_flush_sessions(chip, space);

> > > -		tpm_chip_stop(chip);

> > > +	down_read(&chip->ops_sem);

> > > +	if (chip->ops) {

> > > +		mutex_lock(&chip->tpm_mutex);

> > > +		if (!tpm_chip_start(chip)) {

> > > +			tpm2_flush_sessions(chip, space);

> > > +			tpm_chip_stop(chip);

> > > +		}

> > > +		mutex_unlock(&chip->tpm_mutex);

> > >  	}

> > > -	mutex_unlock(&chip->tpm_mutex);

> > > +	up_read(&chip->ops_sem);

> > > +

> > >  	kfree(space->context_buf);

> > >  	kfree(space->session_buf);

> > >  }

> > 

> > Actually, this still isn't right.  As I said to the last person who

> > reported this, we should be doing a get/put on the ops, not rolling

> > our

> > own here:

> > 

> > https://lore.kernel.org/linux-integrity/e7566e1e48f5be9dca034b4bfb67683b5d3cb88f.camel@HansenPartnership.com/

> > 

> > The reporter went silent before we could get this tested, but could

> > you

> > try, please, because your patch is still hand rolling the ops

> > get/put,

> > just slightly better than it had been done previously.

> > 

> > James

> 

> Thanks for pointing this out. I'd strongly support Jason's proposal:

> 

> https://lore.kernel.org/linux-integrity/20201215175624.GG5487@ziepe.ca/

> 

> It's the best long-term way to fix this.


Really, no it's not.  It introduces extra mechanism we don't need.

To recap the issue: character devices already have an automatic
mechanism which holds a reference to the struct device while the
character node is open so the default is to release resources on final
put of the struct device.

tpm 2 is special because we have two character device nodes: /dev/tpm0
and /dev/tpmrm0.  The way we make this work is that tpm0 is the master
and tpmrm0 the slave, so the slave holds an extra reference on the
master which is put when the slave final put happens.  This means that
our resources aren't freed until the final puts of both devices, which
is the model we're using.

The practical consequence of this model is that if you allocate a chip
structure with tpm_chip_alloc() you have to release it again by doing a
put of *both* devices.

However, patch fdc915f7f719 ("tpm: expose spaces via a device link
/dev/tpmrm<n>") contains two bugs: firstly it didn't add a devm action
release for devs and secondly it didn't update the only non-devm user
ibm vtpm to do the double put.

Stefan noticed the latter, so we got the bogus patch 8979b02aaf1d
("tpm: Fix reference count to main device") applied which simply breaks
the master/slave model by not taking a reference on the master for the
slave.  I'm not sure why I didn't notice the problem with this fix at
the time, but attention must have been elsewhere.

Subsequently we got ftpm added which copied the ibm vtpm put bug.

So I think 1/2 is the correct fix for all three bugs.  I just need to
find a way to verify it.

James
Jason Gunthorpe Feb. 5, 2021, 5:25 p.m. UTC | #6
On Fri, Feb 05, 2021 at 08:48:11AM -0800, James Bottomley wrote:
> > Thanks for pointing this out. I'd strongly support Jason's proposal:

> > 

> > https://lore.kernel.org/linux-integrity/20201215175624.GG5487@ziepe.ca/

> > 

> > It's the best long-term way to fix this.

> 

> Really, no it's not.  It introduces extra mechanism we don't need.


> To recap the issue: character devices already have an automatic

> mechanism which holds a reference to the struct device while the

> character node is open so the default is to release resources on final

> put of the struct device.


The refcount on the struct device only keeps the memory alive, it
doesn't say anything about the ops. We still need to lock and check
the ops each and every time they are used.

The fact cdev goes all the way till fput means we don't need the extra
get/put I suggested to Lino at all.

> The practical consequence of this model is that if you allocate a chip

> structure with tpm_chip_alloc() you have to release it again by doing a

> put of *both* devices.


The final put of the devs should be directly after the
cdev_device_del(), not in a devm. This became all confused because the
devs was created during alloc, not register. Having a device that is
initialized but will never be added is weird.

See sketch below.

> Stefan noticed the latter, so we got the bogus patch 8979b02aaf1d

> ("tpm: Fix reference count to main device") applied which simply breaks

> the master/slave model by not taking a reference on the master for the

> slave.  I'm not sure why I didn't notice the problem with this fix at

> the time, but attention must have been elsewhere.


Well, this is sort of OK because we never use the devs in TPM1, so we
end up freeing the chip with a positive refcount on the devs, which is
weird but not a functional bug.

Jason

diff --git a/drivers/char/tpm/tpm-chip.c b/drivers/char/tpm/tpm-chip.c
index ddaeceb7e10910..e07193a0dd4438 100644
--- a/drivers/char/tpm/tpm-chip.c
+++ b/drivers/char/tpm/tpm-chip.c
@@ -344,7 +344,6 @@ struct tpm_chip *tpm_chip_alloc(struct device *pdev,
 	chip->dev_num = rc;
 
 	device_initialize(&chip->dev);
-	device_initialize(&chip->devs);
 
 	chip->dev.class = tpm_class;
 	chip->dev.class->shutdown_pre = tpm_class_shutdown;
@@ -352,29 +351,12 @@ struct tpm_chip *tpm_chip_alloc(struct device *pdev,
 	chip->dev.parent = pdev;
 	chip->dev.groups = chip->groups;
 
-	chip->devs.parent = pdev;
-	chip->devs.class = tpmrm_class;
-	chip->devs.release = tpm_devs_release;
-	/* get extra reference on main device to hold on
-	 * behalf of devs.  This holds the chip structure
-	 * while cdevs is in use.  The corresponding put
-	 * is in the tpm_devs_release (TPM2 only)
-	 */
-	if (chip->flags & TPM_CHIP_FLAG_TPM2)
-		get_device(&chip->dev);
-
 	if (chip->dev_num == 0)
 		chip->dev.devt = MKDEV(MISC_MAJOR, TPM_MINOR);
 	else
 		chip->dev.devt = MKDEV(MAJOR(tpm_devt), chip->dev_num);
 
-	chip->devs.devt =
-		MKDEV(MAJOR(tpm_devt), chip->dev_num + TPM_NUM_DEVICES);
-
 	rc = dev_set_name(&chip->dev, "tpm%d", chip->dev_num);
-	if (rc)
-		goto out;
-	rc = dev_set_name(&chip->devs, "tpmrm%d", chip->dev_num);
 	if (rc)
 		goto out;
 
@@ -382,9 +364,7 @@ struct tpm_chip *tpm_chip_alloc(struct device *pdev,
 		chip->flags |= TPM_CHIP_FLAG_VIRTUAL;
 
 	cdev_init(&chip->cdev, &tpm_fops);
-	cdev_init(&chip->cdevs, &tpmrm_fops);
 	chip->cdev.owner = THIS_MODULE;
-	chip->cdevs.owner = THIS_MODULE;
 
 	rc = tpm2_init_space(&chip->work_space, TPM2_SPACE_BUFFER_SIZE);
 	if (rc) {
@@ -396,7 +376,6 @@ struct tpm_chip *tpm_chip_alloc(struct device *pdev,
 	return chip;
 
 out:
-	put_device(&chip->devs);
 	put_device(&chip->dev);
 	return ERR_PTR(rc);
 }
@@ -445,13 +424,33 @@ static int tpm_add_char_device(struct tpm_chip *chip)
 	}
 
 	if (chip->flags & TPM_CHIP_FLAG_TPM2) {
+		device_initialize(&chip->devs);
+		chip->devs.parent = pdev;
+		chip->devs.class = tpmrm_class;
+		rc = dev_set_name(&chip->devs, "tpmrm%d", chip->dev_num);
+		if (rc)
+			goto out_put_devs;
+
+		/*
+                 * get extra reference on main device to hold on behalf of devs.
+                 * This holds the chip structure while cdevs is in use. The
+		 * corresponding put is in the tpm_devs_release.
+		 */
+		get_device(&chip->dev);
+		chip->devs.release = tpm_devs_release;
+
+		chip->devs.devt =
+			MKDEV(MAJOR(tpm_devt), chip->dev_num + TPM_NUM_DEVICES);
+		cdev_init(&chip->cdevs, &tpmrm_fops);
+		chip->cdevs.owner = THIS_MODULE;
+
 		rc = cdev_device_add(&chip->cdevs, &chip->devs);
 		if (rc) {
 			dev_err(&chip->devs,
 				"unable to cdev_device_add() %s, major %d, minor %d, err=%d\n",
 				dev_name(&chip->devs), MAJOR(chip->devs.devt),
 				MINOR(chip->devs.devt), rc);
-			return rc;
+			goto out_put_devs;
 		}
 	}
 
@@ -460,6 +459,10 @@ static int tpm_add_char_device(struct tpm_chip *chip)
 	idr_replace(&dev_nums_idr, chip, chip->dev_num);
 	mutex_unlock(&idr_lock);
 
+out_put_devs:
+	put_device(&chip->devs);
+out_del_dev:
+	cdev_device_del(&chip->cdev);
 	return rc;
 }
 
@@ -640,8 +643,10 @@ void tpm_chip_unregister(struct tpm_chip *chip)
 	if (IS_ENABLED(CONFIG_HW_RANDOM_TPM))
 		hwrng_unregister(&chip->hwrng);
 	tpm_bios_log_teardown(chip);
-	if (chip->flags & TPM_CHIP_FLAG_TPM2)
+	if (chip->flags & TPM_CHIP_FLAG_TPM2) {
 		cdev_device_del(&chip->cdevs, &chip->devs);
+		put_device(&chip->devs);
+	}
 	tpm_del_char_device(chip);
 }
 EXPORT_SYMBOL_GPL(tpm_chip_unregister);
James Bottomley Feb. 5, 2021, 5:54 p.m. UTC | #7
On Fri, 2021-02-05 at 13:25 -0400, Jason Gunthorpe wrote:
> On Fri, Feb 05, 2021 at 08:48:11AM -0800, James Bottomley wrote:

> > > Thanks for pointing this out. I'd strongly support Jason's

> > > proposal:

> > > 

> > > https://lore.kernel.org/linux-integrity/20201215175624.GG5487@ziepe.ca/

> > > 

> > > It's the best long-term way to fix this.

> > 

> > Really, no it's not.  It introduces extra mechanism we don't need.

> > To recap the issue: character devices already have an automatic

> > mechanism which holds a reference to the struct device while the

> > character node is open so the default is to release resources on

> > final

> > put of the struct device.

> 

> The refcount on the struct device only keeps the memory alive, it

> doesn't say anything about the ops. We still need to lock and check

> the ops each and every time they are used.


I think this is the crux of our disagreement: I think the ops doesn't
matter because to call try_get_ops you have to have a chip structure
and the only way you get a chip structure is if you hold a device
containing it, in which case the device hold guarantees the chip can't
be freed.  Or if you pass in TPM_ANY_NUM to an operation which calls 
tpm_chip_find_get() which iterates the idr to find a chip under the idr
lock.  If you find a chip device at the idr, you're guaranteed it
exists, because elimination of it is the first thing the release does
and if you find a dying dev (i.e. the release routine blocks on the idr
mutex trying to kill the chip attachment), try_get_ops() fails because
the ops are already NULL.

In either case, I think you get returned a device to which you hold a
reference.  Is there any other case where you can get a chip without
also getting a device reference?

I'll answer the other point in a separate email, but I think the
principle sounds OK: we could do the final put right after we del the
char devices because that's called in the module release routine and
thus not have to rely on the devm actions which, as you say, are an
annoying complication.

James
Jason Gunthorpe Feb. 6, 2021, 1:02 a.m. UTC | #8
On Fri, Feb 05, 2021 at 09:54:29AM -0800, James Bottomley wrote:
> On Fri, 2021-02-05 at 13:25 -0400, Jason Gunthorpe wrote:

> > On Fri, Feb 05, 2021 at 08:48:11AM -0800, James Bottomley wrote:

> > > > Thanks for pointing this out. I'd strongly support Jason's

> > > > proposal:

> > > > 

> > > > https://lore.kernel.org/linux-integrity/20201215175624.GG5487@ziepe.ca/

> > > > 

> > > > It's the best long-term way to fix this.

> > > 

> > > Really, no it's not.  It introduces extra mechanism we don't need.

> > > To recap the issue: character devices already have an automatic

> > > mechanism which holds a reference to the struct device while the

> > > character node is open so the default is to release resources on

> > > final

> > > put of the struct device.

> > 

> > The refcount on the struct device only keeps the memory alive, it

> > doesn't say anything about the ops. We still need to lock and check

> > the ops each and every time they are used.

> 

> I think this is the crux of our disagreement: I think the ops doesn't

> matter because to call try_get_ops you have to have a chip structure

> and the only way you get a chip structure is if you hold a device

> containing it, in which case the device hold guarantees the chip can't

> be freed.  


The get_device() only guarentees the chip memory hasn't been kfree'd.

It doesn't mean tpm_chip_unregister() hasn't already run, completed
and set ops == NULL.

In the file path we have the get_device implicitly by the file's
i_cdev pointing to that chain of refcounts that ends on the chip's
main struct device. So we know the chip memory cannot be kfreed while
the struct file exists.

However, there is nothing preventing the struct file from living past
tpm_chip_unregister(). cdev_device_del() does not wait for all files's
to be closed, it only removes the ability to open new files. Open
files do prevent removal of the module, but it does not prevent
hot-unplug of the underling device, eg with sysfs unbind.

In fact, nothing about tpm_chip_unregister() excludes open files.

So it is perfectly legal for tpm_chip_unregister() to return, the devm
put_device to be called, and the refcount of the chip to still be
positive - held by open files.

In this situation ops will be NULL when file operations are called and
eg doing a tpm_chip_start will crash on:

	if (chip->ops->clk_enable)

To use the TPM driver, the rules are you must hold a get_device() on a
chip, and then upgrade it to a 'tpm_try_get_ops' before calling any
driver functions. 

Only the critical region formed by tpm_try_get_ops() will prevent
tpm_chip_unregister() from completing. It is the thing that ensures
the driver is actually present.

> In either case, I think you get returned a device to which you hold a

> reference.  Is there any other case where you can get a chip without

> also getting a device reference?


There should be no case where there is a struct chip pointer without
something owning a reference for that pointer.

> I'll answer the other point in a separate email, but I think the

> principle sounds OK: we could do the final put right after we del the

> char devices because that's called in the module release routine and

> thus not have to rely on the devm actions which, as you say, are an

> annoying complication.


I think tpm_alloc() should have an error unwind that is only
put_device(chip->dev), anything else breaks the basic programming
pattern of alloc/register.

Jason
James Bottomley Feb. 6, 2021, 1:08 a.m. UTC | #9
On Fri, 2021-02-05 at 13:25 -0400, Jason Gunthorpe wrote:
> On Fri, Feb 05, 2021 at 08:48:11AM -0800, James Bottomley wrote:

[...]
> > The practical consequence of this model is that if you allocate a

> > chip structure with tpm_chip_alloc() you have to release it again

> > by doing a put of *both* devices.

> 

> The final put of the devs should be directly after the

> cdev_device_del(), not in a devm. This became all confused because

> the devs was created during alloc, not register. Having a device that

> is initialized but will never be added is weird.

> 

> See sketch below.

> 

> > Stefan noticed the latter, so we got the bogus patch 8979b02aaf1d

> > ("tpm: Fix reference count to main device") applied which simply

> > breaks the master/slave model by not taking a reference on the

> > master for the slave.  I'm not sure why I didn't notice the problem

> > with this fix at the time, but attention must have been elsewhere.

> 

> Well, this is sort of OK because we never use the devs in TPM1, so we

> end up freeing the chip with a positive refcount on the devs, which

> is weird but not a functional bug.

> 

> Jason

> 

> diff --git a/drivers/char/tpm/tpm-chip.c b/drivers/char/tpm/tpm-

> chip.c

> index ddaeceb7e10910..e07193a0dd4438 100644

> --- a/drivers/char/tpm/tpm-chip.c

> +++ b/drivers/char/tpm/tpm-chip.c

> @@ -344,7 +344,6 @@ struct tpm_chip *tpm_chip_alloc(struct device

> *pdev,

>  	chip->dev_num = rc;

>  

>  	device_initialize(&chip->dev);

> -	device_initialize(&chip->devs);

>  

>  	chip->dev.class = tpm_class;

>  	chip->dev.class->shutdown_pre = tpm_class_shutdown;

> @@ -352,29 +351,12 @@ struct tpm_chip *tpm_chip_alloc(struct device

> *pdev,

>  	chip->dev.parent = pdev;

>  	chip->dev.groups = chip->groups;

>  

> -	chip->devs.parent = pdev;

> -	chip->devs.class = tpmrm_class;

> -	chip->devs.release = tpm_devs_release;

> -	/* get extra reference on main device to hold on

> -	 * behalf of devs.  This holds the chip structure

> -	 * while cdevs is in use.  The corresponding put

> -	 * is in the tpm_devs_release (TPM2 only)

> -	 */

> -	if (chip->flags & TPM_CHIP_FLAG_TPM2)

> -		get_device(&chip->dev);

> -

>  	if (chip->dev_num == 0)

>  		chip->dev.devt = MKDEV(MISC_MAJOR, TPM_MINOR);

>  	else

>  		chip->dev.devt = MKDEV(MAJOR(tpm_devt), chip->dev_num);

>  

> -	chip->devs.devt =

> -		MKDEV(MAJOR(tpm_devt), chip->dev_num +

> TPM_NUM_DEVICES);

> -

>  	rc = dev_set_name(&chip->dev, "tpm%d", chip->dev_num);

> -	if (rc)

> -		goto out;

> -	rc = dev_set_name(&chip->devs, "tpmrm%d", chip->dev_num);

>  	if (rc)

>  		goto out;

>  

> @@ -382,9 +364,7 @@ struct tpm_chip *tpm_chip_alloc(struct device

> *pdev,

>  		chip->flags |= TPM_CHIP_FLAG_VIRTUAL;

>  

>  	cdev_init(&chip->cdev, &tpm_fops);

> -	cdev_init(&chip->cdevs, &tpmrm_fops);

>  	chip->cdev.owner = THIS_MODULE;

> -	chip->cdevs.owner = THIS_MODULE;

>  

>  	rc = tpm2_init_space(&chip->work_space,

> TPM2_SPACE_BUFFER_SIZE);

>  	if (rc) {

> @@ -396,7 +376,6 @@ struct tpm_chip *tpm_chip_alloc(struct device

> *pdev,

>  	return chip;

>  

>  out:

> -	put_device(&chip->devs);

>  	put_device(&chip->dev);

>  	return ERR_PTR(rc);

>  }

> @@ -445,13 +424,33 @@ static int tpm_add_char_device(struct tpm_chip

> *chip)

>  	}

>  

>  	if (chip->flags & TPM_CHIP_FLAG_TPM2) {

> +		device_initialize(&chip->devs);

> +		chip->devs.parent = pdev;

> +		chip->devs.class = tpmrm_class;

> +		rc = dev_set_name(&chip->devs, "tpmrm%d", chip-

> >dev_num);

> +		if (rc)

> +			goto out_put_devs;

> +

> +		/*

> +                 * get extra reference on main device to hold on

> behalf of devs.

> +                 * This holds the chip structure while cdevs is in

> use. The

> +		 * corresponding put is in the tpm_devs_release.

> +		 */

> +		get_device(&chip->dev);

> +		chip->devs.release = tpm_devs_release;

> +

> +		chip->devs.devt =

> +			MKDEV(MAJOR(tpm_devt), chip->dev_num +

> TPM_NUM_DEVICES);

> +		cdev_init(&chip->cdevs, &tpmrm_fops);

> +		chip->cdevs.owner = THIS_MODULE;

> +


Effectively all of this shuffles the tpmrm device allocation from
chip_alloc to chip_add ... I'm not averse to this but it does mean we
can suffer allocation failures now in the add routine and it makes
error handling a bit more complex.  On the other hand we can now check
the TPM2 flag correctly, so it's swings and roundabouts.

>  		rc = cdev_device_add(&chip->cdevs, &chip->devs);

>  		if (rc) {

>  			dev_err(&chip->devs,

>  				"unable to cdev_device_add() %s, major

> %d, minor %d, err=%d\n",

>  				dev_name(&chip->devs), MAJOR(chip-

> >devs.devt),

>  				MINOR(chip->devs.devt), rc);

> -			return rc;

> +			goto out_put_devs;

>  		}

>  	}

>  

> @@ -460,6 +459,10 @@ static int tpm_add_char_device(struct tpm_chip

> *chip)

>  	idr_replace(&dev_nums_idr, chip, chip->dev_num);

>  	mutex_unlock(&idr_lock);

>  

> +out_put_devs:

> +	put_device(&chip->devs);


I think there should be a if (chip->flags & TPM_CHIP_FLAG_TPM2) here.

I realise you got everything semantically correct and you only ever go
to this label from somewhere that already has the check, but guess what
will happen when the bot rewriters get hold of this ...

> +out_del_dev:

> +	cdev_device_del(&chip->cdev);

>  	return rc;

>  }

>  

> @@ -640,8 +643,10 @@ void tpm_chip_unregister(struct tpm_chip *chip)

>  	if (IS_ENABLED(CONFIG_HW_RANDOM_TPM))

>  		hwrng_unregister(&chip->hwrng);

>  	tpm_bios_log_teardown(chip);

> -	if (chip->flags & TPM_CHIP_FLAG_TPM2)

> +	if (chip->flags & TPM_CHIP_FLAG_TPM2) {

>  		cdev_device_del(&chip->cdevs, &chip->devs);

> +		put_device(&chip->devs);

> +	}

>  	tpm_del_char_device(chip);


Actually, I think you want to go further here.  If there's a 

put_device(&chips->dev)

as the last statement (or moved into tpm_del_char_device) we should now
have no active reference on the devices from the kernel and we can
eliminate the 

	rc = devm_add_action_or_reset(pdev,
				      (void (*)(void *)) put_device,
				      &chip->dev);

In tpmm_chip_alloc().  That way both /dev/tpm and /dev/tpmrm have
identical lifetime properties.

James
Jason Gunthorpe Feb. 6, 2021, 1:34 a.m. UTC | #10
On Fri, Feb 05, 2021 at 05:08:20PM -0800, James Bottomley wrote:
 
> Effectively all of this shuffles the tpmrm device allocation from

> chip_alloc to chip_add ... I'm not averse to this but it does mean we

> can suffer allocation failures now in the add routine and it makes

> error handling a bit more complex.  


We already have to handle failures here, so this doesn't seem any
worse (and the existing error handling looked wrong, I fixed it)

> >  		rc = cdev_device_add(&chip->cdevs, &chip->devs);

> >  		if (rc) {

> >  			dev_err(&chip->devs,

> >  				"unable to cdev_device_add() %s, major

> > %d, minor %d, err=%d\n",

> >  				dev_name(&chip->devs), MAJOR(chip-

> > >devs.devt),

> >  				MINOR(chip->devs.devt), rc);

> > -			return rc;

> > +			goto out_put_devs;

> >  		}

> >  	}

> >  

> > @@ -460,6 +459,10 @@ static int tpm_add_char_device(struct tpm_chip

> > *chip)

> >  	idr_replace(&dev_nums_idr, chip, chip->dev_num);

> >  	mutex_unlock(&idr_lock);

> >  

> > +out_put_devs:

> > +	put_device(&chip->devs);

> 

> I think there should be a if (chip->flags & TPM_CHIP_FLAG_TPM2) here.

> 

> I realise you got everything semantically correct and you only ever go

> to this label from somewhere that already has the check, but guess what

> will happen when the bot rewriters get hold of this ...


Makes sense
 
> > +out_del_dev:

> > +	cdev_device_del(&chip->cdev);

> >  	return rc;

> >  }

> >  

> > @@ -640,8 +643,10 @@ void tpm_chip_unregister(struct tpm_chip *chip)

> >  	if (IS_ENABLED(CONFIG_HW_RANDOM_TPM))

> >  		hwrng_unregister(&chip->hwrng);

> >  	tpm_bios_log_teardown(chip);

> > -	if (chip->flags & TPM_CHIP_FLAG_TPM2)

> > +	if (chip->flags & TPM_CHIP_FLAG_TPM2) {

> >  		cdev_device_del(&chip->cdevs, &chip->devs);

> > +		put_device(&chip->devs);

> > +	}

> >  	tpm_del_char_device(chip);

> 

> Actually, I think you want to go further here.  If there's a 

> 

> put_device(&chips->dev)

>

> as the last statement (or moved into tpm_del_char_device) we should

> now


The proper TPM driver remove sequence is:

remove()
{
   /* Upon return the core guarentees no driver callback is running or
    * will ever run again */
   tpm_chip_unregister()

   // Safe to do this because nothing will now use the HW resources
   free_irq(chip->XXX)
   unmap_memory(chip->YYY)

   // Now we are done with the memory
   put_device(&chip-dev);
}

ie the general driver design should expect the chip memory to continue
to exist after unregister because it will need to refer to it to
destroy any driver resources.

> have no active reference on the devices from the kernel and we can

> eliminate the 

>

> 	rc = devm_add_action_or_reset(pdev,

> 				      (void (*)(void *)) put_device,

> 				      &chip->dev);


This devm exists because adding the put_device to the error unwinds of
every driver probe function was too daunting. It can be removed only
if someone goes and updates every driver to correctly error-unwind
tpm_chip_alloc() with put_device() in the driver probe function.

Jason
Lino Sanfilippo Feb. 9, 2021, 11:52 a.m. UTC | #11
Hi Jason,

On 05.02.21 18:25, Jason Gunthorpe wrote:
> On Fri, Feb 05, 2021 at 08:48:11AM -0800, James Bottomley wrote:

>>> Thanks for pointing this out. I'd strongly support Jason's proposal:

>>>

>>> https://lore.kernel.org/linux-integrity/20201215175624.GG5487@ziepe.ca/

>>>

>>> It's the best long-term way to fix this.

>>

>> Really, no it's not.  It introduces extra mechanism we don't need.

> 

>> To recap the issue: character devices already have an automatic

>> mechanism which holds a reference to the struct device while the

>> character node is open so the default is to release resources on final

>> put of the struct device.

> 

> The refcount on the struct device only keeps the memory alive, it

> doesn't say anything about the ops. We still need to lock and check

> the ops each and every time they are used.

> 

> The fact cdev goes all the way till fput means we don't need the extra

> get/put I suggested to Lino at all.

> 

>> The practical consequence of this model is that if you allocate a chip

>> structure with tpm_chip_alloc() you have to release it again by doing a

>> put of *both* devices.

> 

> The final put of the devs should be directly after the

> cdev_device_del(), not in a devm. This became all confused because the

> devs was created during alloc, not register. Having a device that is

> initialized but will never be added is weird.

> 

> See sketch below.

> 

>> Stefan noticed the latter, so we got the bogus patch 8979b02aaf1d

>> ("tpm: Fix reference count to main device") applied which simply breaks

>> the master/slave model by not taking a reference on the master for the

>> slave.  I'm not sure why I didn't notice the problem with this fix at

>> the time, but attention must have been elsewhere.

> 

> Well, this is sort of OK because we never use the devs in TPM1, so we

> end up freeing the chip with a positive refcount on the devs, which is

> weird but not a functional bug.

> 

> Jason

> 

> diff --git a/drivers/char/tpm/tpm-chip.c b/drivers/char/tpm/tpm-chip.c

> index ddaeceb7e10910..e07193a0dd4438 100644

> --- a/drivers/char/tpm/tpm-chip.c

> +++ b/drivers/char/tpm/tpm-chip.c

> @@ -344,7 +344,6 @@ struct tpm_chip *tpm_chip_alloc(struct device *pdev,

>  	chip->dev_num = rc;

>  

>  	device_initialize(&chip->dev);

> -	device_initialize(&chip->devs);

>  

>  	chip->dev.class = tpm_class;

>  	chip->dev.class->shutdown_pre = tpm_class_shutdown;

> @@ -352,29 +351,12 @@ struct tpm_chip *tpm_chip_alloc(struct device *pdev,

>  	chip->dev.parent = pdev;

>  	chip->dev.groups = chip->groups;

>  

> -	chip->devs.parent = pdev;

> -	chip->devs.class = tpmrm_class;

> -	chip->devs.release = tpm_devs_release;

> -	/* get extra reference on main device to hold on

> -	 * behalf of devs.  This holds the chip structure

> -	 * while cdevs is in use.  The corresponding put

> -	 * is in the tpm_devs_release (TPM2 only)

> -	 */

> -	if (chip->flags & TPM_CHIP_FLAG_TPM2)

> -		get_device(&chip->dev);

> -

>  	if (chip->dev_num == 0)

>  		chip->dev.devt = MKDEV(MISC_MAJOR, TPM_MINOR);

>  	else

>  		chip->dev.devt = MKDEV(MAJOR(tpm_devt), chip->dev_num);

>  

> -	chip->devs.devt =

> -		MKDEV(MAJOR(tpm_devt), chip->dev_num + TPM_NUM_DEVICES);

> -

>  	rc = dev_set_name(&chip->dev, "tpm%d", chip->dev_num);

> -	if (rc)

> -		goto out;

> -	rc = dev_set_name(&chip->devs, "tpmrm%d", chip->dev_num);

>  	if (rc)

>  		goto out;

>  

> @@ -382,9 +364,7 @@ struct tpm_chip *tpm_chip_alloc(struct device *pdev,

>  		chip->flags |= TPM_CHIP_FLAG_VIRTUAL;

>  

>  	cdev_init(&chip->cdev, &tpm_fops);

> -	cdev_init(&chip->cdevs, &tpmrm_fops);

>  	chip->cdev.owner = THIS_MODULE;

> -	chip->cdevs.owner = THIS_MODULE;

>  

>  	rc = tpm2_init_space(&chip->work_space, TPM2_SPACE_BUFFER_SIZE);

>  	if (rc) {

> @@ -396,7 +376,6 @@ struct tpm_chip *tpm_chip_alloc(struct device *pdev,

>  	return chip;

>  

>  out:

> -	put_device(&chip->devs);

>  	put_device(&chip->dev);

>  	return ERR_PTR(rc);

>  }

> @@ -445,13 +424,33 @@ static int tpm_add_char_device(struct tpm_chip *chip)

>  	}

>  

>  	if (chip->flags & TPM_CHIP_FLAG_TPM2) {

> +		device_initialize(&chip->devs);

> +		chip->devs.parent = pdev;

> +		chip->devs.class = tpmrm_class;

> +		rc = dev_set_name(&chip->devs, "tpmrm%d", chip->dev_num);

> +		if (rc)

> +			goto out_put_devs;

> +

> +		/*

> +                 * get extra reference on main device to hold on behalf of devs.

> +                 * This holds the chip structure while cdevs is in use. The

> +		 * corresponding put is in the tpm_devs_release.

> +		 */

> +		get_device(&chip->dev);

> +		chip->devs.release = tpm_devs_release;

> +

> +		chip->devs.devt =

> +			MKDEV(MAJOR(tpm_devt), chip->dev_num + TPM_NUM_DEVICES);

> +		cdev_init(&chip->cdevs, &tpmrm_fops);

> +		chip->cdevs.owner = THIS_MODULE;

> +

>  		rc = cdev_device_add(&chip->cdevs, &chip->devs);

>  		if (rc) {

>  			dev_err(&chip->devs,

>  				"unable to cdev_device_add() %s, major %d, minor %d, err=%d\n",

>  				dev_name(&chip->devs), MAJOR(chip->devs.devt),

>  				MINOR(chip->devs.devt), rc);

> -			return rc;

> +			goto out_put_devs;

>  		}

>  	}

>  

> @@ -460,6 +459,10 @@ static int tpm_add_char_device(struct tpm_chip *chip)

>  	idr_replace(&dev_nums_idr, chip, chip->dev_num);

>  	mutex_unlock(&idr_lock);

>  

> +out_put_devs:

> +	put_device(&chip->devs);

> +out_del_dev:

> +	cdev_device_del(&chip->cdev);

>  	return rc;

>  }

>  

> @@ -640,8 +643,10 @@ void tpm_chip_unregister(struct tpm_chip *chip)

>  	if (IS_ENABLED(CONFIG_HW_RANDOM_TPM))

>  		hwrng_unregister(&chip->hwrng);

>  	tpm_bios_log_teardown(chip);

> -	if (chip->flags & TPM_CHIP_FLAG_TPM2)

> +	if (chip->flags & TPM_CHIP_FLAG_TPM2) {

>  		cdev_device_del(&chip->cdevs, &chip->devs);

> +		put_device(&chip->devs);

> +	}

>  	tpm_del_char_device(chip);

>  }

>  EXPORT_SYMBOL_GPL(tpm_chip_unregister);

> 


I tested the solution you scetched and it fixes the issue for me. Will you send a (real) patch for this?

Best regards,
Lino
Jason Gunthorpe Feb. 9, 2021, 1:36 p.m. UTC | #12
On Tue, Feb 09, 2021 at 12:52:17PM +0100, Lino Sanfilippo wrote:
> > @@ -640,8 +643,10 @@ void tpm_chip_unregister(struct tpm_chip *chip)

> >  	if (IS_ENABLED(CONFIG_HW_RANDOM_TPM))

> >  		hwrng_unregister(&chip->hwrng);

> >  	tpm_bios_log_teardown(chip);

> > -	if (chip->flags & TPM_CHIP_FLAG_TPM2)

> > +	if (chip->flags & TPM_CHIP_FLAG_TPM2) {

> >  		cdev_device_del(&chip->cdevs, &chip->devs);

> > +		put_device(&chip->devs);

> > +	}

> >  	tpm_del_char_device(chip);

> >  }

> >  EXPORT_SYMBOL_GPL(tpm_chip_unregister);

> > 

> 

> I tested the solution you scetched and it fixes the issue for me. Will you send a (real) patch for this?


No, feel free to bundle this up with any fixes needed and send it with
a Signed-off-by from both of us

I did it pretty fast so it will need a careful read that there isn't a
typo

Thanks,
Jason
Lino Sanfilippo Feb. 9, 2021, 1:39 p.m. UTC | #13
On 09.02.21 14:36, Jason Gunthorpe wrote:

>>>  EXPORT_SYMBOL_GPL(tpm_chip_unregister);

>>>

>>

>> I tested the solution you scetched and it fixes the issue for me. Will you send a (real) patch for this?

> 

> No, feel free to bundle this up with any fixes needed and send it with

> a Signed-off-by from both of us

> 

> I did it pretty fast so it will need a careful read that there isn't a

> typo

> 


Ok, will do so.

Regards,
Lino
Jarkko Sakkinen Feb. 12, 2021, 10:59 a.m. UTC | #14
On Tue, Feb 09, 2021 at 12:52:17PM +0100, Lino Sanfilippo wrote:
> Hi Jason,

> 

> On 05.02.21 18:25, Jason Gunthorpe wrote:

> > On Fri, Feb 05, 2021 at 08:48:11AM -0800, James Bottomley wrote:

> >>> Thanks for pointing this out. I'd strongly support Jason's proposal:

> >>>

> >>> https://lore.kernel.org/linux-integrity/20201215175624.GG5487@ziepe.ca/

> >>>

> >>> It's the best long-term way to fix this.

> >>

> >> Really, no it's not.  It introduces extra mechanism we don't need.

> > 

> >> To recap the issue: character devices already have an automatic

> >> mechanism which holds a reference to the struct device while the

> >> character node is open so the default is to release resources on final

> >> put of the struct device.

> > 

> > The refcount on the struct device only keeps the memory alive, it

> > doesn't say anything about the ops. We still need to lock and check

> > the ops each and every time they are used.

> > 

> > The fact cdev goes all the way till fput means we don't need the extra

> > get/put I suggested to Lino at all.

> > 

> >> The practical consequence of this model is that if you allocate a chip

> >> structure with tpm_chip_alloc() you have to release it again by doing a

> >> put of *both* devices.

> > 

> > The final put of the devs should be directly after the

> > cdev_device_del(), not in a devm. This became all confused because the

> > devs was created during alloc, not register. Having a device that is

> > initialized but will never be added is weird.

> > 

> > See sketch below.

> > 

> >> Stefan noticed the latter, so we got the bogus patch 8979b02aaf1d

> >> ("tpm: Fix reference count to main device") applied which simply breaks

> >> the master/slave model by not taking a reference on the master for the

> >> slave.  I'm not sure why I didn't notice the problem with this fix at

> >> the time, but attention must have been elsewhere.

> > 

> > Well, this is sort of OK because we never use the devs in TPM1, so we

> > end up freeing the chip with a positive refcount on the devs, which is

> > weird but not a functional bug.

> > 

> > Jason

> > 

> > diff --git a/drivers/char/tpm/tpm-chip.c b/drivers/char/tpm/tpm-chip.c

> > index ddaeceb7e10910..e07193a0dd4438 100644

> > --- a/drivers/char/tpm/tpm-chip.c

> > +++ b/drivers/char/tpm/tpm-chip.c

> > @@ -344,7 +344,6 @@ struct tpm_chip *tpm_chip_alloc(struct device *pdev,

> >  	chip->dev_num = rc;

> >  

> >  	device_initialize(&chip->dev);

> > -	device_initialize(&chip->devs);

> >  

> >  	chip->dev.class = tpm_class;

> >  	chip->dev.class->shutdown_pre = tpm_class_shutdown;

> > @@ -352,29 +351,12 @@ struct tpm_chip *tpm_chip_alloc(struct device *pdev,

> >  	chip->dev.parent = pdev;

> >  	chip->dev.groups = chip->groups;

> >  

> > -	chip->devs.parent = pdev;

> > -	chip->devs.class = tpmrm_class;

> > -	chip->devs.release = tpm_devs_release;

> > -	/* get extra reference on main device to hold on

> > -	 * behalf of devs.  This holds the chip structure

> > -	 * while cdevs is in use.  The corresponding put

> > -	 * is in the tpm_devs_release (TPM2 only)

> > -	 */

> > -	if (chip->flags & TPM_CHIP_FLAG_TPM2)

> > -		get_device(&chip->dev);

> > -

> >  	if (chip->dev_num == 0)

> >  		chip->dev.devt = MKDEV(MISC_MAJOR, TPM_MINOR);

> >  	else

> >  		chip->dev.devt = MKDEV(MAJOR(tpm_devt), chip->dev_num);

> >  

> > -	chip->devs.devt =

> > -		MKDEV(MAJOR(tpm_devt), chip->dev_num + TPM_NUM_DEVICES);

> > -

> >  	rc = dev_set_name(&chip->dev, "tpm%d", chip->dev_num);

> > -	if (rc)

> > -		goto out;

> > -	rc = dev_set_name(&chip->devs, "tpmrm%d", chip->dev_num);

> >  	if (rc)

> >  		goto out;

> >  

> > @@ -382,9 +364,7 @@ struct tpm_chip *tpm_chip_alloc(struct device *pdev,

> >  		chip->flags |= TPM_CHIP_FLAG_VIRTUAL;

> >  

> >  	cdev_init(&chip->cdev, &tpm_fops);

> > -	cdev_init(&chip->cdevs, &tpmrm_fops);

> >  	chip->cdev.owner = THIS_MODULE;

> > -	chip->cdevs.owner = THIS_MODULE;

> >  

> >  	rc = tpm2_init_space(&chip->work_space, TPM2_SPACE_BUFFER_SIZE);

> >  	if (rc) {

> > @@ -396,7 +376,6 @@ struct tpm_chip *tpm_chip_alloc(struct device *pdev,

> >  	return chip;

> >  

> >  out:

> > -	put_device(&chip->devs);

> >  	put_device(&chip->dev);

> >  	return ERR_PTR(rc);

> >  }

> > @@ -445,13 +424,33 @@ static int tpm_add_char_device(struct tpm_chip *chip)

> >  	}

> >  

> >  	if (chip->flags & TPM_CHIP_FLAG_TPM2) {

> > +		device_initialize(&chip->devs);

> > +		chip->devs.parent = pdev;

> > +		chip->devs.class = tpmrm_class;

> > +		rc = dev_set_name(&chip->devs, "tpmrm%d", chip->dev_num);

> > +		if (rc)

> > +			goto out_put_devs;

> > +

> > +		/*

> > +                 * get extra reference on main device to hold on behalf of devs.

> > +                 * This holds the chip structure while cdevs is in use. The

> > +		 * corresponding put is in the tpm_devs_release.

> > +		 */

> > +		get_device(&chip->dev);

> > +		chip->devs.release = tpm_devs_release;

> > +

> > +		chip->devs.devt =

> > +			MKDEV(MAJOR(tpm_devt), chip->dev_num + TPM_NUM_DEVICES);

> > +		cdev_init(&chip->cdevs, &tpmrm_fops);

> > +		chip->cdevs.owner = THIS_MODULE;

> > +

> >  		rc = cdev_device_add(&chip->cdevs, &chip->devs);

> >  		if (rc) {

> >  			dev_err(&chip->devs,

> >  				"unable to cdev_device_add() %s, major %d, minor %d, err=%d\n",

> >  				dev_name(&chip->devs), MAJOR(chip->devs.devt),

> >  				MINOR(chip->devs.devt), rc);

> > -			return rc;

> > +			goto out_put_devs;

> >  		}

> >  	}

> >  

> > @@ -460,6 +459,10 @@ static int tpm_add_char_device(struct tpm_chip *chip)

> >  	idr_replace(&dev_nums_idr, chip, chip->dev_num);

> >  	mutex_unlock(&idr_lock);

> >  

> > +out_put_devs:

> > +	put_device(&chip->devs);

> > +out_del_dev:

> > +	cdev_device_del(&chip->cdev);

> >  	return rc;

> >  }

> >  

> > @@ -640,8 +643,10 @@ void tpm_chip_unregister(struct tpm_chip *chip)

> >  	if (IS_ENABLED(CONFIG_HW_RANDOM_TPM))

> >  		hwrng_unregister(&chip->hwrng);

> >  	tpm_bios_log_teardown(chip);

> > -	if (chip->flags & TPM_CHIP_FLAG_TPM2)

> > +	if (chip->flags & TPM_CHIP_FLAG_TPM2) {

> >  		cdev_device_del(&chip->cdevs, &chip->devs);

> > +		put_device(&chip->devs);

> > +	}

> >  	tpm_del_char_device(chip);

> >  }

> >  EXPORT_SYMBOL_GPL(tpm_chip_unregister);

> > 

> 

> I tested the solution you scetched and it fixes the issue for me. Will

> you send a (real) patch for this?


One *option*:

1. You take the Jason's patch.
2. https://www.kernel.org/doc/html/v5.10/process/submitting-patches.html#when-to-use-acked-by-cc-and-co-developed-by

Just mentioning this, and spreading the knowledge about co-developed-by.

> Best regards,

> Lino


/Jarkko
Jarkko Sakkinen Feb. 12, 2021, 11:02 a.m. UTC | #15
On Tue, Feb 09, 2021 at 09:36:53AM -0400, Jason Gunthorpe wrote:
> On Tue, Feb 09, 2021 at 12:52:17PM +0100, Lino Sanfilippo wrote:

> > > @@ -640,8 +643,10 @@ void tpm_chip_unregister(struct tpm_chip *chip)

> > >  	if (IS_ENABLED(CONFIG_HW_RANDOM_TPM))

> > >  		hwrng_unregister(&chip->hwrng);

> > >  	tpm_bios_log_teardown(chip);

> > > -	if (chip->flags & TPM_CHIP_FLAG_TPM2)

> > > +	if (chip->flags & TPM_CHIP_FLAG_TPM2) {

> > >  		cdev_device_del(&chip->cdevs, &chip->devs);

> > > +		put_device(&chip->devs);

> > > +	}

> > >  	tpm_del_char_device(chip);

> > >  }

> > >  EXPORT_SYMBOL_GPL(tpm_chip_unregister);

> > > 

> > 

> > I tested the solution you scetched and it fixes the issue for me. Will you send a (real) patch for this?

> 

> No, feel free to bundle this up with any fixes needed and send it with

> a Signed-off-by from both of us

> 

> I did it pretty fast so it will need a careful read that there isn't a

> typo

> 

> Thanks,

> Jason


Let's use CDB too as it exist and Sean kindly provided a better
documentation for it in some recent kernel release. It's great
exactly for this type of situation.

/Jarkko
Lino Sanfilippo Feb. 14, 2021, 5:22 p.m. UTC | #16
Hi,

On 12.02.21 at 11:59, Jarkko Sakkinen wrote:

>

> One *option*:

>

> 1. You take the Jason's patch.

> 2. https://www.kernel.org/doc/html/v5.10/process/submitting-patches.html#when-to-use-acked-by-cc-and-co-developed-by

>

> Just mentioning this, and spreading the knowledge about co-developed-by.

>


This seems to me like a very good fit, thanks for pointing at this.
I will prepare a new patch series and use that tag.

Best regards,
Lino
Lino Sanfilippo March 6, 2021, 4:07 p.m. UTC | #17
Hi James,

> On 05.02.21 01:34, James Bottomley wrote:

>> The reporter went silent before we could get this tested, but could you

>> try, please, because your patch is still hand rolling the ops get/put,

>> just slightly better than it had been done previously.

>

> I tested your patch and it fixes the issue. Your solution seems indeed much cleaner.

>

> FWIW:

>

> Tested-by: Lino Sanfilippo <l.sanfilippo@kunbus.com>

>


Are you going to send a patch for this? As stated above I verified that your
solution fixes the issue.

Best regards,
Lino
diff mbox series

Patch

diff --git a/drivers/char/tpm/tpm2-space.c b/drivers/char/tpm/tpm2-space.c
index 784b8b3..9a29a40 100644
--- a/drivers/char/tpm/tpm2-space.c
+++ b/drivers/char/tpm/tpm2-space.c
@@ -58,12 +58,17 @@  int tpm2_init_space(struct tpm_space *space, unsigned int buf_size)
 
 void tpm2_del_space(struct tpm_chip *chip, struct tpm_space *space)
 {
-	mutex_lock(&chip->tpm_mutex);
-	if (!tpm_chip_start(chip)) {
-		tpm2_flush_sessions(chip, space);
-		tpm_chip_stop(chip);
+	down_read(&chip->ops_sem);
+	if (chip->ops) {
+		mutex_lock(&chip->tpm_mutex);
+		if (!tpm_chip_start(chip)) {
+			tpm2_flush_sessions(chip, space);
+			tpm_chip_stop(chip);
+		}
+		mutex_unlock(&chip->tpm_mutex);
 	}
-	mutex_unlock(&chip->tpm_mutex);
+	up_read(&chip->ops_sem);
+
 	kfree(space->context_buf);
 	kfree(space->session_buf);
 }