mbox series

[V3,0/3] scsi: two fixes in scsi_add_host_with_dma

Message ID 20210531050727.2353973-1-ming.lei@redhat.com
Headers show
Series scsi: two fixes in scsi_add_host_with_dma | expand

Message

Ming Lei May 31, 2021, 5:07 a.m. UTC
Hello Martin,


Fix two memory leaks and one double free issue in alloc/add host
code path.


V3:
	- fix memory leak caused in scsi_host_alloc
	- comment typo suggested by John

V2:
	- add patch 2 for addressing shost leak in case of adding host
	  failure, reported by John Garry.

Ming Lei (3):
  scsi: core: use put_device() to release host
  scsi: core: fix failure handling of scsi_add_host_with_dma
  scsi: core: put ->shost_gendev.parent in failure handling path

 drivers/scsi/hosts.c | 25 ++++++++++++-------------
 1 file changed, 12 insertions(+), 13 deletions(-)

Cc: Bart Van Assche <bvanassche@acm.org>
Cc: John Garry <john.garry@huawei.com>
Cc: Hannes Reinecke <hare@suse.de>

Comments

Hannes Reinecke May 31, 2021, 8:23 a.m. UTC | #1
On 5/31/21 9:56 AM, Ming Lei wrote:
> On Mon, May 31, 2021 at 08:28:49AM +0200, Hannes Reinecke wrote:
>> On 5/31/21 7:07 AM, Ming Lei wrote:
>>> get_device(shost->shost_gendev.parent) is called in
>>> scsi_add_host_with_dma(), but its counter pair isn't called in the failure
>>> path, so fix it by calling put_device(shost->shost_gendev.parent) in its
>>> failure path.
>>>
>>> Reported-by: John Garry <john.garry@huawei.com>
>>> Cc: Bart Van Assche <bvanassche@acm.org>
>>> Cc: Hannes Reinecke <hare@suse.de>
>>> Signed-off-by: Ming Lei <ming.lei@redhat.com>
>>> ---
>>>    drivers/scsi/hosts.c | 1 +
>>>    1 file changed, 1 insertion(+)
>>>
>>> diff --git a/drivers/scsi/hosts.c b/drivers/scsi/hosts.c
>>> index 6cbc3eb16525..6cc43c51b7b3 100644
>>> --- a/drivers/scsi/hosts.c
>>> +++ b/drivers/scsi/hosts.c
>>> @@ -298,6 +298,7 @@ int scsi_add_host_with_dma(struct Scsi_Host *shost, struct device *dev,
>>>     out_del_dev:
>>>    	device_del(&shost->shost_dev);
>>>     out_del_gendev:
>>> +	put_device(shost->shost_gendev.parent);
>>>    	device_del(&shost->shost_gendev);
>>>     out_disable_runtime_pm:
>>>    	device_disable_async_suspend(&shost->shost_gendev);
>>>
>> This really needs to be folded into the first patch as it's really a bugfix
>> for that.
> 
> All three are bug fixes, and this one may leak .shost_gendev's parent,
> but the 1st one leaks .shost_gendev->kobj.name, so we needn't to fold
> the 3rd into the 1st one.
> 
I beg to disagree.
The first patch removes the 'get_device()' from
scsi_add_host_with_dma(), but does not remove the corresponding 
'put_device()' in the error path.
So the first patch introduces a reference imbalance which is fixed by 
the third patch. Hence my suggestion to merge those two patches.

Cheers,

Hannes
Ming Lei May 31, 2021, 9:17 a.m. UTC | #2
On Mon, May 31, 2021 at 10:23:43AM +0200, Hannes Reinecke wrote:
> On 5/31/21 9:56 AM, Ming Lei wrote:
> > On Mon, May 31, 2021 at 08:28:49AM +0200, Hannes Reinecke wrote:
> > > On 5/31/21 7:07 AM, Ming Lei wrote:
> > > > get_device(shost->shost_gendev.parent) is called in
> > > > scsi_add_host_with_dma(), but its counter pair isn't called in the failure
> > > > path, so fix it by calling put_device(shost->shost_gendev.parent) in its
> > > > failure path.
> > > > 
> > > > Reported-by: John Garry <john.garry@huawei.com>
> > > > Cc: Bart Van Assche <bvanassche@acm.org>
> > > > Cc: Hannes Reinecke <hare@suse.de>
> > > > Signed-off-by: Ming Lei <ming.lei@redhat.com>
> > > > ---
> > > >    drivers/scsi/hosts.c | 1 +
> > > >    1 file changed, 1 insertion(+)
> > > > 
> > > > diff --git a/drivers/scsi/hosts.c b/drivers/scsi/hosts.c
> > > > index 6cbc3eb16525..6cc43c51b7b3 100644
> > > > --- a/drivers/scsi/hosts.c
> > > > +++ b/drivers/scsi/hosts.c
> > > > @@ -298,6 +298,7 @@ int scsi_add_host_with_dma(struct Scsi_Host *shost, struct device *dev,
> > > >     out_del_dev:
> > > >    	device_del(&shost->shost_dev);
> > > >     out_del_gendev:
> > > > +	put_device(shost->shost_gendev.parent);
> > > >    	device_del(&shost->shost_gendev);
> > > >     out_disable_runtime_pm:
> > > >    	device_disable_async_suspend(&shost->shost_gendev);
> > > > 
> > > This really needs to be folded into the first patch as it's really a bugfix
> > > for that.
> > 
> > All three are bug fixes, and this one may leak .shost_gendev's parent,
> > but the 1st one leaks .shost_gendev->kobj.name, so we needn't to fold
> > the 3rd into the 1st one.
> > 
> I beg to disagree.
> The first patch removes the 'get_device()' from
> scsi_add_host_with_dma(), but does not remove the corresponding
> 'put_device()' in the error path.

There isn't such 'put_device' in the error path of scsi_add_host_with_dma(),
is there?

> So the first patch introduces a reference imbalance which is fixed by the
> third patch. Hence my suggestion to merge those two patches.

No, that isn't true, please look at current code of
scsi_add_host_with_dma().


Thanks, 
Ming
John Garry June 1, 2021, 10:34 a.m. UTC | #3
On 31/05/2021 06:07, Ming Lei wrote:
> Hello Martin,

> 

> 

> Fix two memory leaks and one double free issue in alloc/add host

> code path.

> 

> 

> V3:

> 	- fix memory leak caused in scsi_host_alloc

> 	- comment typo suggested by John

> 

> V2:

> 	- add patch 2 for addressing shost leak in case of adding host

> 	  failure, reported by John Garry.

> 

> Ming Lei (3):

>    scsi: core: use put_device() to release host

>    scsi: core: fix failure handling of scsi_add_host_with_dma

>    scsi: core: put ->shost_gendev.parent in failure handling path

> 

>   drivers/scsi/hosts.c | 25 ++++++++++++-------------

>   1 file changed, 12 insertions(+), 13 deletions(-)

> 

> Cc: Bart Van Assche <bvanassche@acm.org>

> Cc: John Garry <john.garry@huawei.com>

> Cc: Hannes Reinecke <hare@suse.de>



I tested this again with the same experiment as before to error in
scsi_add_host_with_dma(), and we don't call scsi_host_dev_release() now. 
The shost_gendev dev refcount is 2 upon exit in scsi_add_host_with_dma().

We don't call scsi_host_cls_release() either, so I guess a ref count is 
leaked for shost_dev - I see its refcount is 1 at exit in 
scsi_add_host_with_dma(). We have the device_initialize(), device_add(), 
device_del() in the alloc and add host functions, but I don't know who 
is responsible for the final "device put".

Thanks,
John
Ming Lei June 1, 2021, 1:11 p.m. UTC | #4
On Tue, Jun 01, 2021 at 11:34:20AM +0100, John Garry wrote:
> On 31/05/2021 06:07, Ming Lei wrote:

> > Hello Martin,

> > 

> > 

> > Fix two memory leaks and one double free issue in alloc/add host

> > code path.

> > 

> > 

> > V3:

> > 	- fix memory leak caused in scsi_host_alloc

> > 	- comment typo suggested by John

> > 

> > V2:

> > 	- add patch 2 for addressing shost leak in case of adding host

> > 	  failure, reported by John Garry.

> > 

> > Ming Lei (3):

> >    scsi: core: use put_device() to release host

> >    scsi: core: fix failure handling of scsi_add_host_with_dma

> >    scsi: core: put ->shost_gendev.parent in failure handling path

> > 

> >   drivers/scsi/hosts.c | 25 ++++++++++++-------------

> >   1 file changed, 12 insertions(+), 13 deletions(-)

> > 

> > Cc: Bart Van Assche <bvanassche@acm.org>

> > Cc: John Garry <john.garry@huawei.com>

> > Cc: Hannes Reinecke <hare@suse.de>

> 

> 

> I tested this again with the same experiment as before to error in

> scsi_add_host_with_dma(), and we don't call scsi_host_dev_release() now. The

> shost_gendev dev refcount is 2 upon exit in scsi_add_host_with_dma().

> 

> We don't call scsi_host_cls_release() either, so I guess a ref count is

> leaked for shost_dev - I see its refcount is 1 at exit in

> scsi_add_host_with_dma(). We have the device_initialize(), device_add(),

> device_del() in the alloc and add host functions, but I don't know who is

> responsible for the final "device put".


Hammm, we still need to put ->shost_dev before returning the error, and the
following delta patch can fix the issue, and it should have been wrapped
into the 1st one.

diff --git a/drivers/scsi/hosts.c b/drivers/scsi/hosts.c
index 22a58e453a0c..532165462a42 100644
--- a/drivers/scsi/hosts.c
+++ b/drivers/scsi/hosts.c
@@ -306,6 +306,8 @@ int scsi_add_host_with_dma(struct Scsi_Host *shost, struct device *dev,
 	pm_runtime_set_suspended(&shost->shost_gendev);
 	pm_runtime_put_noidle(&shost->shost_gendev);
  fail:
+	/* drop ref of ->shost_dev so that caller can release this host */
+	put_device(&shost->shost_dev);
 	return error;
 }
 EXPORT_SYMBOL(scsi_add_host_with_dma);


Thanks,
Ming
John Garry June 1, 2021, 3:07 p.m. UTC | #5
On 01/06/2021 14:11, Ming Lei wrote:
>> We don't call scsi_host_cls_release() either, so I guess a ref count is

>> leaked for shost_dev - I see its refcount is 1 at exit in

>> scsi_add_host_with_dma(). We have the device_initialize(), device_add(),

>> device_del() in the alloc and add host functions, but I don't know who is

>> responsible for the final "device put".

> Hammm, we still need to put ->shost_dev before returning the error, and the

> following delta patch can fix the issue, and it should have been wrapped

> into the 1st one.

> 

> diff --git a/drivers/scsi/hosts.c b/drivers/scsi/hosts.c

> index 22a58e453a0c..532165462a42 100644

> --- a/drivers/scsi/hosts.c

> +++ b/drivers/scsi/hosts.c

> @@ -306,6 +306,8 @@ int scsi_add_host_with_dma(struct Scsi_Host *shost, struct device *dev,

>   	pm_runtime_set_suspended(&shost->shost_gendev);

>   	pm_runtime_put_noidle(&shost->shost_gendev);

>    fail:

> +	/* drop ref of ->shost_dev so that caller can release this host */

> +	put_device(&shost->shost_dev);

>   	return error;

>   }

>   EXPORT_SYMBOL(scsi_add_host_with_dma);


That looks better now.

And we can see the equivalent on the normal removal path in 
scsi_remove_host() -> device_unregister(&shost->shost_dev), which does a 
device_del()+put_device().

So could we actually just have:
out_del_dev:
	unregister_dev(&shost->shost_dev)

I am not sure if we are required to keep that shost_dev reference all 
the way until the exit, as you do.

Thanks,
John
Ming Lei June 2, 2021, 1:55 a.m. UTC | #6
On Tue, Jun 01, 2021 at 04:07:05PM +0100, John Garry wrote:
> On 01/06/2021 14:11, Ming Lei wrote:

> > > We don't call scsi_host_cls_release() either, so I guess a ref count is

> > > leaked for shost_dev - I see its refcount is 1 at exit in

> > > scsi_add_host_with_dma(). We have the device_initialize(), device_add(),

> > > device_del() in the alloc and add host functions, but I don't know who is

> > > responsible for the final "device put".

> > Hammm, we still need to put ->shost_dev before returning the error, and the

> > following delta patch can fix the issue, and it should have been wrapped

> > into the 1st one.

> > 

> > diff --git a/drivers/scsi/hosts.c b/drivers/scsi/hosts.c

> > index 22a58e453a0c..532165462a42 100644

> > --- a/drivers/scsi/hosts.c

> > +++ b/drivers/scsi/hosts.c

> > @@ -306,6 +306,8 @@ int scsi_add_host_with_dma(struct Scsi_Host *shost, struct device *dev,

> >   	pm_runtime_set_suspended(&shost->shost_gendev);

> >   	pm_runtime_put_noidle(&shost->shost_gendev);

> >    fail:

> > +	/* drop ref of ->shost_dev so that caller can release this host */

> > +	put_device(&shost->shost_dev);

> >   	return error;

> >   }

> >   EXPORT_SYMBOL(scsi_add_host_with_dma);

> 

> That looks better now.

> 

> And we can see the equivalent on the normal removal path in

> scsi_remove_host() -> device_unregister(&shost->shost_dev), which does a

> device_del()+put_device().

> 

> So could we actually just have:

> out_del_dev:

> 	unregister_dev(&shost->shost_dev)

> 


No, we still have to call put_device(&shost->shost_dev) only in case of
failure before adding &shost->shost_dev.

> I am not sure if we are required to keep that shost_dev reference all the

> way until the exit, as you do.


That has been done in this way, the problem is that both .shost_dev and
.shost_gendev share same lifetime and memory(same struct Scsi_Host instance),
this kind of pattern isn't one usual driver core use case, and we have to
handle it carefully.


Thanks,
Ming