Message ID | 20210531050727.2353973-1-ming.lei@redhat.com |
---|---|
Headers | show |
Series | scsi: two fixes in scsi_add_host_with_dma | expand |
On 5/31/21 9:56 AM, Ming Lei wrote: > On Mon, May 31, 2021 at 08:28:49AM +0200, Hannes Reinecke wrote: >> On 5/31/21 7:07 AM, Ming Lei wrote: >>> get_device(shost->shost_gendev.parent) is called in >>> scsi_add_host_with_dma(), but its counter pair isn't called in the failure >>> path, so fix it by calling put_device(shost->shost_gendev.parent) in its >>> failure path. >>> >>> Reported-by: John Garry <john.garry@huawei.com> >>> Cc: Bart Van Assche <bvanassche@acm.org> >>> Cc: Hannes Reinecke <hare@suse.de> >>> Signed-off-by: Ming Lei <ming.lei@redhat.com> >>> --- >>> drivers/scsi/hosts.c | 1 + >>> 1 file changed, 1 insertion(+) >>> >>> diff --git a/drivers/scsi/hosts.c b/drivers/scsi/hosts.c >>> index 6cbc3eb16525..6cc43c51b7b3 100644 >>> --- a/drivers/scsi/hosts.c >>> +++ b/drivers/scsi/hosts.c >>> @@ -298,6 +298,7 @@ int scsi_add_host_with_dma(struct Scsi_Host *shost, struct device *dev, >>> out_del_dev: >>> device_del(&shost->shost_dev); >>> out_del_gendev: >>> + put_device(shost->shost_gendev.parent); >>> device_del(&shost->shost_gendev); >>> out_disable_runtime_pm: >>> device_disable_async_suspend(&shost->shost_gendev); >>> >> This really needs to be folded into the first patch as it's really a bugfix >> for that. > > All three are bug fixes, and this one may leak .shost_gendev's parent, > but the 1st one leaks .shost_gendev->kobj.name, so we needn't to fold > the 3rd into the 1st one. > I beg to disagree. The first patch removes the 'get_device()' from scsi_add_host_with_dma(), but does not remove the corresponding 'put_device()' in the error path. So the first patch introduces a reference imbalance which is fixed by the third patch. Hence my suggestion to merge those two patches. Cheers, Hannes
On Mon, May 31, 2021 at 10:23:43AM +0200, Hannes Reinecke wrote: > On 5/31/21 9:56 AM, Ming Lei wrote: > > On Mon, May 31, 2021 at 08:28:49AM +0200, Hannes Reinecke wrote: > > > On 5/31/21 7:07 AM, Ming Lei wrote: > > > > get_device(shost->shost_gendev.parent) is called in > > > > scsi_add_host_with_dma(), but its counter pair isn't called in the failure > > > > path, so fix it by calling put_device(shost->shost_gendev.parent) in its > > > > failure path. > > > > > > > > Reported-by: John Garry <john.garry@huawei.com> > > > > Cc: Bart Van Assche <bvanassche@acm.org> > > > > Cc: Hannes Reinecke <hare@suse.de> > > > > Signed-off-by: Ming Lei <ming.lei@redhat.com> > > > > --- > > > > drivers/scsi/hosts.c | 1 + > > > > 1 file changed, 1 insertion(+) > > > > > > > > diff --git a/drivers/scsi/hosts.c b/drivers/scsi/hosts.c > > > > index 6cbc3eb16525..6cc43c51b7b3 100644 > > > > --- a/drivers/scsi/hosts.c > > > > +++ b/drivers/scsi/hosts.c > > > > @@ -298,6 +298,7 @@ int scsi_add_host_with_dma(struct Scsi_Host *shost, struct device *dev, > > > > out_del_dev: > > > > device_del(&shost->shost_dev); > > > > out_del_gendev: > > > > + put_device(shost->shost_gendev.parent); > > > > device_del(&shost->shost_gendev); > > > > out_disable_runtime_pm: > > > > device_disable_async_suspend(&shost->shost_gendev); > > > > > > > This really needs to be folded into the first patch as it's really a bugfix > > > for that. > > > > All three are bug fixes, and this one may leak .shost_gendev's parent, > > but the 1st one leaks .shost_gendev->kobj.name, so we needn't to fold > > the 3rd into the 1st one. > > > I beg to disagree. > The first patch removes the 'get_device()' from > scsi_add_host_with_dma(), but does not remove the corresponding > 'put_device()' in the error path. There isn't such 'put_device' in the error path of scsi_add_host_with_dma(), is there? > So the first patch introduces a reference imbalance which is fixed by the > third patch. Hence my suggestion to merge those two patches. No, that isn't true, please look at current code of scsi_add_host_with_dma(). Thanks, Ming
On 31/05/2021 06:07, Ming Lei wrote: > Hello Martin, > > > Fix two memory leaks and one double free issue in alloc/add host > code path. > > > V3: > - fix memory leak caused in scsi_host_alloc > - comment typo suggested by John > > V2: > - add patch 2 for addressing shost leak in case of adding host > failure, reported by John Garry. > > Ming Lei (3): > scsi: core: use put_device() to release host > scsi: core: fix failure handling of scsi_add_host_with_dma > scsi: core: put ->shost_gendev.parent in failure handling path > > drivers/scsi/hosts.c | 25 ++++++++++++------------- > 1 file changed, 12 insertions(+), 13 deletions(-) > > Cc: Bart Van Assche <bvanassche@acm.org> > Cc: John Garry <john.garry@huawei.com> > Cc: Hannes Reinecke <hare@suse.de> I tested this again with the same experiment as before to error in scsi_add_host_with_dma(), and we don't call scsi_host_dev_release() now. The shost_gendev dev refcount is 2 upon exit in scsi_add_host_with_dma(). We don't call scsi_host_cls_release() either, so I guess a ref count is leaked for shost_dev - I see its refcount is 1 at exit in scsi_add_host_with_dma(). We have the device_initialize(), device_add(), device_del() in the alloc and add host functions, but I don't know who is responsible for the final "device put". Thanks, John
On Tue, Jun 01, 2021 at 11:34:20AM +0100, John Garry wrote: > On 31/05/2021 06:07, Ming Lei wrote: > > Hello Martin, > > > > > > Fix two memory leaks and one double free issue in alloc/add host > > code path. > > > > > > V3: > > - fix memory leak caused in scsi_host_alloc > > - comment typo suggested by John > > > > V2: > > - add patch 2 for addressing shost leak in case of adding host > > failure, reported by John Garry. > > > > Ming Lei (3): > > scsi: core: use put_device() to release host > > scsi: core: fix failure handling of scsi_add_host_with_dma > > scsi: core: put ->shost_gendev.parent in failure handling path > > > > drivers/scsi/hosts.c | 25 ++++++++++++------------- > > 1 file changed, 12 insertions(+), 13 deletions(-) > > > > Cc: Bart Van Assche <bvanassche@acm.org> > > Cc: John Garry <john.garry@huawei.com> > > Cc: Hannes Reinecke <hare@suse.de> > > > I tested this again with the same experiment as before to error in > scsi_add_host_with_dma(), and we don't call scsi_host_dev_release() now. The > shost_gendev dev refcount is 2 upon exit in scsi_add_host_with_dma(). > > We don't call scsi_host_cls_release() either, so I guess a ref count is > leaked for shost_dev - I see its refcount is 1 at exit in > scsi_add_host_with_dma(). We have the device_initialize(), device_add(), > device_del() in the alloc and add host functions, but I don't know who is > responsible for the final "device put". Hammm, we still need to put ->shost_dev before returning the error, and the following delta patch can fix the issue, and it should have been wrapped into the 1st one. diff --git a/drivers/scsi/hosts.c b/drivers/scsi/hosts.c index 22a58e453a0c..532165462a42 100644 --- a/drivers/scsi/hosts.c +++ b/drivers/scsi/hosts.c @@ -306,6 +306,8 @@ int scsi_add_host_with_dma(struct Scsi_Host *shost, struct device *dev, pm_runtime_set_suspended(&shost->shost_gendev); pm_runtime_put_noidle(&shost->shost_gendev); fail: + /* drop ref of ->shost_dev so that caller can release this host */ + put_device(&shost->shost_dev); return error; } EXPORT_SYMBOL(scsi_add_host_with_dma); Thanks, Ming
On 01/06/2021 14:11, Ming Lei wrote: >> We don't call scsi_host_cls_release() either, so I guess a ref count is >> leaked for shost_dev - I see its refcount is 1 at exit in >> scsi_add_host_with_dma(). We have the device_initialize(), device_add(), >> device_del() in the alloc and add host functions, but I don't know who is >> responsible for the final "device put". > Hammm, we still need to put ->shost_dev before returning the error, and the > following delta patch can fix the issue, and it should have been wrapped > into the 1st one. > > diff --git a/drivers/scsi/hosts.c b/drivers/scsi/hosts.c > index 22a58e453a0c..532165462a42 100644 > --- a/drivers/scsi/hosts.c > +++ b/drivers/scsi/hosts.c > @@ -306,6 +306,8 @@ int scsi_add_host_with_dma(struct Scsi_Host *shost, struct device *dev, > pm_runtime_set_suspended(&shost->shost_gendev); > pm_runtime_put_noidle(&shost->shost_gendev); > fail: > + /* drop ref of ->shost_dev so that caller can release this host */ > + put_device(&shost->shost_dev); > return error; > } > EXPORT_SYMBOL(scsi_add_host_with_dma); That looks better now. And we can see the equivalent on the normal removal path in scsi_remove_host() -> device_unregister(&shost->shost_dev), which does a device_del()+put_device(). So could we actually just have: out_del_dev: unregister_dev(&shost->shost_dev) I am not sure if we are required to keep that shost_dev reference all the way until the exit, as you do. Thanks, John
On Tue, Jun 01, 2021 at 04:07:05PM +0100, John Garry wrote: > On 01/06/2021 14:11, Ming Lei wrote: > > > We don't call scsi_host_cls_release() either, so I guess a ref count is > > > leaked for shost_dev - I see its refcount is 1 at exit in > > > scsi_add_host_with_dma(). We have the device_initialize(), device_add(), > > > device_del() in the alloc and add host functions, but I don't know who is > > > responsible for the final "device put". > > Hammm, we still need to put ->shost_dev before returning the error, and the > > following delta patch can fix the issue, and it should have been wrapped > > into the 1st one. > > > > diff --git a/drivers/scsi/hosts.c b/drivers/scsi/hosts.c > > index 22a58e453a0c..532165462a42 100644 > > --- a/drivers/scsi/hosts.c > > +++ b/drivers/scsi/hosts.c > > @@ -306,6 +306,8 @@ int scsi_add_host_with_dma(struct Scsi_Host *shost, struct device *dev, > > pm_runtime_set_suspended(&shost->shost_gendev); > > pm_runtime_put_noidle(&shost->shost_gendev); > > fail: > > + /* drop ref of ->shost_dev so that caller can release this host */ > > + put_device(&shost->shost_dev); > > return error; > > } > > EXPORT_SYMBOL(scsi_add_host_with_dma); > > That looks better now. > > And we can see the equivalent on the normal removal path in > scsi_remove_host() -> device_unregister(&shost->shost_dev), which does a > device_del()+put_device(). > > So could we actually just have: > out_del_dev: > unregister_dev(&shost->shost_dev) > No, we still have to call put_device(&shost->shost_dev) only in case of failure before adding &shost->shost_dev. > I am not sure if we are required to keep that shost_dev reference all the > way until the exit, as you do. That has been done in this way, the problem is that both .shost_dev and .shost_gendev share same lifetime and memory(same struct Scsi_Host instance), this kind of pattern isn't one usual driver core use case, and we have to handle it carefully. Thanks, Ming