Message ID | 20220707182122.3797-2-bvanassche@acm.org |
---|---|
State | New |
Headers | show |
Series | Call blk_mq_free_tag_set() earlier | expand |
On 7/7/22 1:21 PM, Bart Van Assche wrote: > From: Ming Lei <ming.lei@redhat.com> > > Fix the race conditions between SCSI LLD kernel module unloading and SCSI > device and target removal by making sure that SCSI hosts are destroyed after > all associated target and device objects have been freed. > > Cc: Christoph Hellwig <hch@lst.de> > Cc: Ming Lei <ming.lei@redhat.com> > Cc: Hannes Reinecke <hare@suse.de> > Cc: John Garry <john.garry@huawei.com> > Signed-off-by: Ming Lei <ming.lei@redhat.com> > Signed-off-by: Bart Van Assche <bvanassche@acm.org> > [ bvanassche: Reworked Ming's patch ] > --- > drivers/scsi/hosts.c | 9 +++++++++ > drivers/scsi/scsi.c | 9 ++++++--- > drivers/scsi/scsi_scan.c | 7 +++++++ > drivers/scsi/scsi_sysfs.c | 9 --------- > include/scsi/scsi_host.h | 3 +++ > 5 files changed, 25 insertions(+), 12 deletions(-) > > diff --git a/drivers/scsi/hosts.c b/drivers/scsi/hosts.c > index ef6c0e37acce..e0a56a8f1f74 100644 > --- a/drivers/scsi/hosts.c > +++ b/drivers/scsi/hosts.c > @@ -168,6 +168,7 @@ void scsi_remove_host(struct Scsi_Host *shost) > > mutex_lock(&shost->scan_mutex); > spin_lock_irqsave(shost->host_lock, flags); > + /* Prevent that new SCSI targets or SCSI devices are added. */ > if (scsi_host_set_state(shost, SHOST_CANCEL)) > if (scsi_host_set_state(shost, SHOST_CANCEL_RECOVERY)) { > spin_unlock_irqrestore(shost->host_lock, flags); > @@ -190,6 +191,13 @@ void scsi_remove_host(struct Scsi_Host *shost) > transport_unregister_device(&shost->shost_gendev); > device_unregister(&shost->shost_dev); > device_del(&shost->shost_gendev); > + > + /* > + * After scsi_remove_host() has returned the scsi LLD module can be > + * unloaded and/or the host resources can be released. Hence wait until > + * the dependent SCSI targets and devices are gone before returning. > + */ > + wait_event(shost->targets_wq, atomic_read(&shost->target_count) == 0); Do we still have a possible use after free at the target removal level? If you have a driver supports multiple targets and target removal (any of he FC ones, HW iscsi, etc), then you can still hit: 1. thread1 does sysfs device delete. It's now waiting in blk_cleanup_queue which is waiting on a cmd that has the SCSI error handler running on it or for whatever reason. 2. thread2 decides to delete the target (dev loss tmo or user request). That hits __scsi_remove_device for the device in #1 and sees it's in SDEV_DEL state so it returns. 3. scsi_remove_target returns. The transport/driver then frees it's rport/session for that target. 4. The thread1 in then makes progress in the EH callback and wants to reference the rport/session struct we deleted in #3. The drivers want to know that after scsi_remove_target has returned that nothing is using devices under it similar to the scsi_remove_host case. Every scsi_device has a scsi_target as a parent (scsi_device -> scsi_target) or grandparent (scsi_device -> transport class struct like rport/session -> scsi_target) now right. I was thinking maybe like 20 years ago when scsi_forget_host was made we didn't? If so, could we move what are doing in this patch down a level? Put the wait in scsi_remove_target and wake in scsi_target_dev_release. Instead of a target_count you have a scsi_target sdev_count. scsi_forget_host would then need to loop over the targets and do scsi_target_remove on them instead of doing it at the scsi_device level.
On 7/9/22 08:57, Mike Christie wrote: > If so, could we move what are doing in this patch down a level? Put the wait in > scsi_remove_target and wake in scsi_target_dev_release. Instead of a target_count > you have a scsi_target sdev_count. > > scsi_forget_host would then need to loop over the targets and do > scsi_target_remove on them instead of doing it at the scsi_device level. Hi Mike, Thanks for having taken a look. Could the approach outlined above have user-visible side effects? A different approach has been selected for v4 of this patch series. Further feedback is welcome. Thanks, Bart.
On 7/12/22 5:29 PM, Bart Van Assche wrote: > On 7/9/22 08:57, Mike Christie wrote: >> If so, could we move what are doing in this patch down a level? Put the wait in >> scsi_remove_target and wake in scsi_target_dev_release. Instead of a target_count >> you have a scsi_target sdev_count. >> >> scsi_forget_host would then need to loop over the targets and do >> scsi_target_remove on them instead of doing it at the scsi_device level. > > Hi Mike, > > Thanks for having taken a look. > > Could the approach outlined above have user-visible side effects? > What kind of scenario are you thinking about? I think we would have similar behavior today for the target behavior: 1. With the current code, if there is no sysfs deletion going on and scsi_remove_target's call to __scsi_remove_device is the one that blocks on blk_cleanup_queue then the target removal would wait. 2. With the current code we have: 1. syfs deletion runs scsi_remove_target -> scsi_remove_device and that takes the scan_mutex. 2. scsi_remove_target passed the SDEV_DEL check and is calling scsi_remove_device which is waiting on the scan_mutex. So here scsi_remove_target waits for the deletion similar to what I described above. My suggestion just handles the case where 1. syfs deletion runs scsi_remove_target -> scsi_remove_device and that takes the scan_mutex. It then sets the state to SDEV_DEL. 2. scsi_remove_target runs and see the device is in the SDEV_DEL state. It skips the device and then we return from scsi_remove_target without having waited on that device's removal. > A different approach has been selected for v4 of this patch series. Further feedback > is welcome. > > Thanks, > > Bart.
On 7/14/22 10:32 AM, Mike Christie wrote: > On 7/12/22 5:29 PM, Bart Van Assche wrote: >> On 7/9/22 08:57, Mike Christie wrote: >>> If so, could we move what are doing in this patch down a level? Put the wait in >>> scsi_remove_target and wake in scsi_target_dev_release. Instead of a target_count >>> you have a scsi_target sdev_count. >>> >>> scsi_forget_host would then need to loop over the targets and do >>> scsi_target_remove on them instead of doing it at the scsi_device level. >> >> Hi Mike, >> >> Thanks for having taken a look. >> >> Could the approach outlined above have user-visible side effects? >> > > What kind of scenario are you thinking about? > > I think we would have similar behavior today for the target behavior: > > 1. With the current code, if there is no sysfs deletion going on and > scsi_remove_target's call to __scsi_remove_device is the one that blocks > on blk_cleanup_queue then the target removal would wait. > > > 2. With the current code we have: > > 1. syfs deletion runs scsi_remove_target -> scsi_remove_device and > that takes the scan_mutex. Meant sdev_store_delete -> scsi_remove_device > 2. scsi_remove_target passed the SDEV_DEL check and is calling Here I did mean scsi_remove_target. It's being called from something like the fc or iscsi layer's transport threads. > scsi_remove_device which is waiting on the scan_mutex. > > So here scsi_remove_target waits for the deletion similar to what I described > above. > > My suggestion just handles the case where > > 1. syfs deletion runs scsi_remove_target -> scsi_remove_device and > that takes the scan_mutex. Meant sdev_store_delete -> scsi_remove_device > > It then sets the state to SDEV_DEL. > > 2. scsi_remove_target runs and see the device is in the SDEV_DEL state. Here I did mean scsi_remove_target. It's being called from something like the fc or iscsi layer's transport threads. > It skips the device and then we return from scsi_remove_target without > having waited on that device's removal. > > > > >> A different approach has been selected for v4 of this patch series. Further feedback >> is welcome. >> >> Thanks, >> >> Bart. >
On 7/9/22 08:57, Mike Christie wrote: > If so, could we move what are doing in this patch down a level? Put the wait in > scsi_remove_target and wake in scsi_target_dev_release. Instead of a target_count > you have a scsi_target sdev_count. > > scsi_forget_host would then need to loop over the targets and do > scsi_target_remove on them instead of doing it at the scsi_device level. I'm not sure how to implement the above since scsi_remove_target() skips targets that have one of the states STARGET_DEL, STARGET_REMOVE or STARGET_CREATED_REMOVE? Thanks, Bart.
diff --git a/drivers/scsi/hosts.c b/drivers/scsi/hosts.c index ef6c0e37acce..e0a56a8f1f74 100644 --- a/drivers/scsi/hosts.c +++ b/drivers/scsi/hosts.c @@ -168,6 +168,7 @@ void scsi_remove_host(struct Scsi_Host *shost) mutex_lock(&shost->scan_mutex); spin_lock_irqsave(shost->host_lock, flags); + /* Prevent that new SCSI targets or SCSI devices are added. */ if (scsi_host_set_state(shost, SHOST_CANCEL)) if (scsi_host_set_state(shost, SHOST_CANCEL_RECOVERY)) { spin_unlock_irqrestore(shost->host_lock, flags); @@ -190,6 +191,13 @@ void scsi_remove_host(struct Scsi_Host *shost) transport_unregister_device(&shost->shost_gendev); device_unregister(&shost->shost_dev); device_del(&shost->shost_gendev); + + /* + * After scsi_remove_host() has returned the scsi LLD module can be + * unloaded and/or the host resources can be released. Hence wait until + * the dependent SCSI targets and devices are gone before returning. + */ + wait_event(shost->targets_wq, atomic_read(&shost->target_count) == 0); } EXPORT_SYMBOL(scsi_remove_host); @@ -394,6 +402,7 @@ struct Scsi_Host *scsi_host_alloc(struct scsi_host_template *sht, int privsize) INIT_LIST_HEAD(&shost->starved_list); init_waitqueue_head(&shost->host_wait); mutex_init(&shost->scan_mutex); + init_waitqueue_head(&shost->targets_wq); index = ida_alloc(&host_index_ida, GFP_KERNEL); if (index < 0) { diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c index c59eac7a32f2..086ec5b5862d 100644 --- a/drivers/scsi/scsi.c +++ b/drivers/scsi/scsi.c @@ -586,10 +586,13 @@ EXPORT_SYMBOL(scsi_device_get); */ void scsi_device_put(struct scsi_device *sdev) { - struct module *mod = sdev->host->hostt->module; - + /* + * Decreasing the module reference count before the device reference + * count is safe since scsi_remove_host() only returns after all + * devices have been removed. + */ + module_put(sdev->host->hostt->module); put_device(&sdev->sdev_gendev); - module_put(mod); } EXPORT_SYMBOL(scsi_device_put); diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c index 91ac901a6682..00c5754f0081 100644 --- a/drivers/scsi/scsi_scan.c +++ b/drivers/scsi/scsi_scan.c @@ -406,9 +406,14 @@ static void scsi_target_destroy(struct scsi_target *starget) static void scsi_target_dev_release(struct device *dev) { struct device *parent = dev->parent; + struct Scsi_Host *shost = dev_to_shost(parent); struct scsi_target *starget = to_scsi_target(dev); kfree(starget); + + if (atomic_dec_return(&shost->target_count) == 0) + wake_up(&shost->targets_wq); + put_device(parent); } @@ -521,6 +526,8 @@ static struct scsi_target *scsi_alloc_target(struct device *parent, starget->state = STARGET_CREATED; starget->scsi_level = SCSI_2; starget->max_target_blocked = SCSI_DEFAULT_TARGET_BLOCKED; + atomic_inc(&shost->target_count); + retry: spin_lock_irqsave(shost->host_lock, flags); diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c index 43949798a2e4..72bf5131e9d8 100644 --- a/drivers/scsi/scsi_sysfs.c +++ b/drivers/scsi/scsi_sysfs.c @@ -450,12 +450,9 @@ static void scsi_device_dev_release_usercontext(struct work_struct *work) struct scsi_vpd *vpd_pg0 = NULL, *vpd_pg89 = NULL; struct scsi_vpd *vpd_pgb0 = NULL, *vpd_pgb1 = NULL, *vpd_pgb2 = NULL; unsigned long flags; - struct module *mod; sdev = container_of(work, struct scsi_device, ew.work); - mod = sdev->host->hostt->module; - scsi_dh_release_device(sdev); parent = sdev->sdev_gendev.parent; @@ -518,17 +515,11 @@ static void scsi_device_dev_release_usercontext(struct work_struct *work) if (parent) put_device(parent); - module_put(mod); } static void scsi_device_dev_release(struct device *dev) { struct scsi_device *sdp = to_scsi_device(dev); - - /* Set module pointer as NULL in case of module unloading */ - if (!try_module_get(sdp->host->hostt->module)) - sdp->host->hostt->module = NULL; - execute_in_process_context(scsi_device_dev_release_usercontext, &sdp->ew); } diff --git a/include/scsi/scsi_host.h b/include/scsi/scsi_host.h index 667d889b92b5..339f975d356e 100644 --- a/include/scsi/scsi_host.h +++ b/include/scsi/scsi_host.h @@ -689,6 +689,9 @@ struct Scsi_Host { /* ldm bits */ struct device shost_gendev, shost_dev; + atomic_t target_count; + wait_queue_head_t targets_wq; + /* * Points to the transport data (if any) which is allocated * separately