Message ID | 20220630213733.17689-3-bvanassche@acm.org |
---|---|
State | New |
Headers | show |
Series | Call blk_mq_free_tag_set() earlier | expand |
On 6/30/22 4:37 PM, Bart Van Assche wrote: > struct scsi_device *sdev; > @@ -1970,8 +1980,21 @@ void scsi_forget_host(struct Scsi_Host *shost) > restart: > spin_lock_irq(shost->host_lock); > list_for_each_entry(sdev, &shost->__devices, siblings) { > - if (sdev->sdev_state == SDEV_DEL) > + if (sdev->sdev_state == SDEV_DEL && > + blk_queue_dead(sdev->request_queue)) { > continue; > + } > + if (sdev->sdev_state == SDEV_DEL) { > + get_device(&sdev->sdev_gendev); > + spin_unlock_irq(shost->host_lock); > + > + while (!blk_queue_dead(sdev->request_queue)) > + msleep(10); > + > + spin_lock_irq(shost->host_lock); > + put_device(&sdev->sdev_gendev); > + goto restart; > + } > spin_unlock_irq(shost->host_lock); > __scsi_remove_device(sdev); > goto restart; Are there 2 ways to hit your issue? 1. Normal case. srp_remove_one frees srp_device. Then all refs to host are dropped and we call srp_exit_cmd_priv which accesses the freed srp_device? You don't need the above patch for this case right. 2. Are you hitting a race? Something did a scsi_remove_device. It set the device state to SDEV_DEL. It's stuck in blk_cleanup_queue blk_freeze_queue. Now we do the above code. Without your patch we just move on by that device. Commands could still be accessed. With your patch we wait for for that other thread to complete the device destruction. Could you also solve both issues by adding a scsi_host_template scsi_host release function that's called from scsi_host_dev_release. A new srp_host_release would free structs like the srp device from there. Or would we still have an issue for #2 where we just don't know how far blk_freeze_queue has got so commands could still be hitting our queue_rq function when we are doing scsi_host_dev_release? I like how your patch makes it so we know after scsi_remove_host has returned that the device is really gone. Could we have a similar race as in #2 with someone doing a scsi_remove_device and transport doing scsi_remove_target? We would not hit the same use after free from the tag set exit_cmd_priv being called. But, for example, if we wanted to free some transport level resources that running commands reference after scsi_target_remove could we hit a use after free? If so maybe we want to move this wait/check to __scsi_remove_device?
On 7/1/22 09:25, Mike Christie wrote: > On 6/30/22 4:37 PM, Bart Van Assche wrote: >> struct scsi_device *sdev; >> @@ -1970,8 +1980,21 @@ void scsi_forget_host(struct Scsi_Host *shost) >> restart: >> spin_lock_irq(shost->host_lock); >> list_for_each_entry(sdev, &shost->__devices, siblings) { >> - if (sdev->sdev_state == SDEV_DEL) >> + if (sdev->sdev_state == SDEV_DEL && >> + blk_queue_dead(sdev->request_queue)) { >> continue; >> + } >> + if (sdev->sdev_state == SDEV_DEL) { >> + get_device(&sdev->sdev_gendev); >> + spin_unlock_irq(shost->host_lock); >> + >> + while (!blk_queue_dead(sdev->request_queue)) >> + msleep(10); >> + >> + spin_lock_irq(shost->host_lock); >> + put_device(&sdev->sdev_gendev); >> + goto restart; >> + } >> spin_unlock_irq(shost->host_lock); >> __scsi_remove_device(sdev); >> goto restart; > > Are there 2 ways to hit your issue? > > 1. Normal case. srp_remove_one frees srp_device. Then all refs > to host are dropped and we call srp_exit_cmd_priv which accesses > the freed srp_device? > > You don't need the above patch for this case right. > > 2. Are you hitting a race? Something did a scsi_remove_device. It > set the device state to SDEV_DEL. It's stuck in blk_cleanup_queue > blk_freeze_queue. Now we do the above code. Without your patch > we just move on by that device. Commands could still be accessed. > With your patch we wait for for that other thread to complete the > device destruction. > > > Could you also solve both issues by adding a scsi_host_template > scsi_host release function that's called from scsi_host_dev_release. A > new srp_host_release would free structs like the srp device from there. > Or would we still have an issue for #2 where we just don't know how > far blk_freeze_queue has got so commands could still be hitting our > queue_rq function when we are doing scsi_host_dev_release? > > I like how your patch makes it so we know after scsi_remove_host > has returned that the device is really gone. Could we have a similar > race as in #2 with someone doing a scsi_remove_device and transport > doing scsi_remove_target? > > We would not hit the same use after free from the tag set exit_cmd_priv > being called. But, for example, if we wanted to free some transport level > resources that running commands reference after scsi_target_remove could > we hit a use after free? If so maybe we want to move this wait/check to > __scsi_remove_device? Hi Mike, Although I'm not sure that I completely understood the above: my goal with this patch is to make sure that all activity on the host tag set has stopped by the time scsi_forget_host() returns. I do not only want to cover the case where scsi_remove_host() is called by the ib_srp driver but also the case where a SCSI device that was created by the ib_srp driver is removed before scsi_remove_host() is called. Users can trigger this scenario by writing into /sys/class/scsi_device/*/*/delete. Adding a new callback into scsi_host_dev_release() would not help because the scsi_host_dev_release() call may happen long after scsi_forget_host() returned. Does this answer your question? Bart.
On Thu, Jun 30, 2022 at 02:37:32PM -0700, Bart Van Assche wrote: > Prepare for freeing the host tag set earlier by making scsi_forget_host() > wait until all activity on the host tag set has stopped. I think it is correct to free the host tag during removing host. > > Cc: Christoph Hellwig <hch@lst.de> > Cc: Ming Lei <ming.lei@redhat.com> > Cc: Hannes Reinecke <hare@suse.de> > Cc: John Garry <john.garry@huawei.com> > Signed-off-by: Bart Van Assche <bvanassche@acm.org> > --- > drivers/scsi/scsi_scan.c | 25 ++++++++++++++++++++++++- > 1 file changed, 24 insertions(+), 1 deletion(-) > > diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c > index 5c3bb4ceeac3..c8331ccdde95 100644 > --- a/drivers/scsi/scsi_scan.c > +++ b/drivers/scsi/scsi_scan.c > @@ -1961,6 +1961,16 @@ void scsi_scan_host(struct Scsi_Host *shost) > } > EXPORT_SYMBOL(scsi_scan_host); > > +/** > + * scsi_forget_host() - Remove all SCSI devices from a host. > + * @shost: SCSI host to remove devices from. > + * > + * Removes all SCSI devices that have not yet been removed. For the SCSI devices > + * for which removal started before scsi_forget_host(), wait until the > + * associated request queue has reached the "dead" state. In that state it is > + * guaranteed that no new requests will be allocated and also that no requests > + * are in progress anymore. > + */ > void scsi_forget_host(struct Scsi_Host *shost) > { > struct scsi_device *sdev; > @@ -1970,8 +1980,21 @@ void scsi_forget_host(struct Scsi_Host *shost) > restart: > spin_lock_irq(shost->host_lock); > list_for_each_entry(sdev, &shost->__devices, siblings) { > - if (sdev->sdev_state == SDEV_DEL) > + if (sdev->sdev_state == SDEV_DEL && > + blk_queue_dead(sdev->request_queue)) { > continue; > + } > + if (sdev->sdev_state == SDEV_DEL) { > + get_device(&sdev->sdev_gendev); > + spin_unlock_irq(shost->host_lock); > + > + while (!blk_queue_dead(sdev->request_queue)) > + msleep(10); Thinking of further, this report on UAF on SRP resource and Changhui's report on UAF of host->hostt should belong to same kind of issue: 1) after scsi_remove_host() returns, either the HBA driver module can be unloaded and hostt can't be used, or any HBA resources can be freed, both may cause UAF in scsi_mq_exit_request, so moving freeing host tagset to scsi_remove_host() is correct. static void scsi_mq_exit_request() { ... if (shost->hostt->exit_cmd_priv) shost->hostt->exit_cmd_priv(shost, cmd); } 2) checking request queue dead here isn't good, which not only relies on block layer implementation detail, but also can't avoid UAF on ->hostt, I'd suggest to fix them all. The attached patch may drain all sdevs, which can replace the 2nd patch if you don't mind, but the 3rd patch is still needed: diff --git a/drivers/scsi/hosts.c b/drivers/scsi/hosts.c index e1cb187402fd..6445718c2b18 100644 --- a/drivers/scsi/hosts.c +++ b/drivers/scsi/hosts.c @@ -189,6 +189,14 @@ void scsi_remove_host(struct Scsi_Host *shost) transport_unregister_device(&shost->shost_gendev); device_unregister(&shost->shost_dev); device_del(&shost->shost_gendev); + + /* + * Once returning, the scsi LLD module can be unloaded, so we have + * to wait until our descendants are released, otherwise our host + * device reference can be grabbed by them, then use-after-free + * on shost->hostt will be triggered + */ + wait_for_completion(&shost->targets_completion); } EXPORT_SYMBOL(scsi_remove_host); @@ -393,6 +401,7 @@ struct Scsi_Host *scsi_host_alloc(struct scsi_host_template *sht, int privsize) INIT_LIST_HEAD(&shost->starved_list); init_waitqueue_head(&shost->host_wait); mutex_init(&shost->scan_mutex); + init_completion(&shost->targets_completion); index = ida_simple_get(&host_index_ida, 0, 0, GFP_KERNEL); if (index < 0) { diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c index 10e5bffc34aa..b6e6354da396 100644 --- a/drivers/scsi/scsi.c +++ b/drivers/scsi/scsi.c @@ -545,10 +545,8 @@ EXPORT_SYMBOL(scsi_device_get); */ void scsi_device_put(struct scsi_device *sdev) { - struct module *mod = sdev->host->hostt->module; - + module_put(sdev->host->hostt->module); put_device(&sdev->sdev_gendev); - module_put(mod); } EXPORT_SYMBOL(scsi_device_put); diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c index 2ef78083f1ef..931333a48372 100644 --- a/drivers/scsi/scsi_scan.c +++ b/drivers/scsi/scsi_scan.c @@ -406,9 +406,14 @@ static void scsi_target_destroy(struct scsi_target *starget) static void scsi_target_dev_release(struct device *dev) { struct device *parent = dev->parent; + struct Scsi_Host *shost = dev_to_shost(parent); struct scsi_target *starget = to_scsi_target(dev); kfree(starget); + + if (!atomic_dec_return(&shost->nr_targets)) + complete_all(&shost->targets_completion); + put_device(parent); } @@ -521,6 +526,7 @@ static struct scsi_target *scsi_alloc_target(struct device *parent, starget->state = STARGET_CREATED; starget->scsi_level = SCSI_2; starget->max_target_blocked = SCSI_DEFAULT_TARGET_BLOCKED; + atomic_inc(&shost->nr_targets); retry: spin_lock_irqsave(shost->host_lock, flags); diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c index 94091d5281ba..28c51ef0ea54 100644 --- a/drivers/scsi/scsi_sysfs.c +++ b/drivers/scsi/scsi_sysfs.c @@ -449,12 +449,9 @@ static void scsi_device_dev_release_usercontext(struct work_struct *work) struct scsi_vpd *vpd_pg80 = NULL, *vpd_pg83 = NULL; struct scsi_vpd *vpd_pg0 = NULL, *vpd_pg89 = NULL; unsigned long flags; - struct module *mod; sdev = container_of(work, struct scsi_device, ew.work); - mod = sdev->host->hostt->module; - scsi_dh_release_device(sdev); parent = sdev->sdev_gendev.parent; @@ -505,17 +502,11 @@ static void scsi_device_dev_release_usercontext(struct work_struct *work) if (parent) put_device(parent); - module_put(mod); } static void scsi_device_dev_release(struct device *dev) { struct scsi_device *sdp = to_scsi_device(dev); - - /* Set module pointer as NULL in case of module unloading */ - if (!try_module_get(sdp->host->hostt->module)) - sdp->host->hostt->module = NULL; - execute_in_process_context(scsi_device_dev_release_usercontext, &sdp->ew); } diff --git a/include/scsi/scsi_host.h b/include/scsi/scsi_host.h index 37718373defc..d0baffd62287 100644 --- a/include/scsi/scsi_host.h +++ b/include/scsi/scsi_host.h @@ -710,6 +710,9 @@ struct Scsi_Host { /* ldm bits */ struct device shost_gendev, shost_dev; + atomic_t nr_targets; + struct completion targets_completion; + /* * Points to the transport data (if any) which is allocated * separately Thanks, Ming
diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c index 5c3bb4ceeac3..c8331ccdde95 100644 --- a/drivers/scsi/scsi_scan.c +++ b/drivers/scsi/scsi_scan.c @@ -1961,6 +1961,16 @@ void scsi_scan_host(struct Scsi_Host *shost) } EXPORT_SYMBOL(scsi_scan_host); +/** + * scsi_forget_host() - Remove all SCSI devices from a host. + * @shost: SCSI host to remove devices from. + * + * Removes all SCSI devices that have not yet been removed. For the SCSI devices + * for which removal started before scsi_forget_host(), wait until the + * associated request queue has reached the "dead" state. In that state it is + * guaranteed that no new requests will be allocated and also that no requests + * are in progress anymore. + */ void scsi_forget_host(struct Scsi_Host *shost) { struct scsi_device *sdev; @@ -1970,8 +1980,21 @@ void scsi_forget_host(struct Scsi_Host *shost) restart: spin_lock_irq(shost->host_lock); list_for_each_entry(sdev, &shost->__devices, siblings) { - if (sdev->sdev_state == SDEV_DEL) + if (sdev->sdev_state == SDEV_DEL && + blk_queue_dead(sdev->request_queue)) { continue; + } + if (sdev->sdev_state == SDEV_DEL) { + get_device(&sdev->sdev_gendev); + spin_unlock_irq(shost->host_lock); + + while (!blk_queue_dead(sdev->request_queue)) + msleep(10); + + spin_lock_irq(shost->host_lock); + put_device(&sdev->sdev_gendev); + goto restart; + } spin_unlock_irq(shost->host_lock); __scsi_remove_device(sdev); goto restart;
Prepare for freeing the host tag set earlier by making scsi_forget_host() wait until all activity on the host tag set has stopped. Cc: Christoph Hellwig <hch@lst.de> Cc: Ming Lei <ming.lei@redhat.com> Cc: Hannes Reinecke <hare@suse.de> Cc: John Garry <john.garry@huawei.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> --- drivers/scsi/scsi_scan.c | 25 ++++++++++++++++++++++++- 1 file changed, 24 insertions(+), 1 deletion(-)