Message ID | 20220628175612.2157218-1-bvanassche@acm.org |
---|---|
State | Superseded |
Headers | show |
Series | scsi: core: Call blk_mq_free_tag_set() earlier | expand |
On Tue, Jun 28, 2022 at 10:56:12AM -0700, Bart Van Assche wrote: > There are two .exit_cmd_priv implementations. Both implementations use the > SCSI host pointer. Make sure that the SCSI host pointer is valid when > .exit_cmd_priv is called by moving the .exit_cmd_priv calls from > scsi_device_dev_release() to scsi_forget_host(). Moving > blk_mq_free_tag_set() from scsi_device_dev_release() to scsi_forget_host() > is safe because scsi_forget_host() drains all the request queues that use > the host tag set. This guarantees that no requests are in flight and also > that no new requests will be allocated from the host tag set. Not sure scsi_forget_host really drains all queues since it bypasses sdev which state is SDEV_DEL, so removal for this sdev could be in-progress, not done yet. Thanks, Ming
Hi Bart, I'd rather to understand the issue first. On Wed, Jun 29, 2022 at 02:49:27PM -0700, Bart Van Assche wrote: > On 6/28/22 18:17, Ming Lei wrote: > > On Tue, Jun 28, 2022 at 10:56:12AM -0700, Bart Van Assche wrote: > > > There are two .exit_cmd_priv implementations. Both implementations use the > > > SCSI host pointer. Make sure that the SCSI host pointer is valid when > > > .exit_cmd_priv is called by moving the .exit_cmd_priv calls from > > > scsi_device_dev_release() to scsi_forget_host(). Moving .exit_cmd_priv is actually called from scsi_host_dev_release() instead of scsi_device_dev_release(). Both scsi host pointer and host->shost_data is still valid when calling .exit_cmd_priv via scsi_mq_destroy_tags(). Previously I fixed[1] one similar issue, and that is caused by early module unloading, and anywhere host->hostt is referred, the scsi driver module should be prevented from being unloaded. [1] f2b85040acec scsi: core: Put LLD module refcnt after SCSI device is released Thanks, Ming
On Thu, Jun 30, 2022 at 09:01:39AM +0800, Ming Lei wrote: > Hi Bart, > > I'd rather to understand the issue first. > > On Wed, Jun 29, 2022 at 02:49:27PM -0700, Bart Van Assche wrote: > > On 6/28/22 18:17, Ming Lei wrote: > > > On Tue, Jun 28, 2022 at 10:56:12AM -0700, Bart Van Assche wrote: > > > > There are two .exit_cmd_priv implementations. Both implementations use the > > > > SCSI host pointer. Make sure that the SCSI host pointer is valid when > > > > .exit_cmd_priv is called by moving the .exit_cmd_priv calls from > > > > scsi_device_dev_release() to scsi_forget_host(). Moving > > .exit_cmd_priv is actually called from scsi_host_dev_release() instead > of scsi_device_dev_release(). Both scsi host pointer and host->shost_data is > still valid when calling .exit_cmd_priv via scsi_mq_destroy_tags(). > > Previously I fixed[1] one similar issue, and that is caused by early module > unloading, and anywhere host->hostt is referred, the scsi driver module > should be prevented from being unloaded. > > > [1] f2b85040acec scsi: core: Put LLD module refcnt after SCSI device is released Hi Bart, BTW, Changhui reported one very similar issue when running elevator switch/scsi debug LUN hotplug.
On 6/29/22 18:01, Ming Lei wrote: > .exit_cmd_priv is actually called from scsi_host_dev_release() instead > of scsi_device_dev_release(). Agreed, I will fix this in the patch description. > Both scsi host pointer and host->shost_data is > still valid when calling .exit_cmd_priv via scsi_mq_destroy_tags(). I will change the patch description to make it clear that both exit_cmd_priv implementations use resources associated with the SCSI host. Thanks, Bart.
On 6/30/22 01:57, Ming Lei wrote: > BTW, Changhui reported one very similar issue when running elevator > switch/scsi debug LUN hotplug. > >>From Changhui's report, the issue is basically same with what > f2b85040acec tried to address, but the try_module_get() in > scsi_device_dev_release() may fail, so the scsi_debug module > still can be unloaded. > > The thing is that sdev can be released in async style, and target/host > release is triggered by scsi_device_dev_release_usercontext(). > > So after scsi_host_remove() returns, the shost may still be live from > driver core/sysfs viewpoint, and its release handler can be called > after the LLD module is unloaded. Then this kind of issue is triggered. > > Seems there are at least two approaches for fixing the issue: > > 1) the one suggested in this thread: > - moving any reference to shost->hostt in host release handler into > scsi_host_remove(), and scsi_mq_destroy_tags()/scsi_proc_hostdir_rm(shost->hostt)() > should be covered at least > > 2) wait until all targets are released in scsi_host_remove() > > I am fine with either of the two approaches. > > Bart, please let me know if you are working towards the approach in 1). > If not, I have one patch which implements 2). > > BTW, after either 1) or 2) is done, commit f2b85040acec can be reverted. Hi Ming, All I need is that all activity on the host tag set has stopped by the time scsi_forget_host() returns. I do not need (1) or (2) to fix the bug reported in the description of the patch at the start of this thread. Thanks, Bart.
diff --git a/drivers/scsi/hosts.c b/drivers/scsi/hosts.c index ef6c0e37acce..74bfa187fe19 100644 --- a/drivers/scsi/hosts.c +++ b/drivers/scsi/hosts.c @@ -182,6 +182,8 @@ void scsi_remove_host(struct Scsi_Host *shost) mutex_unlock(&shost->scan_mutex); scsi_proc_host_rm(shost); + scsi_mq_destroy_tags(shost); + spin_lock_irqsave(shost->host_lock, flags); if (scsi_host_set_state(shost, SHOST_DEL)) BUG_ON(scsi_host_set_state(shost, SHOST_DEL_RECOVERY)); @@ -295,8 +297,8 @@ int scsi_add_host_with_dma(struct Scsi_Host *shost, struct device *dev, return error; /* - * Any host allocation in this function will be freed in - * scsi_host_dev_release(). + * Any resources associated with the SCSI host in this function except + * the tag set will be freed by scsi_host_dev_release(). */ out_del_dev: device_del(&shost->shost_dev); @@ -312,6 +314,7 @@ int scsi_add_host_with_dma(struct Scsi_Host *shost, struct device *dev, pm_runtime_disable(&shost->shost_gendev); pm_runtime_set_suspended(&shost->shost_gendev); pm_runtime_put_noidle(&shost->shost_gendev); + scsi_mq_destroy_tags(shost); fail: return error; } @@ -345,9 +348,6 @@ static void scsi_host_dev_release(struct device *dev) kfree(dev_name(&shost->shost_dev)); } - if (shost->tag_set.tags) - scsi_mq_destroy_tags(shost); - kfree(shost->shost_data); ida_free(&host_index_ida, shost->host_no); diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c index 6ffc9e4258a8..1aa1a279f8f3 100644 --- a/drivers/scsi/scsi_lib.c +++ b/drivers/scsi/scsi_lib.c @@ -1990,7 +1990,10 @@ int scsi_mq_setup_tags(struct Scsi_Host *shost) void scsi_mq_destroy_tags(struct Scsi_Host *shost) { + if (!shost->tag_set.tags) + return; blk_mq_free_tag_set(&shost->tag_set); + WARN_ON_ONCE(shost->tag_set.tags); } /**