From patchwork Thu Sep 28 07:35:40 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Wenchao Hao
X-Patchwork-Id: 727539
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 8BDC7CE7B0C
	for ; Thu, 28 Sep 2023 07:36:31 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S230250AbjI1Hga (ORCPT ); Thu, 28 Sep 2023 03:36:30 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:38518 "EHLO
	lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S230201AbjI1HgX (ORCPT );
	Thu, 28 Sep 2023 03:36:23 -0400
Received: from szxga03-in.huawei.com (szxga03-in.huawei.com [45.249.212.189])
	by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 1AAD599;
	Thu, 28 Sep 2023 00:36:18 -0700 (PDT)
Received: from kwepemm000012.china.huawei.com (unknown [172.30.72.57])
	by szxga03-in.huawei.com (SkyGuard) with ESMTP id 4Rx4v30Wx9zMltm;
	Thu, 28 Sep 2023 15:32:31 +0800 (CST)
Received: from build.huawei.com (10.175.101.6) by
	kwepemm000012.china.huawei.com (7.193.23.142) with Microsoft SMTP Server
	(version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id
	15.1.2507.31; Thu, 28 Sep 2023 15:36:14 +0800
From: Wenchao Hao
To: "James E . J . Bottomley" , "Martin K . Petersen" ,
CC: , , Wenchao Hao
Subject: [PATCH v2 1/4] scsi: core: Add new helper to iterate all devices of host
Date: Thu, 28 Sep 2023 15:35:40 +0800
Message-ID: <20230928073543.3496394-2-haowenchao2@huawei.com>
X-Mailer: git-send-email 2.32.0
In-Reply-To: <20230928073543.3496394-1-haowenchao2@huawei.com>
References: <20230928073543.3496394-1-haowenchao2@huawei.com>
MIME-Version: 1.0
X-Originating-IP: [10.175.101.6]
X-ClientProxiedBy: dggems704-chm.china.huawei.com (10.3.19.181) To
	kwepemm000012.china.huawei.com (7.193.23.142)
X-CFilter-Loop: Reflected
Precedence: bulk
List-ID:
X-Mailing-List: linux-scsi@vger.kernel.org

shost_for_each_device() skips devices that are in the SDEV_CANCEL or
SDEV_DEL state. Some callers must not skip these devices, so add a new
macro, shost_for_each_device_include_deleted(), that visits them as well.

To implement the new macro, split scsi_device_get() into a new helper,
__scsi_device_get(), and add a "skip_deleted" parameter to
__scsi_iterate_devices(). scsi_device_get() and shost_for_each_device()
keep their current behavior.

Signed-off-by: Wenchao Hao
---
 drivers/scsi/scsi.c        | 43 +++++++++++++++++++++++++-------------
 include/scsi/scsi_device.h | 25 +++++++++++++++++++---
 2 files changed, 50 insertions(+), 18 deletions(-)

diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
index d0911bc28663..9e31398b6e03 100644
--- a/drivers/scsi/scsi.c
+++ b/drivers/scsi/scsi.c
@@ -704,6 +704,26 @@ int scsi_cdl_enable(struct scsi_device *sdev, bool enable)
 	return 0;
 }
 
+static int __scsi_device_get(struct scsi_device *sdev, bool skip_deleted)
+{
+	/*
+	 * If skip_deleted is true, fail for devices that are being removed.
+	 */
+	if (skip_deleted &&
+	    (sdev->sdev_state == SDEV_DEL || sdev->sdev_state == SDEV_CANCEL))
+		goto fail;
+	if (!try_module_get(sdev->host->hostt->module))
+		goto fail;
+	if (!get_device(&sdev->sdev_gendev))
+		goto fail_put_module;
+	return 0;
+
+fail_put_module:
+	module_put(sdev->host->hostt->module);
+fail:
+	return -ENXIO;
+}
+
 /**
  * scsi_device_get - get an additional reference to a scsi_device
  * @sdev: device to get a reference to
@@ -717,18 +737,7 @@ int scsi_cdl_enable(struct scsi_device *sdev, bool enable)
  */
 int scsi_device_get(struct scsi_device *sdev)
 {
-	if (sdev->sdev_state == SDEV_DEL || sdev->sdev_state == SDEV_CANCEL)
-		goto fail;
-	if (!try_module_get(sdev->host->hostt->module))
-		goto fail;
-	if (!get_device(&sdev->sdev_gendev))
-		goto fail_put_module;
-	return 0;
-
-fail_put_module:
-	module_put(sdev->host->hostt->module);
-fail:
-	return -ENXIO;
+	return __scsi_device_get(sdev, true);
 }
 EXPORT_SYMBOL(scsi_device_get);
 
@@ -749,9 +758,13 @@ void scsi_device_put(struct scsi_device *sdev)
 }
 EXPORT_SYMBOL(scsi_device_put);
 
-/* helper for shost_for_each_device, see that for documentation */
+/**
+ * helper for shost_for_each_device, see that for documentation
+ * @skip_deleted: if true, skip devices that are being removed
+ */
 struct scsi_device *__scsi_iterate_devices(struct Scsi_Host *shost,
-					   struct scsi_device *prev)
+					   struct scsi_device *prev,
+					   bool skip_deleted)
 {
 	struct list_head *list = (prev ? &prev->siblings : &shost->__devices);
 	struct scsi_device *next = NULL;
@@ -761,7 +774,7 @@ struct scsi_device *__scsi_iterate_devices(struct Scsi_Host *shost,
 	while (list->next != &shost->__devices) {
 		next = list_entry(list->next, struct scsi_device, siblings);
 		/* skip devices that we can't get a reference to */
-		if (!scsi_device_get(next))
+		if (!__scsi_device_get(next, skip_deleted))
 			break;
 		next = NULL;
 		list = list->next;
diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h
index b9230b6add04..6f8df9b04be3 100644
--- a/include/scsi/scsi_device.h
+++ b/include/scsi/scsi_device.h
@@ -390,7 +390,8 @@ extern void __starget_for_each_device(struct scsi_target *, void *,
 
 /* only exposed to implement shost_for_each_device */
 extern struct scsi_device *__scsi_iterate_devices(struct Scsi_Host *,
-						  struct scsi_device *);
+						  struct scsi_device *,
+						  bool);
 
 /**
  * shost_for_each_device - iterate over all devices of a host
@@ -400,11 +401,29 @@ extern struct scsi_device *__scsi_iterate_devices(struct Scsi_Host *,
  * Iterator that returns each device attached to @shost. This loop
  * takes a reference on each device and releases it at the end. If
  * you break out of the loop, you must call scsi_device_put(sdev).
+ *
+ * Note: this macro skips devices that are being removed
  */
 #define shost_for_each_device(sdev, shost) \
-	for ((sdev) = __scsi_iterate_devices((shost), NULL); \
+	for ((sdev) = __scsi_iterate_devices((shost), NULL, true); \
+	     (sdev); \
+	     (sdev) = __scsi_iterate_devices((shost), (sdev), true))
+
+/**
+ * shost_for_each_device_include_deleted - iterate over all devices of a host
+ * @sdev: the &struct scsi_device to use as a cursor
+ * @shost: the &struct scsi_host to iterate over
+ *
+ * Iterator that returns each device attached to @shost. This loop
+ * takes a reference on each device and releases it at the end. If
+ * you break out of the loop, you must call scsi_device_put(sdev).
+ *
+ * Note: this macro includes devices that are being removed
+ */
+#define shost_for_each_device_include_deleted(sdev, shost) \
+	for ((sdev) = __scsi_iterate_devices((shost), NULL, false); \
 	     (sdev); \
-	     (sdev) = __scsi_iterate_devices((shost), (sdev)))
+	     (sdev) = __scsi_iterate_devices((shost), (sdev), false))
 
 /**
  * __shost_for_each_device - iterate over all devices of a host (UNLOCKED)

From patchwork Thu Sep 28 07:35:41 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Wenchao Hao
X-Patchwork-Id: 727992
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 21851CE7B09
	for ; Thu, 28 Sep 2023 07:36:23 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S230171AbjI1HgW (ORCPT ); Thu, 28 Sep 2023 03:36:22 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:38494 "EHLO
	lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S229539AbjI1HgV (ORCPT );
	Thu, 28 Sep 2023 03:36:21 -0400
Received: from szxga02-in.huawei.com (szxga02-in.huawei.com [45.249.212.188])
	by lindbergh.monkeyblade.net (Postfix) with ESMTPS id A6E8692;
	Thu, 28 Sep 2023 00:36:17 -0700 (PDT)
Received: from kwepemm000012.china.huawei.com (unknown [172.30.72.53])
	by szxga02-in.huawei.com (SkyGuard) with ESMTP id 4Rx4tx1hVBzNnp5;
	Thu, 28 Sep 2023 15:32:25 +0800 (CST)
Received: from build.huawei.com (10.175.101.6) by
	kwepemm000012.china.huawei.com (7.193.23.142) with Microsoft SMTP Server
	(version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id
	15.1.2507.31; Thu, 28 Sep 2023 15:36:14 +0800
From: Wenchao Hao
To: "James E . J . Bottomley" , "Martin K . Petersen" ,
CC: , , Wenchao Hao
Subject: [PATCH v2 2/4] scsi: scsi_error: Fix wrong statistic when print error info
Date: Thu, 28 Sep 2023 15:35:41 +0800
Message-ID: <20230928073543.3496394-3-haowenchao2@huawei.com>
X-Mailer: git-send-email 2.32.0
In-Reply-To: <20230928073543.3496394-1-haowenchao2@huawei.com>
References: <20230928073543.3496394-1-haowenchao2@huawei.com>
MIME-Version: 1.0
X-Originating-IP: [10.175.101.6]
X-ClientProxiedBy: dggems704-chm.china.huawei.com (10.3.19.181) To
	kwepemm000012.china.huawei.com (7.193.23.142)
X-CFilter-Loop: Reflected
Precedence: bulk
List-ID:
X-Mailing-List: linux-scsi@vger.kernel.org

shost_for_each_device() skips devices that are being removed, so
scsi_eh_prt_fail_stats() ignores the failed commands of these devices
and reports wrong totals.

Fix this by iterating with shost_for_each_device_include_deleted() in
scsi_eh_prt_fail_stats().

Signed-off-by: Wenchao Hao
---
 drivers/scsi/scsi_error.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index c67cdcdc3ba8..2550f8cd182a 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -407,7 +407,7 @@ static inline void scsi_eh_prt_fail_stats(struct Scsi_Host *shost,
 	int cmd_cancel = 0;
 	int devices_failed = 0;
 
-	shost_for_each_device(sdev, shost) {
+	shost_for_each_device_include_deleted(sdev, shost) {
 		list_for_each_entry(scmd, work_q, eh_entry) {
 			if (scmd->device == sdev) {
 				++total_failures;

From patchwork Thu Sep 28 07:35:42 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Wenchao Hao
X-Patchwork-Id: 727991
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 561E0CE7B0B
	for ; Thu, 28 Sep 2023 07:36:29 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S230254AbjI1HgZ (ORCPT ); Thu, 28 Sep 2023 03:36:25 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:38502 "EHLO
	lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S229894AbjI1HgW (ORCPT );
	Thu, 28 Sep 2023 03:36:22 -0400
Received: from szxga02-in.huawei.com (szxga02-in.huawei.com [45.249.212.188])
	by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 134CA97;
	Thu, 28 Sep 2023 00:36:18 -0700 (PDT)
Received: from kwepemm000012.china.huawei.com (unknown [172.30.72.57])
	by szxga02-in.huawei.com (SkyGuard) with ESMTP id 4Rx4tx5qBgzNntS;
	Thu, 28 Sep 2023 15:32:25 +0800 (CST)
Received: from build.huawei.com (10.175.101.6) by
	kwepemm000012.china.huawei.com (7.193.23.142) with Microsoft SMTP Server
	(version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id
	15.1.2507.31; Thu, 28 Sep 2023 15:36:15 +0800
From: Wenchao Hao
To: "James E . J . Bottomley" , "Martin K . Petersen" ,
CC: , , Wenchao Hao
Subject: [PATCH v2 3/4] scsi: scsi_error: Fix device reset is not triggered
Date: Thu, 28 Sep 2023 15:35:42 +0800
Message-ID: <20230928073543.3496394-4-haowenchao2@huawei.com>
X-Mailer: git-send-email 2.32.0
In-Reply-To: <20230928073543.3496394-1-haowenchao2@huawei.com>
References: <20230928073543.3496394-1-haowenchao2@huawei.com>
MIME-Version: 1.0
X-Originating-IP: [10.175.101.6]
X-ClientProxiedBy: dggems704-chm.china.huawei.com (10.3.19.181) To
	kwepemm000012.china.huawei.com (7.193.23.142)
X-CFilter-Loop: Reflected
Precedence: bulk
List-ID:
X-Mailing-List: linux-scsi@vger.kernel.org

shost_for_each_device() skips devices that are being removed, so
scsi_eh_bus_device_reset() never calls scsi_try_bus_device_reset() for
these devices in the following sequence:

T1:                                     T2:scsi_error_handler
__scsi_remove_device
  scsi_device_set_state(sdev, SDEV_DEL)
                                        // would skip device with SDEV_DEL state
                                        shost_for_each_device()
                                          scsi_try_bus_device_reset
                                        flush all commands
  ...
release and free scsi_device

Some drivers, such as smartpqi, implement only eh_device_reset_handler.
If the device reset is skipped, commands that have already been sent to
the firmware or to the device hardware are not cleared, yet the error
handler still flushes all of them in scsi_unjam_host(). When the hardware
later completes these commands, a use-after-free is triggered.

Fix this by iterating with shost_for_each_device_include_deleted() in
scsi_eh_bus_device_reset().

Signed-off-by: Wenchao Hao
---
 drivers/scsi/scsi_error.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index 2550f8cd182a..57e3cc556549 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -1568,7 +1568,7 @@ static int scsi_eh_bus_device_reset(struct Scsi_Host *shost,
 	struct scsi_device *sdev;
 	enum scsi_disposition rtn;
 
-	shost_for_each_device(sdev, shost) {
+	shost_for_each_device_include_deleted(sdev, shost) {
 		if (scsi_host_eh_past_deadline(shost)) {
 			SCSI_LOG_ERROR_RECOVERY(3,
 				sdev_printk(KERN_INFO, sdev,

From patchwork Thu Sep 28 07:35:43 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Wenchao Hao
X-Patchwork-Id: 727990
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by smtp.lore.kernel.org (Postfix) with ESMTP id DB898CE7B09
	for ; Thu, 28 Sep 2023 07:36:32 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S230274AbjI1Hgb (ORCPT ); Thu, 28 Sep 2023 03:36:31 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:38502 "EHLO
	lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S229758AbjI1HgX (ORCPT );
	Thu, 28 Sep 2023 03:36:23 -0400
Received: from szxga01-in.huawei.com (szxga01-in.huawei.com [45.249.212.187])
	by lindbergh.monkeyblade.net (Postfix) with ESMTPS id AF5D89C;
	Thu, 28 Sep 2023 00:36:18 -0700 (PDT)
Received: from kwepemm000012.china.huawei.com (unknown [172.30.72.56])
	by szxga01-in.huawei.com (SkyGuard) with ESMTP id 4Rx4wl3XVWzrT3D;
	Thu, 28 Sep 2023 15:33:59 +0800 (CST)
Received: from build.huawei.com (10.175.101.6) by
	kwepemm000012.china.huawei.com (7.193.23.142) with Microsoft SMTP Server
	(version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id
	15.1.2507.31; Thu, 28 Sep 2023 15:36:15 +0800
From: Wenchao Hao
To: "James E . J . Bottomley" , "Martin K . Petersen" ,
CC: , , Wenchao Hao
Subject: [PATCH v2 4/4] scsi: scsi_core: Fix IO hang when device removing
Date: Thu, 28 Sep 2023 15:35:43 +0800
Message-ID: <20230928073543.3496394-5-haowenchao2@huawei.com>
X-Mailer: git-send-email 2.32.0
In-Reply-To: <20230928073543.3496394-1-haowenchao2@huawei.com>
References: <20230928073543.3496394-1-haowenchao2@huawei.com>
MIME-Version: 1.0
X-Originating-IP: [10.175.101.6]
X-ClientProxiedBy: dggems704-chm.china.huawei.com (10.3.19.181) To
	kwepemm000012.china.huawei.com (7.193.23.142)
X-CFilter-Loop: Reflected
Precedence: bulk
List-ID:
X-Mailing-List: linux-scsi@vger.kernel.org

shost_for_each_device() skips devices that are being removed, so
scsi_run_host_queues() does not call scsi_run_queue() for these devices
after the host's I/O has been blocked. This can cause an I/O hang when a
device is in the SDEV_CANCEL state, in the following sequence:

T1:                                     T2:scsi_error_handler
__scsi_remove_device()
  scsi_device_set_state(sdev, SDEV_CANCEL)
  ...
  sd_remove()
  del_gendisk()
  blk_mq_freeze_queue_wait()
                                        scsi_eh_flush_done_q()
                                          scsi_queue_insert(scmd,...)

Since commit 8b566edbdbfb ("scsi: core: Only kick the requeue list if
necessary"), scsi_queue_insert() no longer kicks the device's queue.
After scsi_unjam_host(), the SCSI error handler calls
scsi_run_host_queues(), which runs scsi_run_queue() for each device, but
shost_for_each_device() skips devices that are being removed, so their
queues are never run. The requests requeued on these queues are never
handled, and the device removal hangs as well.

Fix this by using shost_for_each_device_include_deleted() in
scsi_run_host_queues() so that the queues of devices being removed are
run too.

Signed-off-by: Wenchao Hao
---
 drivers/scsi/scsi_lib.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index c2f647a7c1b0..34b408d182e2 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -466,7 +466,7 @@ void scsi_run_host_queues(struct Scsi_Host *shost)
 {
 	struct scsi_device *sdev;
 
-	shost_for_each_device(sdev, shost)
+	shost_for_each_device_include_deleted(sdev, shost)
 		scsi_run_queue(sdev->request_queue);
 }
 
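
For readers who want to try the iteration semantics of this series outside
the kernel, here is a minimal user-space sketch of the skip_deleted idea
from patch 1. It is not kernel code and not part of the series: struct
demo_dev, demo_get(), demo_next(), the two demo_for_each macros and
count_visited() are hypothetical stand-ins for struct scsi_device,
__scsi_device_get(), __scsi_iterate_devices() and the two shost_for_each
macros. The sketch assumes a plain array and leaves out locking, reference
counting and module pinning.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical subset of the device states discussed in this series. */
enum demo_state { DEMO_RUNNING, DEMO_CANCEL, DEMO_DEL };

struct demo_dev {
	enum demo_state state;
};

/*
 * Mirrors the state check in __scsi_device_get(): refuse a device that
 * is being removed only when the caller asked to skip such devices.
 */
static bool demo_get(const struct demo_dev *dev, bool skip_deleted)
{
	if (skip_deleted &&
	    (dev->state == DEMO_DEL || dev->state == DEMO_CANCEL))
		return false;
	return true;
}

/*
 * Return the first acceptable device after prev (or from the start of
 * the array when prev is NULL), or NULL when the array is exhausted.
 */
static struct demo_dev *demo_next(struct demo_dev *devs, size_t n,
				  struct demo_dev *prev, bool skip_deleted)
{
	size_t i = prev ? (size_t)(prev - devs) + 1 : 0;

	for (; i < n; i++)
		if (demo_get(&devs[i], skip_deleted))
			return &devs[i];
	return NULL;
}

/* Default iterator: skips devices that are being removed. */
#define demo_for_each(dev, devs, n) \
	for ((dev) = demo_next((devs), (n), NULL, true); \
	     (dev); \
	     (dev) = demo_next((devs), (n), (dev), true))

/* Variant that also visits devices that are being removed. */
#define demo_for_each_include_deleted(dev, devs, n) \
	for ((dev) = demo_next((devs), (n), NULL, false); \
	     (dev); \
	     (dev) = demo_next((devs), (n), (dev), false))

/* Count how many devices the chosen iterator visits. */
static size_t count_visited(struct demo_dev *devs, size_t n,
			    bool include_deleted)
{
	struct demo_dev *dev;
	size_t visited = 0;

	if (include_deleted) {
		demo_for_each_include_deleted(dev, devs, n)
			visited++;
	} else {
		demo_for_each(dev, devs, n)
			visited++;
	}
	return visited;
}
```

With an array holding one running, one cancelled, one deleted and one
running device, count_visited() reports 2 devices for the default iterator
and 4 for the include_deleted variant, which is the difference the error
handler and scsi_run_host_queues() rely on in patches 2 to 4.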