From patchwork Mon Oct 16 02:03:11 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Wenchao Hao
X-Patchwork-Id: 734207
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 4D42AC41513 for ; Mon, 16 Oct 2023 02:03:52 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231403AbjJPCDu (ORCPT ); Sun, 15 Oct 2023 22:03:50 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:51726 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231314AbjJPCDn (ORCPT ); Sun, 15 Oct 2023 22:03:43 -0400
Received: from szxga01-in.huawei.com (szxga01-in.huawei.com [45.249.212.187]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 665CE95; Sun, 15 Oct 2023 19:03:40 -0700 (PDT)
Received: from kwepemm000012.china.huawei.com (unknown [172.30.72.55]) by szxga01-in.huawei.com (SkyGuard) with ESMTP id 4S80dl2C9HzvPyk; Mon, 16 Oct 2023 09:58:51 +0800 (CST)
Received: from build.huawei.com (10.175.101.6) by kwepemm000012.china.huawei.com (7.193.23.142) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2507.31; Mon, 16 Oct 2023 10:03:32 +0800
From: Wenchao Hao
To: "James E . J . Bottomley" , "Martin K . 
Petersen" , CC: , , Wenchao Hao
Subject: [PATCH v3 1/4] scsi: core: Add new helper to iterate all devices of host
Date: Mon, 16 Oct 2023 10:03:11 +0800
Message-ID: <20231016020314.1269636-2-haowenchao2@huawei.com>
X-Mailer: git-send-email 2.32.0
In-Reply-To: <20231016020314.1269636-1-haowenchao2@huawei.com>
References: <20231016020314.1269636-1-haowenchao2@huawei.com>
MIME-Version: 1.0
X-Originating-IP: [10.175.101.6]
X-ClientProxiedBy: dggems705-chm.china.huawei.com (10.3.19.182) To kwepemm000012.china.huawei.com (7.193.23.142)
X-CFilter-Loop: Reflected
Precedence: bulk
List-ID:
X-Mailing-List: linux-scsi@vger.kernel.org

shost_for_each_device() skips devices that are in the SDEV_CANCEL or
SDEV_DEL state. In some scenarios we do not want to skip these devices,
so add a new macro, shost_for_each_device_include_deleted(), to handle
them.

The following changes are introduced:

1. Rework scsi_device_get(): add a new helper, __scsi_device_get(), which
   decides whether to skip deleted scsi_devices based on the parameter
   "skip_deleted".
2. Add a new parameter "skip_deleted" to __scsi_iterate_devices(), which
   is passed on to __scsi_device_get().
3. Update shost_for_each_device() to call __scsi_iterate_devices() with
   "skip_deleted" set to true.
4. 
Add a new macro, shost_for_each_device_include_deleted(), which calls
   __scsi_iterate_devices() with "skip_deleted" set to false.

Signed-off-by: Wenchao Hao
---
 drivers/scsi/scsi.c        | 46 ++++++++++++++++++++++++++------------
 include/scsi/scsi_device.h | 25 ++++++++++++++++++---
 2 files changed, 54 insertions(+), 17 deletions(-)

diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
index d1c0ba3ef1f5..a9d695841250 100644
--- a/drivers/scsi/scsi.c
+++ b/drivers/scsi/scsi.c
@@ -704,20 +704,18 @@ int scsi_cdl_enable(struct scsi_device *sdev, bool enable)
 	return 0;
 }
 
-/**
- * scsi_device_get - get an additional reference to a scsi_device
+/*
+ * __scsi_device_get - get an additional reference to a scsi_device
  * @sdev: device to get a reference to
- *
- * Description: Gets a reference to the scsi_device and increments the use count
- * of the underlying LLDD module.  You must hold host_lock of the
- * parent Scsi_Host or already have a reference when calling this.
- *
- * This will fail if a device is deleted or cancelled, or when the LLD module
- * is in the process of being unloaded.
+ * @skip_deleted: when true, fail if the device is deleted or cancelled
  */
-int scsi_device_get(struct scsi_device *sdev)
+static int __scsi_device_get(struct scsi_device *sdev, bool skip_deleted)
 {
-	if (sdev->sdev_state == SDEV_DEL || sdev->sdev_state == SDEV_CANCEL)
+	/*
+	 * If skip_deleted is true and the device is being removed, fail.
+	 */
+	if (skip_deleted &&
+	    (sdev->sdev_state == SDEV_DEL || sdev->sdev_state == SDEV_CANCEL))
 		goto fail;
 	if (!try_module_get(sdev->host->hostt->module))
 		goto fail;
@@ -730,6 +728,22 @@ int scsi_device_get(struct scsi_device *sdev)
  fail:
 	return -ENXIO;
 }
+
+/**
+ * scsi_device_get - get an additional reference to a scsi_device
+ * @sdev: device to get a reference to
+ *
+ * Description: Gets a reference to the scsi_device and increments the use count
+ * of the underlying LLDD module. 
You must hold host_lock of the
+ * parent Scsi_Host or already have a reference when calling this.
+ *
+ * This will fail if a device is deleted or cancelled, or when the LLD module
+ * is in the process of being unloaded.
+ */
+int scsi_device_get(struct scsi_device *sdev)
+{
+	return __scsi_device_get(sdev, true);
+}
 EXPORT_SYMBOL(scsi_device_get);
 
 /**
@@ -749,9 +763,13 @@ void scsi_device_put(struct scsi_device *sdev)
 }
 EXPORT_SYMBOL(scsi_device_put);
 
-/* helper for shost_for_each_device, see that for documentation */
+/*
+ * helper for shost_for_each_device, see that for documentation
+ * @skip_deleted: if true, devices that are being removed are skipped
+ */
 struct scsi_device *__scsi_iterate_devices(struct Scsi_Host *shost,
-					   struct scsi_device *prev)
+					   struct scsi_device *prev,
+					   bool skip_deleted)
 {
 	struct list_head *list = (prev ? &prev->siblings : &shost->__devices);
 	struct scsi_device *next = NULL;
@@ -761,7 +779,7 @@ struct scsi_device *__scsi_iterate_devices(struct Scsi_Host *shost,
 	while (list->next != &shost->__devices) {
 		next = list_entry(list->next, struct scsi_device, siblings);
 		/* skip devices that we can't get a reference to */
-		if (!scsi_device_get(next))
+		if (!__scsi_device_get(next, skip_deleted))
 			break;
 		next = NULL;
 		list = list->next;
diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h
index b9230b6add04..ed02755bbc42 100644
--- a/include/scsi/scsi_device.h
+++ b/include/scsi/scsi_device.h
@@ -390,7 +390,8 @@ extern void __starget_for_each_device(struct scsi_target *, void *,
 
 /* only exposed to implement shost_for_each_device */
 extern struct scsi_device *__scsi_iterate_devices(struct Scsi_Host *,
-						  struct scsi_device *);
+						  struct scsi_device *,
+						  bool);
 
 /**
  * shost_for_each_device - iterate over all devices of a host
@@ -400,11 +401,29 @@ extern struct scsi_device *__scsi_iterate_devices(struct Scsi_Host *,
  * Iterator that returns each device attached to @shost. 
This loop
  * takes a reference on each device and releases it at the end.  If
  * you break out of the loop, you must call scsi_device_put(sdev).
+ *
+ * Note: this macro skips devices that are being removed
  */
 #define shost_for_each_device(sdev, shost) \
-	for ((sdev) = __scsi_iterate_devices((shost), NULL); \
+	for ((sdev) = __scsi_iterate_devices((shost), NULL, true); \
+	     (sdev); \
+	     (sdev) = __scsi_iterate_devices((shost), (sdev), true))
+
+/**
+ * shost_for_each_device_include_deleted - iterate over all devices of a host
+ * @sdev: the &struct scsi_device to use as a cursor
+ * @shost: the &struct scsi_host to iterate over
+ *
+ * Iterator that returns each device attached to @shost.  This loop
+ * takes a reference on each device and releases it at the end.  If
+ * you break out of the loop, you must call scsi_device_put(sdev).
+ *
+ * Note: this macro includes devices that are being removed
+ */
+#define shost_for_each_device_include_deleted(sdev, shost) \
+	for ((sdev) = __scsi_iterate_devices((shost), NULL, false); \
 	     (sdev); \
-	     (sdev) = __scsi_iterate_devices((shost), (sdev)))
+	     (sdev) = __scsi_iterate_devices((shost), (sdev), false))
 
 /**
  * __shost_for_each_device - iterate over all devices of a host (UNLOCKED)

From patchwork Mon Oct 16 02:03:12 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Wenchao Hao
X-Patchwork-Id: 734209
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 384D5CDB47E for ; Mon, 16 Oct 2023 02:03:40 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230497AbjJPCDk (ORCPT ); Sun, 15 Oct 2023 22:03:40 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:51710 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with 
ESMTP id S229639AbjJPCDj (ORCPT ); Sun, 15 Oct 2023 22:03:39 -0400
Received: from szxga02-in.huawei.com (szxga02-in.huawei.com [45.249.212.188]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 374C8C1; Sun, 15 Oct 2023 19:03:37 -0700 (PDT)
Received: from kwepemm000012.china.huawei.com (unknown [172.30.72.55]) by szxga02-in.huawei.com (SkyGuard) with ESMTP id 4S80g125lDzVlcD; Mon, 16 Oct 2023 09:59:57 +0800 (CST)
Received: from build.huawei.com (10.175.101.6) by kwepemm000012.china.huawei.com (7.193.23.142) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2507.31; Mon, 16 Oct 2023 10:03:33 +0800
From: Wenchao Hao
To: "James E . J . Bottomley" , "Martin K . Petersen" , CC: , , Wenchao Hao
Subject: [PATCH v3 2/4] scsi: scsi_error: Fix wrong statistic when print error info
Date: Mon, 16 Oct 2023 10:03:12 +0800
Message-ID: <20231016020314.1269636-3-haowenchao2@huawei.com>
X-Mailer: git-send-email 2.32.0
In-Reply-To: <20231016020314.1269636-1-haowenchao2@huawei.com>
References: <20231016020314.1269636-1-haowenchao2@huawei.com>
MIME-Version: 1.0
X-Originating-IP: [10.175.101.6]
X-ClientProxiedBy: dggems705-chm.china.huawei.com (10.3.19.182) To kwepemm000012.china.huawei.com (7.193.23.142)
X-CFilter-Loop: Reflected
Precedence: bulk
List-ID:
X-Mailing-List: linux-scsi@vger.kernel.org

shost_for_each_device() skips devices that are being removed, so
commands belonging to these devices are not counted in
scsi_eh_prt_fail_stats().

Fix this by using shost_for_each_device_include_deleted() to iterate
over the devices in scsi_eh_prt_fail_stats().

Signed-off-by: Wenchao Hao
---
 drivers/scsi/scsi_error.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index c67cdcdc3ba8..2550f8cd182a 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -407,7 +407,7 @@ static inline void scsi_eh_prt_fail_stats(struct Scsi_Host *shost,
 	int cmd_cancel = 0;
 	int devices_failed = 0;
 
-	shost_for_each_device(sdev, shost) {
+	shost_for_each_device_include_deleted(sdev, shost) {
 		list_for_each_entry(scmd, work_q, eh_entry) {
 			if (scmd->device == sdev) {
 				++total_failures;

From patchwork Mon Oct 16 02:03:13 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Wenchao Hao
X-Patchwork-Id: 734956
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 2E450CDB47E for ; Mon, 16 Oct 2023 02:03:52 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231345AbjJPCDv (ORCPT ); Sun, 15 Oct 2023 22:03:51 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:51738 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231325AbjJPCDn (ORCPT ); Sun, 15 Oct 2023 22:03:43 -0400
Received: from szxga02-in.huawei.com (szxga02-in.huawei.com [45.249.212.188]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 64770C1; Sun, 15 Oct 2023 19:03:42 -0700 (PDT)
Received: from kwepemm000012.china.huawei.com (unknown [172.30.72.57]) by szxga02-in.huawei.com (SkyGuard) with ESMTP id 4S80fc1qVPzNnxT; Mon, 16 Oct 2023 09:59:36 +0800 (CST)
Received: from build.huawei.com (10.175.101.6) by kwepemm000012.china.huawei.com (7.193.23.142) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2507.31; Mon, 16 Oct 2023 
10:03:34 +0800
From: Wenchao Hao
To: "James E . J . Bottomley" , "Martin K . Petersen" , CC: , , Wenchao Hao
Subject: [PATCH v3 3/4] scsi: scsi_error: Fix device reset is not triggered
Date: Mon, 16 Oct 2023 10:03:13 +0800
Message-ID: <20231016020314.1269636-4-haowenchao2@huawei.com>
X-Mailer: git-send-email 2.32.0
In-Reply-To: <20231016020314.1269636-1-haowenchao2@huawei.com>
References: <20231016020314.1269636-1-haowenchao2@huawei.com>
MIME-Version: 1.0
X-Originating-IP: [10.175.101.6]
X-ClientProxiedBy: dggems705-chm.china.huawei.com (10.3.19.182) To kwepemm000012.china.huawei.com (7.193.23.142)
X-CFilter-Loop: Reflected
Precedence: bulk
List-ID:
X-Mailing-List: linux-scsi@vger.kernel.org

shost_for_each_device() skips devices that are being removed, so
scsi_try_bus_device_reset() is never called for these devices in
scsi_eh_bus_device_reset() when events occur in the following order:

T1:                                     T2: scsi_error_handler
__scsi_remove_device
  scsi_device_set_state(sdev, SDEV_DEL)
                                        // skips devices in SDEV_DEL state
                                        shost_for_each_device()
                                          scsi_try_bus_device_reset
                                        flush all commands
...
release and free scsi_device

Some drivers, such as smartpqi, implement only eh_device_reset_handler.
If the device reset is skipped, commands that have already been sent to
the firmware or device hardware are not cleared. The error handler then
flushes all of these commands in scsi_unjam_host(). When the hardware
later completes them, a use-after-free is triggered.

Fix this by using shost_for_each_device_include_deleted() to iterate
over the devices in scsi_eh_bus_device_reset().

Signed-off-by: Wenchao Hao
---
 drivers/scsi/scsi_error.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index 2550f8cd182a..57e3cc556549 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -1568,7 +1568,7 @@ static int scsi_eh_bus_device_reset(struct Scsi_Host *shost,
 	struct scsi_device *sdev;
 	enum scsi_disposition rtn;
 
-	shost_for_each_device(sdev, shost) {
+	shost_for_each_device_include_deleted(sdev, shost) {
 		if (scsi_host_eh_past_deadline(shost)) {
 			SCSI_LOG_ERROR_RECOVERY(3,
 				sdev_printk(KERN_INFO, sdev,

From patchwork Mon Oct 16 02:03:14 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Wenchao Hao
X-Patchwork-Id: 734957
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 5D809CDB483 for ; Mon, 16 Oct 2023 02:03:42 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231240AbjJPCDl (ORCPT ); Sun, 15 Oct 2023 22:03:41 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:51714 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230283AbjJPCDj (ORCPT ); Sun, 15 Oct 2023 22:03:39 -0400
Received: from szxga02-in.huawei.com (szxga02-in.huawei.com [45.249.212.188]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 5C2ADC5; Sun, 15 Oct 2023 19:03:37 -0700 (PDT)
Received: from kwepemm000012.china.huawei.com (unknown [172.30.72.55]) by szxga02-in.huawei.com (SkyGuard) with ESMTP id 4S80g231KSzVlcN; Mon, 16 Oct 2023 09:59:58 +0800 (CST)
Received: from build.huawei.com (10.175.101.6) by kwepemm000012.china.huawei.com (7.193.23.142) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2507.31; Mon, 16 Oct 
2023 10:03:34 +0800
From: Wenchao Hao
To: "James E . J . Bottomley" , "Martin K . Petersen" , CC: , , Wenchao Hao
Subject: [PATCH v3 4/4] scsi: scsi_core: Fix IO hang when device removing
Date: Mon, 16 Oct 2023 10:03:14 +0800
Message-ID: <20231016020314.1269636-5-haowenchao2@huawei.com>
X-Mailer: git-send-email 2.32.0
In-Reply-To: <20231016020314.1269636-1-haowenchao2@huawei.com>
References: <20231016020314.1269636-1-haowenchao2@huawei.com>
MIME-Version: 1.0
X-Originating-IP: [10.175.101.6]
X-ClientProxiedBy: dggems705-chm.china.huawei.com (10.3.19.182) To kwepemm000012.china.huawei.com (7.193.23.142)
X-CFilter-Loop: Reflected
Precedence: bulk
List-ID:
X-Mailing-List: linux-scsi@vger.kernel.org

shost_for_each_device() skips devices that are being removed, so
scsi_run_queue() is not called for these devices in
scsi_run_host_queues() after the host's IO has been blocked.

An IO hang can occur when the device is in the SDEV_CANCEL state and
events happen in the following order:

T1:                                         T2: scsi_error_handler
__scsi_remove_device()
  scsi_device_set_state(sdev, SDEV_CANCEL)
  ...
  sd_remove()
  del_gendisk()
  blk_mq_freeze_queue_wait()
                                            scsi_eh_flush_done_q()
                                              scsi_queue_insert(scmd,...)

Since commit 8b566edbdbfb ("scsi: core: Only kick the requeue list if
necessary"), scsi_queue_insert() no longer kicks the device's queue.

After scsi_unjam_host(), the SCSI error handler calls
scsi_run_host_queues() to run the queues of the host's devices. It does
not run the queues of devices that are being removed, because
shost_for_each_device() skips them. The requests added to these queues
are therefore never handled, and the device removal hangs as well.

Fix this by using shost_for_each_device_include_deleted() in
scsi_run_host_queues() so that the queues of devices being removed are
run too.

Signed-off-by: Wenchao Hao
---
 drivers/scsi/scsi_lib.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 195ca80667d0..40f407ffd26f 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -466,7 +466,7 @@ void scsi_run_host_queues(struct Scsi_Host *shost)
 {
 	struct scsi_device *sdev;
 
-	shost_for_each_device(sdev, shost)
+	shost_for_each_device_include_deleted(sdev, shost)
 		scsi_run_queue(sdev->request_queue);
 }
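
For readers who want to see the effect of the "skip_deleted" flag in isolation, the following is a minimal userspace sketch of the iteration logic the series describes. It is not kernel code: every toy_* name is hypothetical, the device list is a plain array instead of shost->__devices, and host_lock and module reference counting are left out. It only assumes the behaviour stated in the commit messages: a reference is taken on each returned device, the previous device's reference is dropped, and devices in SDEV_CANCEL or SDEV_DEL are skipped only when skip_deleted is true.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel types; not the real definitions. */
enum toy_state { TOY_RUNNING, TOY_CANCEL, TOY_DEL };

struct toy_sdev {
	enum toy_state state;
	int refcount;
};

struct toy_host {
	struct toy_sdev *devs;
	size_t ndevs;
};

/* Models __scsi_device_get(): 0 on success, -1 where the kernel returns -ENXIO. */
static int toy_device_get(struct toy_sdev *sdev, bool skip_deleted)
{
	if (skip_deleted &&
	    (sdev->state == TOY_DEL || sdev->state == TOY_CANCEL))
		return -1;
	sdev->refcount++;
	return 0;
}

static void toy_device_put(struct toy_sdev *sdev)
{
	sdev->refcount--;
}

/*
 * Models __scsi_iterate_devices(): return the next device a reference
 * could be taken on, then drop the reference held on prev.
 */
static struct toy_sdev *toy_iterate(struct toy_host *host,
				    struct toy_sdev *prev, bool skip_deleted)
{
	size_t i = prev ? (size_t)(prev - host->devs) + 1 : 0;
	struct toy_sdev *next = NULL;

	for (; i < host->ndevs; i++) {
		if (!toy_device_get(&host->devs[i], skip_deleted)) {
			next = &host->devs[i];
			break;
		}
	}
	if (prev)
		toy_device_put(prev);
	return next;
}

#define toy_for_each_device(sdev, host) \
	for ((sdev) = toy_iterate((host), NULL, true); \
	     (sdev); \
	     (sdev) = toy_iterate((host), (sdev), true))

#define toy_for_each_device_include_deleted(sdev, host) \
	for ((sdev) = toy_iterate((host), NULL, false); \
	     (sdev); \
	     (sdev) = toy_iterate((host), (sdev), false))

/* Count the devices visited by the skipping iterator. */
static size_t toy_count_live(struct toy_host *host)
{
	struct toy_sdev *sdev;
	size_t n = 0;

	toy_for_each_device(sdev, host)
		n++;
	return n;
}

/* Count the devices visited by the include-deleted iterator. */
static size_t toy_count_all(struct toy_host *host)
{
	struct toy_sdev *sdev;
	size_t n = 0;

	toy_for_each_device_include_deleted(sdev, host)
		n++;
	return n;
}
```

With a host holding one running, one cancelled and one deleted device, toy_count_live() visits only the running device while toy_count_all() visits all three, which is the difference the error-handler and queue-run callers in patches 2 to 4 rely on.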