From patchwork Wed Feb 7 18:40:55 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Jeffery X-Patchwork-Id: 770797 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 4897383A16 for ; Wed, 7 Feb 2024 18:42:24 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.133.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1707331348; cv=none; b=CICFTNh3moUZAhkk8KMs3XZtmyQ2do5uFsVGfGdujsxP5tuVKDgYOux3+NN0iTZ/mjZ1FP6ckvyx8ZElmXKjIEWjI23s/Alp+Bh2RB+CCsXICkPRjMcUVnPv7/Ag0FNMA7F8qaV0GYp+JvmqiUMGP4OCJgeSXMsVuYaPx2kmjs4= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1707331348; c=relaxed/simple; bh=ZCRVA2mffIeaQus1dOlcK54spGmEXo1fw7BGH0JkEM4=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=Y7zBs/FOQP+XN2pAJcGKAlZJr6/M4bJjiZElYYfHokyGUrrhVRoPCILy5dnvJLmFQ7XGo8hu2JYfLcW/CHbXOnaQ//OzqTYJODNcSP6JKdNMGuLLwoNwhyUWk56wU2QkkPrjd0OB+DTyQs8GITfGLR3T5cWmuTuoNFXUFA7Syo0= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=aUPpiAD5; arc=none smtp.client-ip=170.10.133.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="aUPpiAD5" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1707331344; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=wxUXCNijSG3JpoO3vioYJ7kD0cZUoEnLjl7wb2EW6H8=; b=aUPpiAD53A9QVjJBP8uADMnz04XbtAAM29VfHnI6+ZXbbfIGBVXHhLlPjULGRLY3m3FKct Op5mV5xynDOMmXROBMmei7tbmusK0jiaJYnhRnupwUvuYv0OllWIJqGsUqMbXmcaQfkLaa VpWG8s0MO3IPbTqWF8VRzYDeuYAmI7I= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-376-l26M5vl5PByd4fVcW48V4g-1; Wed, 07 Feb 2024 13:42:20 -0500 X-MC-Unique: l26M5vl5PByd4fVcW48V4g-1 Received: from smtp.corp.redhat.com (int-mx02.intmail.prod.int.rdu2.redhat.com [10.11.54.2]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 2579483B86B; Wed, 7 Feb 2024 18:42:19 +0000 (UTC) Received: from fedora-work.redhat.com (unknown [10.22.32.236]) by smtp.corp.redhat.com (Postfix) with ESMTP id 9C76C4011FF1; Wed, 7 Feb 2024 18:42:18 +0000 (UTC) From: David Jeffery To: linux-kernel@vger.kernel.org Cc: linux-nvme@lists.infradead.org, linux-scsi@vger.kernel.org, Greg Kroah-Hartman , "Rafael J . Wysocki" , David Jeffery , Laurence Oberman Subject: [RFC PATCH 1/6] minimal async shutdown infrastructure Date: Wed, 7 Feb 2024 13:40:55 -0500 Message-ID: <20240207184100.18066-2-djeffery@redhat.com> In-Reply-To: <20240207184100.18066-1-djeffery@redhat.com> References: <20240207184100.18066-1-djeffery@redhat.com> Precedence: bulk X-Mailing-List: linux-scsi@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.4.1 on 10.11.54.2 Adds the async_shutdown_start and async_shutdown_end calls to perform async shutdown. Implements a very minimalist method of async shutdown support within device_shutdown(). The device at the head of the shutdown list is checked against a list of devices under async shutdown. If the head is a parent of a device on the async list, all active async shutdown operations are completed before the parent's shutdown call is performed. The number of async operations also has a max limit to prevent the list being checked for a child from getting overly large. Signed-off-by: David Jeffery Tested-by: Laurence Oberman --- drivers/base/core.c | 116 +++++++++++++++++++++++++++++++++- include/linux/device/bus.h | 8 ++- include/linux/device/driver.h | 7 ++ 3 files changed, 127 insertions(+), 4 deletions(-) diff --git a/drivers/base/core.c b/drivers/base/core.c index 14d46af40f9a..5bc2282c00cd 100644 --- a/drivers/base/core.c +++ b/drivers/base/core.c @@ -4719,12 +4719,92 @@ int device_change_owner(struct device *dev, kuid_t kuid, kgid_t kgid) } EXPORT_SYMBOL_GPL(device_change_owner); + +#define MAX_ASYNC_SHUTDOWNS 32 +static int async_shutdown_count; +static LIST_HEAD(async_shutdown_list); + +/** + * If a device has a child busy with an async shutdown or there are too many + * async shutdowns active, the device may not be shut down at this time. + */ +static bool may_shutdown_device(struct device *dev) +{ + struct device *tmp; + + if (async_shutdown_count >= MAX_ASYNC_SHUTDOWNS) + return false; + + list_for_each_entry(tmp, &async_shutdown_list, kobj.entry) { + if (tmp->parent == dev) + return false; + } + return true; +} + +/** + * Call and track each async shutdown call + */ +static void async_shutdown_start(struct device *dev, void (*callback) (struct device *)) +{ + if (initcall_debug) + dev_info(dev, "async_shutdown_start\n"); + + (*callback)(dev); + list_add_tail(&dev->kobj.entry, &async_shutdown_list); + async_shutdown_count++; +} + +/** + * Wait for all async shutdown operations currently active to complete + */ +static void wait_for_active_async_shutdown(void) +{ + struct device *dev, *parent; + + while (!list_empty(&async_shutdown_list)) { + dev = list_entry(async_shutdown_list.next, struct device, + kobj.entry); + + parent = dev->parent; + + /* + * Make sure the device is off the list + */ + list_del_init(&dev->kobj.entry); + if (parent) + device_lock(parent); + device_lock(dev); + if (dev->bus && dev->bus->async_shutdown_end) { + if (initcall_debug) + dev_info(dev, + "async_shutdown_end called\n"); + dev->bus->async_shutdown_end(dev); + } else if (dev->driver && dev->driver->async_shutdown_end) { + if (initcall_debug) + dev_info(dev, + "async_shutdown_end called\n"); + dev->driver->async_shutdown_end(dev); + } + device_unlock(dev); + if (parent) + device_unlock(parent); + + put_device(dev); + put_device(parent); + } + if (initcall_debug) + printk(KERN_INFO "device shutdown: waited for %d async shutdown callbacks\n", async_shutdown_count); + async_shutdown_count = 0; +} + /** * device_shutdown - call ->shutdown() on each device to shutdown. */ void device_shutdown(void) { struct device *dev, *parent; + bool async_busy; wait_for_device_probe(); device_block_probing(); @@ -4741,6 +4821,8 @@ void device_shutdown(void) dev = list_entry(devices_kset->list.prev, struct device, kobj.entry); + async_busy = false; + /* * hold reference count of device's parent to * prevent it from being freed because parent's @@ -4748,6 +4830,17 @@ void device_shutdown(void) */ parent = get_device(dev->parent); get_device(dev); + + if (!may_shutdown_device(dev)) { + put_device(dev); + put_device(parent); + + spin_unlock(&devices_kset->list_lock); + wait_for_active_async_shutdown(); + spin_lock(&devices_kset->list_lock); + continue; + } + /* * Make sure the device is off the kset list, in the * event that dev->*->shutdown() doesn't remove it. @@ -4769,26 +4862,43 @@ void device_shutdown(void) dev_info(dev, "shutdown_pre\n"); dev->class->shutdown_pre(dev); } - if (dev->bus && dev->bus->shutdown) { + if (dev->bus && dev->bus->async_shutdown_start) { + async_shutdown_start(dev, dev->bus->async_shutdown_start); + async_busy = true; + } else if (dev->bus && dev->bus->shutdown) { if (initcall_debug) dev_info(dev, "shutdown\n"); dev->bus->shutdown(dev); + } else if (dev->driver && dev->driver->async_shutdown_start) { + async_shutdown_start(dev, dev->driver->async_shutdown_start); + async_busy = true; } else if (dev->driver && dev->driver->shutdown) { if (initcall_debug) dev_info(dev, "shutdown\n"); dev->driver->shutdown(dev); + } else { + if (initcall_debug) + dev_info(dev, "no shutdown callback\n"); } device_unlock(dev); if (parent) device_unlock(parent); - put_device(dev); - put_device(parent); + /* if device has an async shutdown, drop the ref when done */ + if (!async_busy) { + put_device(dev); + put_device(parent); + } spin_lock(&devices_kset->list_lock); } spin_unlock(&devices_kset->list_lock); + /* + * Wait for any async shutdown still running. + */ + if (!list_empty(&async_shutdown_list)) + wait_for_active_async_shutdown(); } /* diff --git a/include/linux/device/bus.h b/include/linux/device/bus.h index 5ef4ec1c36c3..7a4a2ff0bc23 100644 --- a/include/linux/device/bus.h +++ b/include/linux/device/bus.h @@ -48,7 +48,11 @@ struct fwnode_handle; * will never get called until they do. * @remove: Called when a device removed from this bus. * @shutdown: Called at shut-down time to quiesce the device. - * + * @async_shutdown_start: Optional call to support and begin the shutdown + * process on the device in an asynchronous manner. + * @async_shutdown_end: Optional call to complete an asynchronous + * shutdown of the device. Must be provided if a + * sync_shutdown_start call is provided. * @online: Called to put the device back online (after offlining it). * @offline: Called to put the device offline for hot-removal. May fail. * @@ -87,6 +91,8 @@ struct bus_type { void (*sync_state)(struct device *dev); void (*remove)(struct device *dev); void (*shutdown)(struct device *dev); + void (*async_shutdown_start)(struct device *dev); + void (*async_shutdown_end)(struct device *dev); int (*online)(struct device *dev); int (*offline)(struct device *dev); diff --git a/include/linux/device/driver.h b/include/linux/device/driver.h index 7738f458995f..af0ad2d3687a 100644 --- a/include/linux/device/driver.h +++ b/include/linux/device/driver.h @@ -71,6 +71,11 @@ enum probe_type { * @remove: Called when the device is removed from the system to * unbind a device from this driver. * @shutdown: Called at shut-down time to quiesce the device. + * @async_shutdown_start: Optional call to support and begin the shutdown + * process on the device in an asynchronous manner. + * @async_shutdown_end: Optional call to complete an asynchronous + * shutdown of the device. Must be provided if a + * sync_shutdown_start call is provided. * @suspend: Called to put the device to sleep mode. Usually to a * low power state. * @resume: Called to bring a device from sleep mode. @@ -110,6 +115,8 @@ struct device_driver { void (*sync_state)(struct device *dev); int (*remove) (struct device *dev); void (*shutdown) (struct device *dev); + void (*async_shutdown_start) (struct device *dev); + void (*async_shutdown_end) (struct device *dev); int (*suspend) (struct device *dev, pm_message_t state); int (*resume) (struct device *dev); const struct attribute_group **groups; From patchwork Wed Feb 7 18:40:56 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Jeffery X-Patchwork-Id: 771519 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id D58F583CD7 for ; Wed, 7 Feb 2024 18:42:25 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.133.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1707331347; cv=none; b=PVLFVi8GRvYSvEYEQqXcwx/LCmp6zhDFgNI+aoMofBnH/zlzCVZixYgvF5Kkr9m1/e4L0UToEtGkecHf8BEKxDPesO7Obqg4JtFDjSv//pUc0FHUT8wZ/3nq2X8/J0G+QKMUdelTVB0MXNAPeElRhP/b/JdKvfTBQZId50WfymY= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1707331347; c=relaxed/simple; bh=IAD5k7mJ5GTdKJh3kH8fpJh/ezNJOFY0mEoN0yEKfYk=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=gZTosZzjV0Tu2e4MySM+urxNndnvdHPoWm408++euWZcR4gp5AOfNmX6aSWP/qOLpb13CQdj6exTm1FmlKJH4CmTmYKvT8hF11kLX/plOvZSNplYb3xLPoPYLS4oMCwGOSkCDOtV4IWD9ljvN+GLSq45zHa8VIAAzjavpzIOEX0= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=FNO4T+9B; arc=none smtp.client-ip=170.10.133.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="FNO4T+9B" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1707331344; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=oTFy4nGaIuyAjRSeWWj3icZv9u9DGODaCpUqVFylGh8=; b=FNO4T+9BLt37qyocVqxh7WZpBdhVDEAFm0IXUC8sZJSGsN49OJrj18LRdnCRZWaFnRnLce 13oni/405hzaelV6/W+S6ZqBGVkVGHzQfIlGZrK7TpyKLPD5/Nwg0eaSh4rzoY87nX6xn3 hcavhR8CNChjq56FTvPbqM7kdhjlb5A= Received: from mimecast-mx02.redhat.com (mx-ext.redhat.com [66.187.233.73]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-590-K0PNYfJIP2qWNiPNid8kCQ-1; Wed, 07 Feb 2024 13:42:21 -0500 X-MC-Unique: K0PNYfJIP2qWNiPNid8kCQ-1 Received: from smtp.corp.redhat.com (int-mx02.intmail.prod.int.rdu2.redhat.com [10.11.54.2]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id D2C5338212C5; Wed, 7 Feb 2024 18:42:20 +0000 (UTC) Received: from fedora-work.redhat.com (unknown [10.22.32.236]) by smtp.corp.redhat.com (Postfix) with ESMTP id 66FB2400D6D2; Wed, 7 Feb 2024 18:42:20 +0000 (UTC) From: David Jeffery To: linux-kernel@vger.kernel.org Cc: linux-nvme@lists.infradead.org, linux-scsi@vger.kernel.org, Greg Kroah-Hartman , "Rafael J . Wysocki" , David Jeffery , Laurence Oberman Subject: [RFC PATCH 2/6] Improve ability to perform async shutdown in parallel Date: Wed, 7 Feb 2024 13:40:56 -0500 Message-ID: <20240207184100.18066-3-djeffery@redhat.com> In-Reply-To: <20240207184100.18066-1-djeffery@redhat.com> References: <20240207184100.18066-1-djeffery@redhat.com> Precedence: bulk X-Mailing-List: linux-scsi@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.4.1 on 10.11.54.2 Expands the async shutdown implementation to allow more cases of parallel async shutdown. A field is added so that a device under async shutdown can mark its parent as busy due to the async shutdown. A busy parent on reaching the head of the shutdown list gets stored and flags its own parent as busy. Once the async shutdown operations are completed, the stored parents are returned to the shutdown list and shut down in an order maintaining their parent-child ordering. Unlike the minimal implementation, this allows more end nodes of the device tree to be under async shutdown in parallel. A cap on the number of async shutdown devices is still enforced, though not required and could be removed if desired. Signed-off-by: David Jeffery Tested-by: Laurence Oberman --- drivers/base/base.h | 1 + drivers/base/core.c | 71 ++++++++++++++++++++++++++++++++------------- 2 files changed, 52 insertions(+), 20 deletions(-) diff --git a/drivers/base/base.h b/drivers/base/base.h index eb4c0ace9242..954008bd39e5 100644 --- a/drivers/base/base.h +++ b/drivers/base/base.h @@ -116,6 +116,7 @@ struct device_private { char *deferred_probe_reason; struct device *device; u8 dead:1; + u8 child_shutdown:1; }; #define to_device_private_parent(obj) \ container_of(obj, struct device_private, knode_parent) diff --git a/drivers/base/core.c b/drivers/base/core.c index 5bc2282c00cd..e88d418bf0fd 100644 --- a/drivers/base/core.c +++ b/drivers/base/core.c @@ -4720,9 +4720,10 @@ int device_change_owner(struct device *dev, kuid_t kuid, kgid_t kgid) EXPORT_SYMBOL_GPL(device_change_owner); -#define MAX_ASYNC_SHUTDOWNS 32 +#define MAX_ASYNC_SHUTDOWNS 256 static int async_shutdown_count; static LIST_HEAD(async_shutdown_list); +static LIST_HEAD(async_delayed_list); /** * If a device has a child busy with an async shutdown or there are too many @@ -4730,15 +4731,15 @@ static LIST_HEAD(async_shutdown_list); */ static bool may_shutdown_device(struct device *dev) { - struct device *tmp; - - if (async_shutdown_count >= MAX_ASYNC_SHUTDOWNS) + if (dev->p->child_shutdown) { + if (list_empty(&async_shutdown_list)) { + dev_err(dev, "child_shutdown set but no children? Clearing\n"); + dev->p->child_shutdown = 0; + return true; + } return false; - - list_for_each_entry(tmp, &async_shutdown_list, kobj.entry) { - if (tmp->parent == dev) - return false; } + return true; } @@ -4753,6 +4754,9 @@ static void async_shutdown_start(struct device *dev, void (*callback) (struct de (*callback)(dev); list_add_tail(&dev->kobj.entry, &async_shutdown_list); async_shutdown_count++; + + if (dev->parent) + dev->parent->p->child_shutdown = 1; } /** @@ -4760,7 +4764,7 @@ static void async_shutdown_start(struct device *dev, void (*callback) (struct de */ static void wait_for_active_async_shutdown(void) { - struct device *dev, *parent; + struct device *dev, *parent, *tmp; while (!list_empty(&async_shutdown_list)) { dev = list_entry(async_shutdown_list.next, struct device, @@ -4787,15 +4791,29 @@ static void wait_for_active_async_shutdown(void) dev->driver->async_shutdown_end(dev); } device_unlock(dev); - if (parent) - device_unlock(parent); - + if (parent) { + tmp = parent; + do { + tmp->p->child_shutdown = 0; + device_unlock(tmp); + + tmp = tmp->parent; + if (!tmp || !tmp->p->child_shutdown) + break; + device_lock(tmp); + } while (1); + } put_device(dev); put_device(parent); } if (initcall_debug) printk(KERN_INFO "device shutdown: waited for %d async shutdown callbacks\n", async_shutdown_count); + async_shutdown_count = 0; + spin_lock(&devices_kset->list_lock); + list_splice_tail_init(&async_delayed_list, &devices_kset->list); + spin_unlock(&devices_kset->list_lock); + } /** @@ -4810,7 +4828,7 @@ void device_shutdown(void) device_block_probing(); cpufreq_suspend(); - +restart: spin_lock(&devices_kset->list_lock); /* * Walk the devices list backward, shutting down each in turn. @@ -4832,12 +4850,15 @@ void device_shutdown(void) get_device(dev); if (!may_shutdown_device(dev)) { + list_move(&dev->kobj.entry, &async_delayed_list); + if (parent) { + device_lock(parent); + parent->p->child_shutdown = 1; + device_unlock(parent); + } + put_device(dev); put_device(parent); - - spin_unlock(&devices_kset->list_lock); - wait_for_active_async_shutdown(); - spin_lock(&devices_kset->list_lock); continue; } @@ -4863,14 +4884,16 @@ void device_shutdown(void) dev->class->shutdown_pre(dev); } if (dev->bus && dev->bus->async_shutdown_start) { - async_shutdown_start(dev, dev->bus->async_shutdown_start); + async_shutdown_start(dev, + dev->bus->async_shutdown_start); async_busy = true; } else if (dev->bus && dev->bus->shutdown) { if (initcall_debug) dev_info(dev, "shutdown\n"); dev->bus->shutdown(dev); } else if (dev->driver && dev->driver->async_shutdown_start) { - async_shutdown_start(dev, dev->driver->async_shutdown_start); + async_shutdown_start(dev, + dev->driver->async_shutdown_start); async_busy = true; } else if (dev->driver && dev->driver->shutdown) { if (initcall_debug) @@ -4891,14 +4914,22 @@ void device_shutdown(void) put_device(parent); } + if (async_shutdown_count == MAX_ASYNC_SHUTDOWNS) + wait_for_active_async_shutdown(); + spin_lock(&devices_kset->list_lock); } spin_unlock(&devices_kset->list_lock); /* - * Wait for any async shutdown still running. + * Wait for any async shutdown still running, then restart the loop + * if the list is no longer empty from delayed entries returning to + * the list. */ if (!list_empty(&async_shutdown_list)) wait_for_active_async_shutdown(); + + if(!list_empty(&devices_kset->list)) + goto restart; } /* From patchwork Wed Feb 7 18:40:57 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Jeffery X-Patchwork-Id: 771518 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 236E084FD6 for ; Wed, 7 Feb 2024 18:42:27 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.129.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1707331348; cv=none; b=HnxnbQcD8XUAkUwCD6zHaXpZOmhXmrEbE1Oj9uPQLFTV1o9KqlX+QrcN0oXugIP78O+PDoNt7ZRYl6qqKk8w8X6GDLoXLM7l6axCWe9Plrw6E87GFnl4k/AjUTkZcMnUAon2MM1qgXA4mQAM2MIt7E09q6Sa6Q/fSLw+CW1yoz4= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1707331348; c=relaxed/simple; bh=Dwy/u3XKKFj/qy3AWoxAcE6kPy3lkPw+HkoBS6KoqLw=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=mzGO382KTvGf1z3xVnaAsUq5IoD4Q+AMVpf1SIYZuM2SAsSswz9YKu+JP22hyTyt1b+0LtOBcBattXIvQ1KbWsOkMRA0s/PK/DnCVwk2UVPtMulADuCJDnqUNg7SpBBPhfGBULMihgBQS+mBK33T3yuAthTmZ3CVqAVrv06a71c= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=TH1PICes; arc=none smtp.client-ip=170.10.129.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="TH1PICes" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1707331346; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=F7841DLyqRx4+GFENMTDanBPrbVQXHzKvu3LLMhu66w=; b=TH1PICesQPSYDucZ+w+EcAR8JxWxdbesyuuG08NN1Sv7oDOjDaNlrc9vTaaDhbIVMPzjB5 ESg2JCGa0aZSMkMRqugOCCLf/5vzvQeA0Oq1AEeD19RV1CK6+miKqAk907BCl5QDl85vWH 06HKJr9WLmzw+BnMfEJOy5nmmtKC4r8= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-223-g0k1TawDMgiM8jXxLRJciw-1; Wed, 07 Feb 2024 13:42:22 -0500 X-MC-Unique: g0k1TawDMgiM8jXxLRJciw-1 Received: from smtp.corp.redhat.com (int-mx02.intmail.prod.int.rdu2.redhat.com [10.11.54.2]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 602F2185A780; Wed, 7 Feb 2024 18:42:22 +0000 (UTC) Received: from fedora-work.redhat.com (unknown [10.22.32.236]) by smtp.corp.redhat.com (Postfix) with ESMTP id E6DB24011FF1; Wed, 7 Feb 2024 18:42:21 +0000 (UTC) From: David Jeffery To: linux-kernel@vger.kernel.org Cc: linux-nvme@lists.infradead.org, linux-scsi@vger.kernel.org, Greg Kroah-Hartman , "Rafael J . Wysocki" , David Jeffery , Laurence Oberman Subject: [RFC PATCH 3/6] pci bus async shutdown support Date: Wed, 7 Feb 2024 13:40:57 -0500 Message-ID: <20240207184100.18066-4-djeffery@redhat.com> In-Reply-To: <20240207184100.18066-1-djeffery@redhat.com> References: <20240207184100.18066-1-djeffery@redhat.com> Precedence: bulk X-Mailing-List: linux-scsi@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.4.1 on 10.11.54.2 Add async shutdown shutdown fields and Convert pci's shutdown logic into async shutdown calls so that individual pci device drivers can implement async shutdown. Signed-off-by: David Jeffery Tested-by: Laurence Oberman --- drivers/pci/pci-driver.c | 24 ++++++++++++++++++++++-- include/linux/pci.h | 4 ++++ 2 files changed, 26 insertions(+), 2 deletions(-) diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c index 51ec9e7e784f..0ad418905115 100644 --- a/drivers/pci/pci-driver.c +++ b/drivers/pci/pci-driver.c @@ -502,16 +502,28 @@ static void pci_device_remove(struct device *dev) pci_dev_put(pci_dev); } -static void pci_device_shutdown(struct device *dev) +static void pci_device_async_shutdown_start(struct device *dev) { struct pci_dev *pci_dev = to_pci_dev(dev); struct pci_driver *drv = pci_dev->driver; pm_runtime_resume(dev); - if (drv && drv->shutdown) + if (drv && drv->async_shutdown_start) + drv->async_shutdown_start(pci_dev); + else if (drv && drv->shutdown) drv->shutdown(pci_dev); +} + +static void pci_device_async_shutdown_end(struct device *dev) +{ + struct pci_dev *pci_dev = to_pci_dev(dev); + struct pci_driver *drv = pci_dev->driver; + + if (drv && drv->async_shutdown_end) + drv->async_shutdown_end(pci_dev); + /* * If this is a kexec reboot, turn off Bus Master bit on the * device to tell it to not continue to do DMA. Don't touch @@ -523,6 +535,12 @@ static void pci_device_shutdown(struct device *dev) pci_clear_master(pci_dev); } +static void pci_device_shutdown(struct device *dev) +{ + pci_device_async_shutdown_start(dev); + pci_device_async_shutdown_end(dev); +} + #ifdef CONFIG_PM_SLEEP /* Auxiliary functions used for system resume */ @@ -1682,6 +1700,8 @@ struct bus_type pci_bus_type = { .probe = pci_device_probe, .remove = pci_device_remove, .shutdown = pci_device_shutdown, + .async_shutdown_start = pci_device_async_shutdown_start, + .async_shutdown_end = pci_device_async_shutdown_end, .dev_groups = pci_dev_groups, .bus_groups = pci_bus_groups, .drv_groups = pci_drv_groups, diff --git a/include/linux/pci.h b/include/linux/pci.h index add9368e6314..6f61325c956a 100644 --- a/include/linux/pci.h +++ b/include/linux/pci.h @@ -917,6 +917,8 @@ struct module; * Useful for enabling wake-on-lan (NIC) or changing * the power state of a device before reboot. * e.g. drivers/net/e100.c. + * @async_shutdown_start: This starts the asynchronous shutdown + * @async_shutdown_end: This completes the started asynchronous shutdown * @sriov_configure: Optional driver callback to allow configuration of * number of VFs to enable via sysfs "sriov_numvfs" file. * @sriov_set_msix_vec_count: PF Driver callback to change number of MSI-X @@ -947,6 +949,8 @@ struct pci_driver { int (*suspend)(struct pci_dev *dev, pm_message_t state); /* Device suspended */ int (*resume)(struct pci_dev *dev); /* Device woken up */ void (*shutdown)(struct pci_dev *dev); + void (*async_shutdown_start)(struct pci_dev *dev); + void (*async_shutdown_end)(struct pci_dev *dev); int (*sriov_configure)(struct pci_dev *dev, int num_vfs); /* On PF */ int (*sriov_set_msix_vec_count)(struct pci_dev *vf, int msix_vec_count); /* On PF */ u32 (*sriov_get_vf_total_msix)(struct pci_dev *pf); From patchwork Wed Feb 7 18:40:58 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Jeffery X-Patchwork-Id: 770795 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id A1A6886137 for ; Wed, 7 Feb 2024 18:42:31 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.133.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1707331354; cv=none; b=sofepgENfqNKga/rA9WtjW0LxnrB2c2deh+XgGXi6ZJ0s/KiTj+qNPPsfvcQ5itszGrxozhndrnFI0erbVmUIcezWI3EyBKsdMVMKP3YBjEKf8OkLuvOBy+2/07aMxIgZ149eB6maAppMCGSRnwRRT87ujQraYhNLbOzlBn59TM= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1707331354; c=relaxed/simple; bh=UeygM9npBEUq1MKQwn5tUh8m7ES9H0uWtUU1AqB2VWY=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=CWH/Equ9mRpj5uBQBdBzpYJYuAtduAqZtSwCa6Vop8lmBVFGe6wdu9tR9YoSOIeZcY99GXMe7umro+pmzTWPECdBkCNdJCQg6tV50M4K7JsoxCwsZzzeyghR4LqpuRk5ID/+qRF/JZ4bnmL2DU4Mx5Be9PW9mxFUiTQhrpqTaQM= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=ScrcByj+; arc=none smtp.client-ip=170.10.133.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="ScrcByj+" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1707331350; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=cuv9rvzEed+0pQINLR4eZlmzvTUXMYlIRXYRIOt5NJQ=; b=ScrcByj+UA6EkLnRSi/BAnSJIXKkghdb8ZjywreI1X610rxqmyR/9B6z7cqqo92QaKBU4I R1I6uvxNRGpBwf3uXAb+y1OMD1NivJ9a7LD3+4kaCQi/00JoXTMlAXQfHu+8XzBA/y3+4G YuoEFoMln9m7LmOC9VSU0EvAjyPveAs= Received: from mimecast-mx02.redhat.com (mx-ext.redhat.com [66.187.233.73]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-664-s5caMUuGOx6vwEyw6XmQ_A-1; Wed, 07 Feb 2024 13:42:25 -0500 X-MC-Unique: s5caMUuGOx6vwEyw6XmQ_A-1 Received: from smtp.corp.redhat.com (int-mx02.intmail.prod.int.rdu2.redhat.com [10.11.54.2]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 1A62D29AC012; Wed, 7 Feb 2024 18:42:25 +0000 (UTC) Received: from fedora-work.redhat.com (unknown [10.22.32.236]) by smtp.corp.redhat.com (Postfix) with ESMTP id A15F140C9444; Wed, 7 Feb 2024 18:42:24 +0000 (UTC) From: David Jeffery To: linux-kernel@vger.kernel.org Cc: linux-nvme@lists.infradead.org, linux-scsi@vger.kernel.org, Greg Kroah-Hartman , "Rafael J . Wysocki" , David Jeffery , Laurence Oberman Subject: [RFC PATCH 4/6] pci nvme async shutdown support Date: Wed, 7 Feb 2024 13:40:58 -0500 Message-ID: <20240207184100.18066-5-djeffery@redhat.com> In-Reply-To: <20240207184100.18066-1-djeffery@redhat.com> References: <20240207184100.18066-1-djeffery@redhat.com> Precedence: bulk X-Mailing-List: linux-scsi@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.4.1 on 10.11.54.2 Alter nvme so that the pci nvme driver may shut down the hardware asynchronously, providing a large benefit to shutdown time with large numbers of nvme devices in some configurations. The functionality of nvme_disable_ctrl is split in separate functions for starting disabling the hardware and for waiting for the hardware to report success. A wrapper is created which provides the same interface for the other nvme types so they remain unchanged. And new field is added to nvme-pci's nvme_dev to track the nvme shutdown state so that the nvme async_shutdown_end call knows what if anything it still needs to do. Signed-off-by: David Jeffery Tested-by: Laurence Oberman --- drivers/nvme/host/core.c | 26 +++++++++++++++----- drivers/nvme/host/nvme.h | 2 ++ drivers/nvme/host/pci.c | 53 ++++++++++++++++++++++++++++++++++++++-- 3 files changed, 73 insertions(+), 8 deletions(-) diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c index 85ab0fcf9e88..b24985a843a8 100644 --- a/drivers/nvme/host/core.c +++ b/drivers/nvme/host/core.c @@ -2242,20 +2242,21 @@ static int nvme_wait_ready(struct nvme_ctrl *ctrl, u32 mask, u32 val, return ret; } -int nvme_disable_ctrl(struct nvme_ctrl *ctrl, bool shutdown) +int nvme_disable_ctrl_send(struct nvme_ctrl *ctrl, bool shutdown) { - int ret; - ctrl->ctrl_config &= ~NVME_CC_SHN_MASK; if (shutdown) ctrl->ctrl_config |= NVME_CC_SHN_NORMAL; else ctrl->ctrl_config &= ~NVME_CC_ENABLE; - ret = ctrl->ops->reg_write32(ctrl, NVME_REG_CC, ctrl->ctrl_config); - if (ret) - return ret; + return ctrl->ops->reg_write32(ctrl, NVME_REG_CC, ctrl->ctrl_config); +} +EXPORT_SYMBOL_GPL(nvme_disable_ctrl_send); + +int nvme_disable_ctrl_wait(struct nvme_ctrl *ctrl, bool shutdown) +{ if (shutdown) { return nvme_wait_ready(ctrl, NVME_CSTS_SHST_MASK, NVME_CSTS_SHST_CMPLT, @@ -2266,6 +2267,19 @@ int nvme_disable_ctrl(struct nvme_ctrl *ctrl, bool shutdown) return nvme_wait_ready(ctrl, NVME_CSTS_RDY, 0, (NVME_CAP_TIMEOUT(ctrl->cap) + 1) / 2, "reset"); } +EXPORT_SYMBOL_GPL(nvme_disable_ctrl_wait); + + +int nvme_disable_ctrl(struct nvme_ctrl *ctrl, bool shutdown) +{ + int ret; + + ret = nvme_disable_ctrl_send(ctrl, shutdown); + if (!ret) + ret = nvme_disable_ctrl_wait(ctrl, shutdown); + + return ret; +} EXPORT_SYMBOL_GPL(nvme_disable_ctrl); int nvme_enable_ctrl(struct nvme_ctrl *ctrl) diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h index 030c80818240..5bdd862328d4 100644 --- a/drivers/nvme/host/nvme.h +++ b/drivers/nvme/host/nvme.h @@ -761,6 +761,8 @@ void nvme_cancel_admin_tagset(struct nvme_ctrl *ctrl); bool nvme_change_ctrl_state(struct nvme_ctrl *ctrl, enum nvme_ctrl_state new_state); int nvme_disable_ctrl(struct nvme_ctrl *ctrl, bool shutdown); +int nvme_disable_ctrl_send(struct nvme_ctrl *ctrl, bool shutdown); +int nvme_disable_ctrl_wait(struct nvme_ctrl *ctrl, bool shutdown); int nvme_enable_ctrl(struct nvme_ctrl *ctrl); int nvme_init_ctrl(struct nvme_ctrl *ctrl, struct device *dev, const struct nvme_ctrl_ops *ops, unsigned long quirks); diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c index c1d6357ec98a..c2a7b3d28a56 100644 --- a/drivers/nvme/host/pci.c +++ b/drivers/nvme/host/pci.c @@ -112,6 +112,12 @@ static void nvme_dev_disable(struct nvme_dev *dev, bool shutdown); static void nvme_delete_io_queues(struct nvme_dev *dev); static void nvme_update_attrs(struct nvme_dev *dev); +enum nvme_disable_state { + NVME_DISABLE_START = 0, + NVME_DISABLE_WAIT = 1, + NVME_DISABLE_DONE = 2, +}; + /* * Represents an NVM Express device. Each nvme_dev is a PCI function. */ @@ -159,6 +165,7 @@ struct nvme_dev { unsigned int nr_allocated_queues; unsigned int nr_write_queues; unsigned int nr_poll_queues; + enum nvme_disable_state disable_state; }; static int io_queue_depth_set(const char *val, const struct kernel_param *kp) @@ -2574,12 +2581,14 @@ static bool nvme_pci_ctrl_is_dead(struct nvme_dev *dev) return (csts & NVME_CSTS_CFS) || !(csts & NVME_CSTS_RDY); } -static void nvme_dev_disable(struct nvme_dev *dev, bool shutdown) +static void nvme_dev_disable_start(struct nvme_dev *dev, bool shutdown) { enum nvme_ctrl_state state = nvme_ctrl_state(&dev->ctrl); struct pci_dev *pdev = to_pci_dev(dev->dev); bool dead; + dev->disable_state = NVME_DISABLE_START; + mutex_lock(&dev->shutdown_lock); dead = nvme_pci_ctrl_is_dead(dev); if (state == NVME_CTRL_LIVE || state == NVME_CTRL_RESETTING) { @@ -2597,7 +2606,20 @@ static void nvme_dev_disable(struct nvme_dev *dev, bool shutdown) if (!dead && dev->ctrl.queue_count > 0) { nvme_delete_io_queues(dev); - nvme_disable_ctrl(&dev->ctrl, shutdown); + nvme_disable_ctrl_send(&dev->ctrl, shutdown); + dev->disable_state = NVME_DISABLE_WAIT; + } +} + +static void nvme_dev_disable_end(struct nvme_dev *dev, bool shutdown) +{ + struct pci_dev *pdev = to_pci_dev(dev->dev); + + if (dev->disable_state == NVME_DISABLE_DONE) + return; + + if (dev->disable_state == NVME_DISABLE_WAIT) { + nvme_disable_ctrl_wait(&dev->ctrl, shutdown); nvme_poll_irqdisable(&dev->queues[0]); } nvme_suspend_io_queues(dev); @@ -2623,6 +2645,12 @@ static void nvme_dev_disable(struct nvme_dev *dev, bool shutdown) mutex_unlock(&dev->shutdown_lock); } +static void nvme_dev_disable(struct nvme_dev *dev, bool shutdown) +{ + nvme_dev_disable_start(dev, shutdown); + nvme_dev_disable_end(dev, shutdown); +} + static int nvme_disable_prepare_reset(struct nvme_dev *dev, bool shutdown) { if (!nvme_wait_reset(&dev->ctrl)) @@ -3120,6 +3148,25 @@ static void nvme_shutdown(struct pci_dev *pdev) nvme_disable_prepare_reset(dev, true); } +static void nvme_shutdown_start(struct pci_dev *pdev) +{ + struct nvme_dev *dev = pci_get_drvdata(pdev); + + if (!nvme_wait_reset(&dev->ctrl)) { + dev->disable_state = NVME_DISABLE_DONE; + return; + } + nvme_dev_disable_start(dev, true); +} + +static void nvme_shutdown_end(struct pci_dev *pdev) +{ + struct nvme_dev *dev = pci_get_drvdata(pdev); + + nvme_dev_disable_end(dev, true); +} + + /* * The driver's remove may be called on a device in a partially initialized * state. This function must not have any dependencies on the device state in @@ -3511,6 +3558,8 @@ static struct pci_driver nvme_driver = { .probe = nvme_probe, .remove = nvme_remove, .shutdown = nvme_shutdown, + .async_shutdown_start = nvme_shutdown_start, + .async_shutdown_end = nvme_shutdown_end, .driver = { .probe_type = PROBE_PREFER_ASYNCHRONOUS, #ifdef CONFIG_PM_SLEEP From patchwork Wed Feb 7 18:40:59 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Jeffery X-Patchwork-Id: 770796 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 16DDD86123 for ; Wed, 7 Feb 2024 18:42:30 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.133.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1707331352; cv=none; b=nGJzgeRkfhBodXajh9kwb0E7DrDaosT5w7g6221CZVlB9V1FJHd4K/imwn8bAw61oxjHcqBw/sM2VS7zdB/FTJPDmxWC1d6otefSE8YXoYsnE9lg3INTc9ScEz3rT0K2kELSth+ykLKiGN2Zfl/oWMUvysf6WSCYe1wUQdXJOeY= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1707331352; c=relaxed/simple; bh=WqtFxcWaLhMLfcgOm6/fPisBGllXDcSk/ewd1nnwqAo=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=lJlsA7Xp5AzWq7vs6s4L7Xf56zZt4XNDvq513B5K5/yq/Eo/R0kC6hZppbPU4kByLiiWYG/V3yWwQQhHD9lQWQ/cb8PDewGEgReRhQxv9+6itNIYR8tbtuDz1JH0jSND7Ehc9CeB2JKChdObz0kb8bfPjX68J+VFZFwYXOFswXI= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=RhP7krz7; arc=none smtp.client-ip=170.10.133.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="RhP7krz7" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1707331350; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=CkqM01Ya14Nn8HXhASJf/0/vp8tMuDctYMCTKWoKrUI=; b=RhP7krz76HBBQaz3JyMrkCZmecFEw4F2MUKb40iSPCXIPKkycFR2OGK5yfnMTZe7oPGMwv zKhccv18ZyADClF1ebksbpvcp0BGgz05gZZeiLKPm6wtnwg535KtZZj22PGM1WHrqKgZR5 WK+h2vU37nk80tAITl03ghcTDCeQilI= Received: from mimecast-mx02.redhat.com (mx-ext.redhat.com [66.187.233.73]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-180-Nw_CWQa9P_abWidS2p2VLA-1; Wed, 07 Feb 2024 13:42:26 -0500 X-MC-Unique: Nw_CWQa9P_abWidS2p2VLA-1 Received: from smtp.corp.redhat.com (int-mx02.intmail.prod.int.rdu2.redhat.com [10.11.54.2]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 7AE253C23FC6; Wed, 7 Feb 2024 18:42:26 +0000 (UTC) Received: from fedora-work.redhat.com (unknown [10.22.32.236]) by smtp.corp.redhat.com (Postfix) with ESMTP id 1127740C9444; Wed, 7 Feb 2024 18:42:26 +0000 (UTC) From: David Jeffery To: linux-kernel@vger.kernel.org Cc: linux-nvme@lists.infradead.org, linux-scsi@vger.kernel.org, Greg Kroah-Hartman , "Rafael J . Wysocki" , David Jeffery , Laurence Oberman Subject: [RFC PATCH 5/6] scsi mid layer support for async command submit Date: Wed, 7 Feb 2024 13:40:59 -0500 Message-ID: <20240207184100.18066-6-djeffery@redhat.com> In-Reply-To: <20240207184100.18066-1-djeffery@redhat.com> References: <20240207184100.18066-1-djeffery@redhat.com> Precedence: bulk X-Mailing-List: linux-scsi@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.4.1 on 10.11.54.2 Create scsi_execute_cmd_nowait to allow asynchronous scsi command submit. Parts of the code originally in scsi_execute_cmd are shifted into helper functions used by both scsi_execute_cmd and the new scsi_execute_cmd_nowait. The scsi_exec_args struct is expanded to contain the fields needed for the completion and callback for the async path. Signed-off-by: David Jeffery Tested-by: Laurence Oberman --- drivers/scsi/scsi_lib.c | 138 +++++++++++++++++++++++++++++-------- include/scsi/scsi_device.h | 8 +++ 2 files changed, 118 insertions(+), 28 deletions(-) diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c index 1fb80eae9a63..fe35bc2021e3 100644 --- a/drivers/scsi/scsi_lib.c +++ b/drivers/scsi/scsi_lib.c @@ -185,42 +185,37 @@ void scsi_queue_insert(struct scsi_cmnd *cmd, int reason) } /** - * scsi_execute_cmd - insert request and wait for the result + * scsi_execute_init - helper for setting up a scsi_cmnd in a request * @sdev: scsi_device * @cmd: scsi command * @opf: block layer request cmd_flags - * @buffer: data buffer - * @bufflen: len of buffer * @timeout: request timeout in HZ * @retries: number of times to retry request - * @args: Optional args. See struct definition for field descriptions + * @args: scsi command args * - * Returns the scsi_cmnd result field if a command was executed, or a negative - * Linux error code if we didn't get that far. + * Returns a request if successful, or an error pointer if there was a failure. */ -int scsi_execute_cmd(struct scsi_device *sdev, const unsigned char *cmd, - blk_opf_t opf, void *buffer, unsigned int bufflen, - int timeout, int retries, - const struct scsi_exec_args *args) +static struct request *scsi_execute_init(struct scsi_device *sdev, + const unsigned char *cmd, + blk_opf_t opf, + int timeout, int retries, + struct scsi_exec_args *args) { - static const struct scsi_exec_args default_args; struct request *req; struct scsi_cmnd *scmd; int ret; - if (!args) - args = &default_args; - else if (WARN_ON_ONCE(args->sense && - args->sense_len != SCSI_SENSE_BUFFERSIZE)) - return -EINVAL; + if (WARN_ON_ONCE(args->sense && + args->sense_len != SCSI_SENSE_BUFFERSIZE)) + return ERR_PTR(-EINVAL); req = scsi_alloc_request(sdev->request_queue, opf, args->req_flags); if (IS_ERR(req)) - return PTR_ERR(req); + return req; - if (bufflen) { - ret = blk_rq_map_kern(sdev->request_queue, req, - buffer, bufflen, GFP_NOIO); + if (args->bufflen) { + ret = blk_rq_map_kern(sdev->request_queue, req, args->buffer, + args->bufflen, GFP_NOIO); if (ret) goto out; } @@ -232,19 +227,27 @@ int scsi_execute_cmd(struct scsi_device *sdev, const unsigned char *cmd, req->timeout = timeout; req->rq_flags |= RQF_QUIET; - /* - * head injection *required* here otherwise quiesce won't work - */ - blk_execute_rq(req, true); + return req; +out: + blk_mq_free_request(req); + return ERR_PTR(ret); +} + +static int scsi_execute_uninit(struct request *req, + struct scsi_exec_args *args) +{ + struct scsi_cmnd *scmd; + scmd = blk_mq_rq_to_pdu(req); /* * Some devices (USB mass-storage in particular) may transfer * garbage data together with a residue indicating that the data * is invalid. Prevent the garbage from being misinterpreted * and prevent security leaks by zeroing out the excess data. */ - if (unlikely(scmd->resid_len > 0 && scmd->resid_len <= bufflen)) - memset(buffer + bufflen - scmd->resid_len, 0, scmd->resid_len); + if (unlikely(scmd->resid_len > 0 && scmd->resid_len <= args->bufflen)) + memset(args->buffer + args->bufflen - scmd->resid_len, 0, + scmd->resid_len); if (args->resid) *args->resid = scmd->resid_len; @@ -254,14 +257,93 @@ int scsi_execute_cmd(struct scsi_device *sdev, const unsigned char *cmd, scsi_normalize_sense(scmd->sense_buffer, scmd->sense_len, args->sshdr); - ret = scmd->result; - out: + args->result = scmd->result; + + if (args->callback) + args->callback(scmd, args); + + return scmd->result; +} + +/** + * scsi_execute_cmd - insert request and wait for the result + * @sdev: scsi_device + * @cmd: scsi command + * @opf: block layer request cmd_flags + * @buffer: data buffer + * @bufflen: len of buffer + * @timeout: request timeout in HZ + * @retries: number of times to retry request + * @args: Optional args. See struct definition for field descriptions + * + * Returns the scsi_cmnd result field if a command was executed, or a negative + * Linux error code if we didn't get that far. + */ +int scsi_execute_cmd(struct scsi_device *sdev, const unsigned char *cmd, + blk_opf_t opf, void *buffer, unsigned int bufflen, + int timeout, int retries, + const struct scsi_exec_args *const_args) +{ + struct scsi_exec_args args; + int ret; + struct request *req; + + if (!const_args) + memset(&args, 0, sizeof(struct scsi_exec_args)); + else + args = *const_args; + + args.buffer = buffer; + args.bufflen = bufflen; + + req = scsi_execute_init(sdev, cmd, opf, timeout, retries, &args); + + if (IS_ERR(req)) + return PTR_ERR(req); + + /* + * head injection *required* here otherwise quiesce won't work + */ + blk_execute_rq(req, true); + + ret = scsi_execute_uninit(req, &args); + blk_mq_free_request(req); return ret; } EXPORT_SYMBOL(scsi_execute_cmd); + +static enum rq_end_io_ret scsi_execute_cmd_complete(struct request *req, + blk_status_t ret) +{ + struct scsi_exec_args *args = req->end_io_data; + + scsi_execute_uninit(req, args); + return RQ_END_IO_FREE; +} + +int scsi_execute_cmd_nowait(struct scsi_device *sdev, const unsigned char *cmd, + blk_opf_t opf, int timeout, int retries, + struct scsi_exec_args *args) +{ + struct request *req; + + req = scsi_execute_init(sdev, cmd, opf, timeout, retries, args); + + if (IS_ERR(req)) + return PTR_ERR(req); + + req->end_io = scsi_execute_cmd_complete; + req->end_io_data = args; + + blk_execute_rq_nowait(req, true); + + return 0; +} +EXPORT_SYMBOL(scsi_execute_cmd_nowait); + /* * Wake up the error handler if necessary. Avoid as follows that the error * handler is not woken up if host in-flight requests number == diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h index 5ec1e71a09de..c80c98b48bc1 100644 --- a/include/scsi/scsi_device.h +++ b/include/scsi/scsi_device.h @@ -497,6 +497,10 @@ struct scsi_exec_args { blk_mq_req_flags_t req_flags; /* BLK_MQ_REQ flags */ int scmd_flags; /* SCMD flags */ int *resid; /* residual length */ + void *buffer; /* data buffer */ + unsigned int bufflen; /* buffer length */ + int result; /* scsi layer result */ + void (*callback)(struct scsi_cmnd *scmd, struct scsi_exec_args *args); }; int scsi_execute_cmd(struct scsi_device *sdev, const unsigned char *cmd, @@ -504,6 +508,10 @@ int scsi_execute_cmd(struct scsi_device *sdev, const unsigned char *cmd, int timeout, int retries, const struct scsi_exec_args *args); +int scsi_execute_cmd_nowait(struct scsi_device *sdev, const unsigned char *cmd, + blk_opf_t opf, int timeout, int retries, + struct scsi_exec_args *args); + extern void sdev_disable_disk_events(struct scsi_device *sdev); extern void sdev_enable_disk_events(struct scsi_device *sdev); extern int scsi_vpd_lun_id(struct scsi_device *, char *, size_t); From patchwork Wed Feb 7 18:41:00 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Jeffery X-Patchwork-Id: 771517 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 90EAA86154 for ; Wed, 7 Feb 2024 18:42:32 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.133.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1707331354; cv=none; b=qwFfcSb7c8LkIiHjYTlVd/cNjHOZlWKLWI4e8ccMCkLGbXUO9QjVu2Mtax1ep25g8lWSIwZah2kM9vd9XFoF9myEgRyq66url57w7yROObO+8cjqC0GpOBDKokEq9jmq9sTmbHVFnS2Nnw9YPbV9+R5NJC6t6nSgY5ZZF3latfg= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1707331354; c=relaxed/simple; bh=WxvH6FDX/eWNGMCI+SWHVfMnLqP3askGdu0ASe254/Y=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=QCVkssmW3ejyYn7/VS+ItjiRG7U23m67nGoIy3RDPlbh8DjYBOSmzo1lUm4r/cvVUuAlfwbI8nIQxtuaOqcaC0VQqaH/woDS99Vf8qSDF1KTTpfDyLnXwt7h7cahNu3H26nBgPJFvfUHu7TqvG04IO+8XCy/HH7dkRtaYC9WhGY= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=VypNTV9W; arc=none smtp.client-ip=170.10.133.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="VypNTV9W" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1707331351; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=VwU//eJ9yadzq/fsfDhiEtvVbtfHI98m1htYpt9Ipdk=; b=VypNTV9WiURW96KQA7JQELdQr6EjhtqE0ygMdedIRNKhEvo9ELc/nE0xMNn1Zwz7lFWjvE EGpbvBkA8jajng+WpmKiG0VEV5JzSCow8DlBnSTv2FG8vtsuulmCbHsdsAvQUMYW5FL1Sr 09tgT7Bo48fv0YbzOCeLMcYBYa7iqsI= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-686-m-6gGQ7ROWmYlHUy-tFPLQ-1; Wed, 07 Feb 2024 13:42:28 -0500 X-MC-Unique: m-6gGQ7ROWmYlHUy-tFPLQ-1 Received: from smtp.corp.redhat.com (int-mx02.intmail.prod.int.rdu2.redhat.com [10.11.54.2]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 28C2C84AF81; Wed, 7 Feb 2024 18:42:28 +0000 (UTC) Received: from fedora-work.redhat.com (unknown [10.22.32.236]) by smtp.corp.redhat.com (Postfix) with ESMTP id AC90B4011FF1; Wed, 7 Feb 2024 18:42:27 +0000 (UTC) From: David Jeffery To: linux-kernel@vger.kernel.org Cc: linux-nvme@lists.infradead.org, linux-scsi@vger.kernel.org, Greg Kroah-Hartman , "Rafael J . Wysocki" , David Jeffery , Laurence Oberman Subject: [RFC PATCH 6/6] sd: async cache flush on shutdown Date: Wed, 7 Feb 2024 13:41:00 -0500 Message-ID: <20240207184100.18066-7-djeffery@redhat.com> In-Reply-To: <20240207184100.18066-1-djeffery@redhat.com> References: <20240207184100.18066-1-djeffery@redhat.com> Precedence: bulk X-Mailing-List: linux-scsi@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.4.1 on 10.11.54.2 Add async shutdown for the cache flush to the sd device by sending a SYNCHRONIZE_CACHE command asynchronously. If there is any sort of error, falls back to the synchronous sd_sync_cache() to try again and resolve any errors. Signed-off-by: David Jeffery Tested-by: Laurence Oberman --- drivers/scsi/sd.c | 66 ++++++++++++++++++++++++++++++++++++++++++----- drivers/scsi/sd.h | 2 ++ 2 files changed, 62 insertions(+), 6 deletions(-) diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c index 0833b3e6aa6e..f972310df76a 100644 --- a/drivers/scsi/sd.c +++ b/drivers/scsi/sd.c @@ -3838,23 +3838,64 @@ static int sd_start_stop_device(struct scsi_disk *sdkp, int start) return 0; } -/* - * Send a SYNCHRONIZE CACHE instruction down to the device through - * the normal SCSI command structure. Wait for the command to - * complete. - */ -static void sd_shutdown(struct device *dev) +static void sd_sync_cache_callback(struct scsi_cmnd *scmd, + struct scsi_exec_args *args) { + struct scsi_disk *sdkp; + + sdkp = container_of(args, struct scsi_disk, shutdown_params); + complete(&sdkp->shutdown_done); +} + +static void sd_async_shutdown_start(struct device *dev) { struct scsi_disk *sdkp = dev_get_drvdata(dev); + const int timeout = sdkp->device->request_queue->rq_timeout + * SD_FLUSH_TIMEOUT_MULTIPLIER; + int ret; if (!sdkp) return; /* this can happen */ + init_completion(&sdkp->shutdown_done); + sdkp->shutdown_params.callback = sd_sync_cache_callback; + if (pm_runtime_suspended(dev)) return; if (sdkp->WCE && sdkp->media_present) { + unsigned char cmd[16] = { 0 }; + sd_printk(KERN_NOTICE, sdkp, "Synchronizing SCSI cache\n"); + if (sdkp->device->use_16_for_sync) + cmd[0] = SYNCHRONIZE_CACHE_16; + else + cmd[0] = SYNCHRONIZE_CACHE; + + ret = scsi_execute_cmd_nowait(sdkp->device, cmd, REQ_OP_DRV_IN, + timeout, sdkp->max_retries, + &sdkp->shutdown_params); + if (!ret) + return; + sdkp->shutdown_params.result = ret; + } + /* no async I/O to do, go ahead and mark it complete */ + complete(&sdkp->shutdown_done); +} + +static void sd_async_shutdown_end(struct device *dev) +{ + struct scsi_disk *sdkp = dev_get_drvdata(dev); + + if (!sdkp) + return; + + if (pm_runtime_suspended(dev)) + return; + + wait_for_completion(&sdkp->shutdown_done); + + if (sdkp->WCE && sdkp->media_present && sdkp->shutdown_params.result) { + /* for any error with the async flush, retry as sync */ sd_sync_cache(sdkp); } @@ -3867,6 +3908,17 @@ static void sd_shutdown(struct device *dev) } } +/* + * Send a SYNCHRONIZE CACHE instruction down to the device through + * the normal SCSI command structure. Wait for the command to + * complete. + */ +static void sd_shutdown(struct device *dev) +{ + sd_async_shutdown_start(dev); + sd_async_shutdown_end(dev); +} + static inline bool sd_do_start_stop(struct scsi_device *sdev, bool runtime) { return (sdev->manage_system_start_stop && !runtime) || @@ -4003,6 +4055,8 @@ static struct scsi_driver sd_template = { .probe_type = PROBE_PREFER_ASYNCHRONOUS, .remove = sd_remove, .shutdown = sd_shutdown, + .async_shutdown_start = sd_async_shutdown_start, + .async_shutdown_end = sd_async_shutdown_end, .pm = &sd_pm_ops, }, .rescan = sd_rescan, diff --git a/drivers/scsi/sd.h b/drivers/scsi/sd.h index 409dda5350d1..7b5098211cec 100644 --- a/drivers/scsi/sd.h +++ b/drivers/scsi/sd.h @@ -91,6 +91,8 @@ struct scsi_disk { struct device disk_dev; struct gendisk *disk; struct opal_dev *opal_dev; + struct scsi_exec_args shutdown_params; + struct completion shutdown_done; #ifdef CONFIG_BLK_DEV_ZONED /* Updated during revalidation before the gendisk capacity is known. */ struct zoned_disk_info early_zone_info;