[5.4,055/132] nvme-tcp: fix timeout handler

Message ID	20200915140646.868260169@linuxfoundation.org
State	Superseded

Headers

From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org, Sagi Grimberg <sagi@grimberg.me>,
	Sasha Levin <sashal@kernel.org>
Subject: [PATCH 5.4 055/132] nvme-tcp: fix timeout handler
Date: Tue, 15 Sep 2020 16:12:37 +0200
Message-Id: <20200915140646.868260169@linuxfoundation.org>
In-Reply-To: <20200915140644.037604909@linuxfoundation.org>
References: <20200915140644.037604909@linuxfoundation.org>
User-Agent: quilt/0.66
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Sender: stable-owner@vger.kernel.org
Precedence: bulk

Commit Message

Greg KH Sept. 15, 2020, 2:12 p.m. UTC

From: Sagi Grimberg <sagi@grimberg.me>

[ Upstream commit 236187c4ed195161dfa4237c7beffbba0c5ae45b ]

When a request times out in the LIVE state, we simply trigger error
recovery and let it handle the request cancellation. When a request
times out in a non-LIVE state, however, we make sure to complete it
immediately, as it might block controller setup or teardown and prevent
forward progress.
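
The state-dependent policy described above can be pictured with a small
userspace model. This is only a sketch: every enum, struct and function
name below is a hypothetical stand-in (they are not the kernel's
NVME_CTRL_* or BLK_EH_* definitions), and the model ignores locking.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the controller states named in the text. */
enum model_ctrl_state {
	MODEL_CTRL_LIVE,
	MODEL_CTRL_RESETTING,
	MODEL_CTRL_CONNECTING,
	MODEL_CTRL_DELETING,
};

/* Hypothetical stand-ins for the two outcomes of a timeout handler. */
enum model_eh_return {
	MODEL_EH_DONE,        /* the handler completed the request itself */
	MODEL_EH_RESET_TIMER, /* timer re-armed, another path completes it */
};

struct model_timeout_result {
	enum model_eh_return ret;
	bool completed_here;     /* handler completed the request */
	bool recovery_triggered; /* handler kicked off error recovery */
};

static struct model_timeout_result
model_timeout(enum model_ctrl_state state)
{
	struct model_timeout_result res = { MODEL_EH_RESET_TIMER, false, false };

	if (state != MODEL_CTRL_LIVE) {
		/* setup or teardown may be waiting on this request: finish it now */
		res.completed_here = true;
		res.ret = MODEL_EH_DONE;
		return res;
	}

	/* LIVE: leave cancellation to error recovery and re-arm the timer */
	res.recovery_triggered = true;
	return res;
}
```

In this model only the LIVE state defers completion; every other state
completes the request on the spot so that setup or teardown can proceed.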

However, tearing down the entire set of I/O and admin queues causes a
freeze/unfreeze imbalance (q->mq_freeze_depth) and is really overkill
for what we actually need, which is just to fence controller teardown
that may be running, stop the queue, and cancel the request if it is
not already completed.
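
The "fence, stop, cancel if not already completed" sequence can be
sketched in userspace with a pthread mutex. This is an illustration
under assumptions, not the driver code: the structs, the status value
and all function names are hypothetical, and a single request stands in
for everything that may be in flight.

```c
#include <pthread.h>
#include <stdbool.h>

#define MODEL_STATUS_HOST_ABORTED 1 /* placeholder, not the NVMe status code */

/* Hypothetical model of a controller with one queue. */
struct model_ctrl {
	pthread_mutex_t teardown_lock; /* serializes teardown and timeout paths */
	bool queue_live;
};

/* Hypothetical model of one in-flight request. */
struct model_request {
	bool completed;
	int status;
	int completions; /* how many times completion actually ran */
};

static void model_complete(struct model_request *rq, int status)
{
	rq->status = status;
	rq->completed = true;
	rq->completions++;
}

/* Teardown path: stop the queue, cancel whatever is still in flight. */
static void model_teardown(struct model_ctrl *ctrl, struct model_request *rq)
{
	pthread_mutex_lock(&ctrl->teardown_lock);
	ctrl->queue_live = false;
	if (!rq->completed)
		model_complete(rq, MODEL_STATUS_HOST_ABORTED);
	pthread_mutex_unlock(&ctrl->teardown_lock);
}

/* Timeout path: take the same lock, complete only if nobody else did. */
static void model_complete_timed_out(struct model_ctrl *ctrl,
				     struct model_request *rq)
{
	pthread_mutex_lock(&ctrl->teardown_lock);
	ctrl->queue_live = false;
	if (!rq->completed)
		model_complete(rq, MODEL_STATUS_HOST_ABORTED);
	pthread_mutex_unlock(&ctrl->teardown_lock);
}
```

Because both paths take the same lock and check the completed flag
before acting, the request in this model is completed exactly once
regardless of which path runs first.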

Now that we have the controller teardown_lock, we can safely serialize
request cancellation. This addresses a hang caused by calling an extra
queue freeze on controller namespaces, which prevented unfreeze from
completing correctly.
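
Why an extra freeze leads to a hang can be seen in a toy model of a
freeze counter in the spirit of q->mq_freeze_depth. The sketch below is
an assumption-laden simplification: the names are hypothetical, there is
no concurrency, and it does not claim which kernel call sites perform
the freezes.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a queue's freeze counter. */
struct model_queue {
	int freeze_depth;
};

static void model_freeze(struct model_queue *q)
{
	q->freeze_depth++;
}

static void model_unfreeze(struct model_queue *q)
{
	assert(q->freeze_depth > 0); /* unfreeze without a freeze is a bug */
	q->freeze_depth--;
}

/* New I/O can enter only once every freeze has been undone. */
static bool model_queue_usable(const struct model_queue *q)
{
	return q->freeze_depth == 0;
}

/* One freeze paired with one unfreeze: the queue becomes usable again. */
static bool model_balanced_recovery(void)
{
	struct model_queue q = { 0 };

	model_freeze(&q);
	model_unfreeze(&q);
	return model_queue_usable(&q);
}

/* A second, unpaired freeze leaves the counter above zero. */
static bool model_recovery_with_extra_freeze(void)
{
	struct model_queue q = { 0 };

	model_freeze(&q);   /* freeze from the regular recovery path */
	model_freeze(&q);   /* extra freeze from a redundant teardown */
	model_unfreeze(&q); /* only one unfreeze follows */
	return model_queue_usable(&q);
}
```

In the balanced case the queue is usable afterwards; with the extra
freeze the counter stays at one, so anything waiting for the queue to
unfreeze would wait indefinitely.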

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 drivers/nvme/host/tcp.c | 56 ++++++++++++++++++++++++++---------------
 1 file changed, 36 insertions(+), 20 deletions(-)

diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index a94c80727de1e..98a045429293e 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -421,6 +421,7 @@  static void nvme_tcp_error_recovery(struct nvme_ctrl *ctrl)
 	if (!nvme_change_ctrl_state(ctrl, NVME_CTRL_RESETTING))
 		return;
 
+	dev_warn(ctrl->device, "starting error recovery\n");
 	queue_work(nvme_reset_wq, &to_tcp_ctrl(ctrl)->err_work);
 }
 
@@ -2057,40 +2058,55 @@  static void nvme_tcp_submit_async_event(struct nvme_ctrl *arg)
 	nvme_tcp_queue_request(&ctrl->async_req);
 }
 
+static void nvme_tcp_complete_timed_out(struct request *rq)
+{
+	struct nvme_tcp_request *req = blk_mq_rq_to_pdu(rq);
+	struct nvme_ctrl *ctrl = &req->queue->ctrl->ctrl;
+
+	/* fence other contexts that may complete the command */
+	mutex_lock(&to_tcp_ctrl(ctrl)->teardown_lock);
+	nvme_tcp_stop_queue(ctrl, nvme_tcp_queue_id(req->queue));
+	if (!blk_mq_request_completed(rq)) {
+		nvme_req(rq)->status = NVME_SC_HOST_ABORTED_CMD;
+		blk_mq_complete_request(rq);
+	}
+	mutex_unlock(&to_tcp_ctrl(ctrl)->teardown_lock);
+}
+
 static enum blk_eh_timer_return
 nvme_tcp_timeout(struct request *rq, bool reserved)
 {
 	struct nvme_tcp_request *req = blk_mq_rq_to_pdu(rq);
-	struct nvme_tcp_ctrl *ctrl = req->queue->ctrl;
+	struct nvme_ctrl *ctrl = &req->queue->ctrl->ctrl;
 	struct nvme_tcp_cmd_pdu *pdu = req->pdu;
 
-	/*
-	 * Restart the timer if a controller reset is already scheduled. Any
-	 * timed out commands would be handled before entering the connecting
-	 * state.
-	 */
-	if (ctrl->ctrl.state == NVME_CTRL_RESETTING)
-		return BLK_EH_RESET_TIMER;
-
-	dev_warn(ctrl->ctrl.device,
+	dev_warn(ctrl->device,
 		"queue %d: timeout request %#x type %d\n",
 		nvme_tcp_queue_id(req->queue), rq->tag, pdu->hdr.type);
 
-	if (ctrl->ctrl.state != NVME_CTRL_LIVE) {
+	if (ctrl->state != NVME_CTRL_LIVE) {
 		/*
-		 * Teardown immediately if controller times out while starting
-		 * or we are already started error recovery. all outstanding
-		 * requests are completed on shutdown, so we return BLK_EH_DONE.
+		 * If we are resetting, connecting or deleting we should
+		 * complete immediately because we may block controller
+		 * teardown or setup sequence
+		 * - ctrl disable/shutdown fabrics requests
+		 * - connect requests
+		 * - initialization admin requests
+		 * - I/O requests that entered after unquiescing and
+		 *   the controller stopped responding
+		 *
+		 * All other requests should be cancelled by the error
+		 * recovery work, so it's fine that we fail it here.
 		 */
-		flush_work(&ctrl->err_work);
-		nvme_tcp_teardown_io_queues(&ctrl->ctrl, false);
-		nvme_tcp_teardown_admin_queue(&ctrl->ctrl, false);
+		nvme_tcp_complete_timed_out(rq);
 		return BLK_EH_DONE;
 	}
 
-	dev_warn(ctrl->ctrl.device, "starting error recovery\n");
-	nvme_tcp_error_recovery(&ctrl->ctrl);
-
+	/*
+	 * LIVE state should trigger the normal error recovery which will
+	 * handle completing this request.
+	 */
+	nvme_tcp_error_recovery(ctrl);
 	return BLK_EH_RESET_TIMER;
 }
