[5.15,006/277] nbd: Fix hungtask when nbd_config_put

From: Ye Bin <yebin10@huawei.com>

[ Upstream commit e2daec488c57069a4a431d5b752f50294c4bf273 ]

I got the following issue:
[  247.381177] INFO: task kworker/u10:0:47 blocked for more than 120 seconds.
[  247.382644]       Not tainted 4.19.90-dirty #140
[  247.383502] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  247.385027] Call Trace:
[  247.388384]  schedule+0xb8/0x3c0
[  247.388966]  schedule_timeout+0x2b4/0x380
[  247.392815]  wait_for_completion+0x367/0x510
[  247.397713]  flush_workqueue+0x32b/0x1340
[  247.402700]  drain_workqueue+0xda/0x3c0
[  247.403442]  destroy_workqueue+0x7b/0x690
[  247.405014]  nbd_config_put.cold+0x2f9/0x5b6
[  247.405823]  recv_work+0x1fd/0x2b0
[  247.406485]  process_one_work+0x70b/0x1610
[  247.407262]  worker_thread+0x5a9/0x1060
[  247.408699]  kthread+0x35e/0x430
[  247.410918]  ret_from_fork+0x1f/0x30

The issue can be reproduced as follows:
1. Inject a memory allocation failure in nbd_start_device
@@ -1244,10 +1248,18 @@ static int nbd_start_device(struct nbd_device *nbd)
        nbd_dev_dbg_init(nbd);
        for (i = 0; i < num_connections; i++) {
                struct recv_thread_args *args;
-
-               args = kzalloc(sizeof(*args), GFP_KERNEL);
+
+               if (i == 1) {
+                       args = NULL;
+                       printk("%s: inject malloc error\n", __func__);
+               }
+               else
+                       args = kzalloc(sizeof(*args), GFP_KERNEL);
2. Inject a delay in recv_work
@@ -757,6 +760,8 @@ static void recv_work(struct work_struct *work)

                blk_mq_complete_request(blk_mq_rq_from_pdu(cmd));
        }
+       printk("%s: comm=%s pid=%d\n", __func__, current->comm, current->pid);
+       mdelay(5 * 1000);
        nbd_config_put(nbd);
        atomic_dec(&config->recv_threads);
        wake_up(&config->recv_wq);
3. Start an nbd server
nbd-server 8000 /tmp/disk
4. Connect an nbd client
nbd-client localhost 8000 /dev/nbd1
This triggers the issue above.

The reason is that, with the delay added in recv_work, recv_work ends up
releasing the last reference of 'nbd->config_refs'. nbd_config_put then
calls flush_workqueue (via destroy_workqueue) to wait for all work to
finish. Since recv_work is itself running on that workqueue, it waits for
its own completion, which leads to a deadlock.
To solve this issue, following Josef's suggestion, move the 'recv_work'
workqueue initialization from device start to nbd_dev_add, and destroy it
when the nbd device is torn down.
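
The ordering problem can be sketched with a small userspace model. This is
only an illustration, not kernel code: every name below (model_wq,
model_nbd, model_config_put and so on) is hypothetical, and the model
assumes that after the failed start exactly two holders remain, the setup
path and one receive work item. It shows that the last put hangs only when
it is issued from inside the work item and the put also flushes the
workqueue.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model of the lifetime rules; none of these
 * names are kernel APIs. */
struct model_wq {
	int in_flight;          /* work items started but not yet returned */
	bool caller_is_worker;  /* true while code runs inside a work item */
};

struct model_nbd {
	int config_refs;        /* stands in for nbd->config_refs */
	bool put_destroys_wq;   /* true: old behaviour, false: fixed behaviour */
	struct model_wq wq;
};

/* A flush waits for every in-flight item; a worker that flushes its own
 * workqueue therefore waits on itself. */
static bool model_flush_hangs(const struct model_wq *wq)
{
	return wq->caller_is_worker && wq->in_flight > 0;
}

/* Drop one reference; return true if this call would never return. */
static bool model_config_put(struct model_nbd *nbd)
{
	assert(nbd->config_refs > 0);
	if (--nbd->config_refs > 0)
		return false;
	if (nbd->put_destroys_wq)
		return model_flush_hangs(&nbd->wq);
	return false; /* fixed: the workqueue outlives the config */
}

/* Modelled receive work item: it ends by dropping its reference. */
static bool model_recv_work(struct model_nbd *nbd)
{
	bool hung;

	nbd->wq.in_flight++;
	nbd->wq.caller_is_worker = true;
	hung = model_config_put(nbd);
	nbd->wq.caller_is_worker = false;
	if (!hung)
		nbd->wq.in_flight--;
	return hung;
}

/* Failed start: the setup path and one work item each hold a reference. */
static bool model_failed_start_hangs(bool put_destroys_wq,
				     bool recv_work_is_delayed)
{
	struct model_nbd nbd = {
		.config_refs = 2,
		.put_destroys_wq = put_destroys_wq,
	};
	bool hung;

	if (recv_work_is_delayed) {
		/* The setup path drops its reference first ... */
		hung = model_config_put(&nbd);
		/* ... so the work item drops the last one. */
		return hung || model_recv_work(&nbd);
	}
	hung = model_recv_work(&nbd);
	return hung || model_config_put(&nbd);
}
```

In this model, model_failed_start_hangs(true, true) reports a hang, while
the other three combinations do not. This corresponds to the change
described above: once the workqueue is created and destroyed together with
the device, dropping the last config reference no longer has to flush it.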

Signed-off-by: Ye Bin <yebin10@huawei.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Link: https://lore.kernel.org/r/20211102015237.2309763-5-yebin10@huawei.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 drivers/block/nbd.c | 36 ++++++++++++++++--------------------
 1 file changed, 16 insertions(+), 20 deletions(-)

Message ID	20220412062942.215355614@linuxfoundation.org
State	New
Headers
Return-Path: <stable-owner@kernel.org>
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>, stable@vger.kernel.org, Ye Bin <yebin10@huawei.com>, Josef Bacik <josef@toxicpanda.com>, Jens Axboe <axboe@kernel.dk>, Sasha Levin <sashal@kernel.org>
Subject: [PATCH 5.15 006/277] nbd: Fix hungtask when nbd_config_put
Date: Tue, 12 Apr 2022 08:26:49 +0200
Message-Id: <20220412062942.215355614@linuxfoundation.org>
In-Reply-To: <20220412062942.022903016@linuxfoundation.org>
References: <20220412062942.022903016@linuxfoundation.org>
User-Agent: quilt/0.66
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Precedence: bulk
