[5.9,012/152] btrfs: fix lockdep splat when reading qgroup config on mount

From: Filipe Manana <fdmanana@suse.com>

commit 3d05cad3c357a2b749912914356072b38435edfa upstream.

Lockdep reported the following splat when running test btrfs/190 from
fstests:

  [ 9482.126098] ======================================================
  [ 9482.126184] WARNING: possible circular locking dependency detected
  [ 9482.126281] 5.10.0-rc4-btrfs-next-73 #1 Not tainted
  [ 9482.126365] ------------------------------------------------------
  [ 9482.126456] mount/24187 is trying to acquire lock:
  [ 9482.126534] ffffa0c869a7dac0 (&fs_info->qgroup_rescan_lock){+.+.}-{3:3}, at: qgroup_rescan_init+0x43/0xf0 [btrfs]
  [ 9482.126647]
		 but task is already holding lock:
  [ 9482.126777] ffffa0c892ebd3a0 (btrfs-quota-00){++++}-{3:3}, at: __btrfs_tree_read_lock+0x27/0x120 [btrfs]
  [ 9482.126886]
		 which lock already depends on the new lock.

  [ 9482.127078]
		 the existing dependency chain (in reverse order) is:
  [ 9482.127213]
		 -> #1 (btrfs-quota-00){++++}-{3:3}:
  [ 9482.127366]        lock_acquire+0xd8/0x490
  [ 9482.127436]        down_read_nested+0x45/0x220
  [ 9482.127528]        __btrfs_tree_read_lock+0x27/0x120 [btrfs]
  [ 9482.127613]        btrfs_read_lock_root_node+0x41/0x130 [btrfs]
  [ 9482.127702]        btrfs_search_slot+0x514/0xc30 [btrfs]
  [ 9482.127788]        update_qgroup_status_item+0x72/0x140 [btrfs]
  [ 9482.127877]        btrfs_qgroup_rescan_worker+0xde/0x680 [btrfs]
  [ 9482.127964]        btrfs_work_helper+0xf1/0x600 [btrfs]
  [ 9482.128039]        process_one_work+0x24e/0x5e0
  [ 9482.128110]        worker_thread+0x50/0x3b0
  [ 9482.128181]        kthread+0x153/0x170
  [ 9482.128256]        ret_from_fork+0x22/0x30
  [ 9482.128327]
		 -> #0 (&fs_info->qgroup_rescan_lock){+.+.}-{3:3}:
  [ 9482.128464]        check_prev_add+0x91/0xc60
  [ 9482.128551]        __lock_acquire+0x1740/0x3110
  [ 9482.128623]        lock_acquire+0xd8/0x490
  [ 9482.130029]        __mutex_lock+0xa3/0xb30
  [ 9482.130590]        qgroup_rescan_init+0x43/0xf0 [btrfs]
  [ 9482.131577]        btrfs_read_qgroup_config+0x43a/0x550 [btrfs]
  [ 9482.132175]        open_ctree+0x1228/0x18a0 [btrfs]
  [ 9482.132756]        btrfs_mount_root.cold+0x13/0xed [btrfs]
  [ 9482.133325]        legacy_get_tree+0x30/0x60
  [ 9482.133866]        vfs_get_tree+0x28/0xe0
  [ 9482.134392]        fc_mount+0xe/0x40
  [ 9482.134908]        vfs_kern_mount.part.0+0x71/0x90
  [ 9482.135428]        btrfs_mount+0x13b/0x3e0 [btrfs]
  [ 9482.135942]        legacy_get_tree+0x30/0x60
  [ 9482.136444]        vfs_get_tree+0x28/0xe0
  [ 9482.136949]        path_mount+0x2d7/0xa70
  [ 9482.137438]        do_mount+0x75/0x90
  [ 9482.137923]        __x64_sys_mount+0x8e/0xd0
  [ 9482.138400]        do_syscall_64+0x33/0x80
  [ 9482.138873]        entry_SYSCALL_64_after_hwframe+0x44/0xa9
  [ 9482.139346]
		 other info that might help us debug this:

  [ 9482.140735]  Possible unsafe locking scenario:

  [ 9482.141594]        CPU0                    CPU1
  [ 9482.142011]        ----                    ----
  [ 9482.142411]   lock(btrfs-quota-00);
  [ 9482.142806]                                lock(&fs_info->qgroup_rescan_lock);
  [ 9482.143216]                                lock(btrfs-quota-00);
  [ 9482.143629]   lock(&fs_info->qgroup_rescan_lock);
  [ 9482.144056]
		  *** DEADLOCK ***

  [ 9482.145242] 2 locks held by mount/24187:
  [ 9482.145637]  #0: ffffa0c8411c40e8 (&type->s_umount_key#44/1){+.+.}-{3:3}, at: alloc_super+0xb9/0x400
  [ 9482.146061]  #1: ffffa0c892ebd3a0 (btrfs-quota-00){++++}-{3:3}, at: __btrfs_tree_read_lock+0x27/0x120 [btrfs]
  [ 9482.146509]
		 stack backtrace:
  [ 9482.147350] CPU: 1 PID: 24187 Comm: mount Not tainted 5.10.0-rc4-btrfs-next-73 #1
  [ 9482.147788] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
  [ 9482.148709] Call Trace:
  [ 9482.149169]  dump_stack+0x8d/0xb5
  [ 9482.149628]  check_noncircular+0xff/0x110
  [ 9482.150090]  check_prev_add+0x91/0xc60
  [ 9482.150561]  ? kvm_clock_read+0x14/0x30
  [ 9482.151017]  ? kvm_sched_clock_read+0x5/0x10
  [ 9482.151470]  __lock_acquire+0x1740/0x3110
  [ 9482.151941]  ? __btrfs_tree_read_lock+0x27/0x120 [btrfs]
  [ 9482.152402]  lock_acquire+0xd8/0x490
  [ 9482.152887]  ? qgroup_rescan_init+0x43/0xf0 [btrfs]
  [ 9482.153354]  __mutex_lock+0xa3/0xb30
  [ 9482.153826]  ? qgroup_rescan_init+0x43/0xf0 [btrfs]
  [ 9482.154301]  ? qgroup_rescan_init+0x43/0xf0 [btrfs]
  [ 9482.154768]  ? qgroup_rescan_init+0x43/0xf0 [btrfs]
  [ 9482.155226]  qgroup_rescan_init+0x43/0xf0 [btrfs]
  [ 9482.155690]  btrfs_read_qgroup_config+0x43a/0x550 [btrfs]
  [ 9482.156160]  open_ctree+0x1228/0x18a0 [btrfs]
  [ 9482.156643]  btrfs_mount_root.cold+0x13/0xed [btrfs]
  [ 9482.157108]  ? rcu_read_lock_sched_held+0x5d/0x90
  [ 9482.157567]  ? kfree+0x31f/0x3e0
  [ 9482.158030]  legacy_get_tree+0x30/0x60
  [ 9482.158489]  vfs_get_tree+0x28/0xe0
  [ 9482.158947]  fc_mount+0xe/0x40
  [ 9482.159403]  vfs_kern_mount.part.0+0x71/0x90
  [ 9482.159875]  btrfs_mount+0x13b/0x3e0 [btrfs]
  [ 9482.160335]  ? rcu_read_lock_sched_held+0x5d/0x90
  [ 9482.160805]  ? kfree+0x31f/0x3e0
  [ 9482.161260]  ? legacy_get_tree+0x30/0x60
  [ 9482.161714]  legacy_get_tree+0x30/0x60
  [ 9482.162166]  vfs_get_tree+0x28/0xe0
  [ 9482.162616]  path_mount+0x2d7/0xa70
  [ 9482.163070]  do_mount+0x75/0x90
  [ 9482.163525]  __x64_sys_mount+0x8e/0xd0
  [ 9482.163986]  do_syscall_64+0x33/0x80
  [ 9482.164437]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
  [ 9482.164902] RIP: 0033:0x7f51e907caaa

This happens because btrfs_read_qgroup_config() can call
qgroup_rescan_init() while still holding a read lock on a quota btree
leaf, acquired by the preceding call to btrfs_search_slot_for_read(),
and qgroup_rescan_init() acquires the mutex qgroup_rescan_lock.

A qgroup rescan worker does the opposite: it acquires the mutex
qgroup_rescan_lock, at btrfs_qgroup_rescan_worker(), and then tries to
update the qgroup status item in the quota btree through the call to
update_qgroup_status_item(). This inversion of locking order
between the qgroup_rescan_lock mutex and quota btree locks causes the
splat.
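
As an illustration only, the check that flags this inversion can be
modelled in userspace as a small lock-order graph: each "acquired B
while holding A" event adds an edge A -> B, and a new edge is rejected
if B already leads back to A. This is a minimal sketch, not lockdep's
implementation; the lock class names and all function names below are
hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical lock classes for this model; not kernel identifiers. */
enum { QUOTA_LEAF, RESCAN_MUTEX, NR_CLASSES };

/* dep[a][b] is true once some task acquired b while holding a. */
struct order_graph {
	bool dep[NR_CLASSES][NR_CLASSES];
};

static void graph_init(struct order_graph *g)
{
	memset(g, 0, sizeof(*g));
}

/* Depth-first search: can 'to' be reached from 'from' via recorded edges? */
static bool reachable(const struct order_graph *g, int from, int to, bool *seen)
{
	if (from == to)
		return true;
	seen[from] = true;
	for (int next = 0; next < NR_CLASSES; next++)
		if (g->dep[from][next] && !seen[next] &&
		    reachable(g, next, to, seen))
			return true;
	return false;
}

/*
 * Record "acquired 'next' while holding 'held'". Returns false, without
 * adding the edge, if an existing chain already orders 'next' before
 * 'held', which means the new edge would close a cycle.
 */
static bool record_acquire(struct order_graph *g, int held, int next)
{
	bool seen[NR_CLASSES] = { false };

	assert(held >= 0 && held < NR_CLASSES);
	assert(next >= 0 && next < NR_CLASSES);
	if (reachable(g, next, held, seen))
		return false;
	g->dep[held][next] = true;
	return true;
}
```

In this model the rescan worker first records RESCAN_MUTEX ->
QUOTA_LEAF, which is accepted; the mount path then tries to record
QUOTA_LEAF -> RESCAN_MUTEX, which is rejected because it would form a
cycle.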

Fix this simply by releasing and freeing the path before calling
qgroup_rescan_init() at btrfs_read_qgroup_config().
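
The effect of this reordering can be sketched with a small userspace
model. This is not the kernel diff: the struct and the model_*
functions are hypothetical stand-ins that only track whether a leaf
lock is still held at the moment the rescan mutex would be taken.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a btrfs path: tracks only the leaf lock. */
struct model_path {
	bool leaf_locked;
};

/* Models a tree search that returns with a leaf read-locked. */
static void model_search_slot(struct model_path *path)
{
	path->leaf_locked = true;
}

/* Models releasing the path, which drops the leaf lock. */
static void model_release_path(struct model_path *path)
{
	path->leaf_locked = false;
}

/*
 * Models the point where the rescan mutex is taken. Returns true if a
 * leaf lock is held at that point, i.e. the mutex nests inside it.
 */
static bool model_rescan_init(const struct model_path *path)
{
	return path->leaf_locked;
}

/* Old order: init the rescan first, release the path afterwards. */
static bool read_config_old_order(void)
{
	struct model_path path = { .leaf_locked = false };
	bool nested;

	model_search_slot(&path);
	nested = model_rescan_init(&path);
	model_release_path(&path);
	assert(!path.leaf_locked);
	return nested;
}

/* New order: release the path first, then init the rescan. */
static bool read_config_new_order(void)
{
	struct model_path path = { .leaf_locked = false };

	model_search_slot(&path);
	model_release_path(&path);
	return model_rescan_init(&path);
}
```

With the old order the mutex is taken under the leaf lock, which is the
dependency lockdep complained about; with the new order the two locks
are never held together on the mount path.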

CC: stable@vger.kernel.org # 4.4+
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/btrfs/qgroup.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Message ID	20201201084713.460876881@linuxfoundation.org
State	Superseded
Headers
  Return-Path: <stable-owner@kernel.org>
  From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
  To: linux-kernel@vger.kernel.org
  Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>, stable@vger.kernel.org, Filipe Manana <fdmanana@suse.com>, David Sterba <dsterba@suse.com>
  Subject: [PATCH 5.9 012/152] btrfs: fix lockdep splat when reading qgroup config on mount
  Date: Tue, 1 Dec 2020 09:52:07 +0100
  Message-Id: <20201201084713.460876881@linuxfoundation.org>
  In-Reply-To: <20201201084711.707195422@linuxfoundation.org>
  References: <20201201084711.707195422@linuxfoundation.org>
  User-Agent: quilt/0.66
  MIME-Version: 1.0
  Content-Type: text/plain; charset=UTF-8
  Content-Transfer-Encoding: 8bit
  Precedence: bulk
