[4.19,084/125] btrfs: fix space cache memory leak after transaction abort

From: Filipe Manana <fdmanana@suse.com>

commit bbc37d6e475eee8ffa2156ec813efc6bbb43c06d upstream.

If a transaction aborts it can cause a memory leak of the pages array of
a block group's io_ctl structure. The following steps explain how that can
happen:

1) Transaction N is committing, currently in state TRANS_STATE_UNBLOCKED
   and it's about to start writing out dirty extent buffers;

2) Transaction N + 1 already started and another task, task A, just called
   btrfs_commit_transaction() on it;

3) Block group B was dirtied (extents allocated from it) by transaction
   N + 1, so when task A calls btrfs_start_dirty_block_groups(), at the
   very beginning of the transaction commit, it starts writeback for the
   block group's space cache by calling btrfs_write_out_cache(), which
   allocates the pages array for the block group's io_ctl with a call to
   io_ctl_init(). Block group B is added to the io_list of transaction
   N + 1 by btrfs_start_dirty_block_groups();

4) While transaction N's commit is writing out the extent buffers, it gets
   an IO error and aborts transaction N, also setting the file system to
   RO mode;

5) Task A has already returned from btrfs_start_dirty_block_groups(), is at
   btrfs_commit_transaction() and has set transaction N + 1 state to
   TRANS_STATE_COMMIT_START. Immediately after that it checks that the
   filesystem was turned to RO mode, due to transaction N's abort, and
   jumps to the "cleanup_transaction" label. After that we end up at
   btrfs_cleanup_one_transaction() which calls btrfs_cleanup_dirty_bgs().
   That helper finds block group B in the transaction's io_list but it
   never releases the pages array of the block group's io_ctl, resulting in
   a memory leak.

In fact, by the time we reach btrfs_cleanup_dirty_bgs(), the pages array
points to pages that we already released at __btrfs_write_out_cache()
through the call to io_ctl_drop_pages(). The pages array itself is only
freed after waiting for the ordered extent to complete, through
btrfs_wait_cache_io(), which calls io_ctl_free(). In the transaction
abort case we never call btrfs_wait_cache_io() for the space cache's
ordered extent, and that is why the array leaks. Instead, we wait for
the ordered extent to complete indirectly, by shutting down the work
queues and waiting for any jobs in them to complete before returning
from close_ctree().
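
The lifetime mismatch described above can be modelled in a few lines of
userspace C. This is only a sketch under stated assumptions: every toy_*
name is hypothetical, malloc()/free() stand in for page and array
allocation, and none of it is the kernel's code. It shows that the array
is freed only by the wait step, so a path that skips the wait step is
left holding the array.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a block group's io_ctl; not the kernel struct. */
struct toy_io_ctl {
	void **pages;	/* array of page pointers, allocated at init time */
	int num_pages;
};

/* Allocate the pages array and the "pages" it points to. */
static int toy_init(struct toy_io_ctl *ctl, int num_pages)
{
	ctl->pages = calloc(num_pages, sizeof(*ctl->pages));
	if (!ctl->pages)
		return -1;
	ctl->num_pages = num_pages;
	for (int i = 0; i < num_pages; i++)
		ctl->pages[i] = malloc(64);
	return 0;
}

/* Release the pages; the array stays allocated and its entries go stale. */
static void toy_drop_pages(struct toy_io_ctl *ctl)
{
	for (int i = 0; i < ctl->num_pages; i++)
		free(ctl->pages[i]);
}

/* Free the array itself. */
static void toy_free(struct toy_io_ctl *ctl)
{
	free(ctl->pages);
	ctl->pages = NULL;
	ctl->num_pages = 0;
}

/* Write-out as described before the fix: pages released, array kept. */
static int toy_write_out(struct toy_io_ctl *ctl, int num_pages)
{
	if (toy_init(ctl, num_pages))
		return -1;
	toy_drop_pages(ctl);
	return 0;
}

/* The wait step is the only place that frees the array. */
static void toy_wait_cache_io(struct toy_io_ctl *ctl)
{
	toy_free(ctl);
}

/* Normal commit: write out, then wait. Returns 1 if the array is still allocated. */
int toy_arrays_left_after_commit(void)
{
	struct toy_io_ctl ctl = { 0 };

	if (toy_write_out(&ctl, 8))
		return -1;
	toy_wait_cache_io(&ctl);
	return ctl.pages != NULL;
}

/* Abort: cleanup runs without the wait step, so the array is still there. */
int toy_arrays_left_after_abort(void)
{
	struct toy_io_ctl ctl = { 0 };
	int left;

	if (toy_write_out(&ctl, 8))
		return -1;
	left = ctl.pages != NULL;
	toy_free(&ctl);	/* tidy up so the demo itself does not leak */
	return left;
}
```

In this model toy_arrays_left_after_commit() reports no array left
behind, while toy_arrays_left_after_abort() reports one, which
corresponds to the object kmemleak flags.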

We can solve the leak simply by freeing the pages array right after
releasing the pages (with the call to io_ctl_drop_pages()) at
__btrfs_write_out_cache(), since the array is never used after that
point. Keeping it around is not a problem today, because nothing reads
it, but an array that points to already released pages is bad practice
and can easily lead to use-after-free issues.

So fix this by freeing the pages array right after releasing the pages at
__btrfs_write_out_cache().
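
As a rough illustration of the approach (not the actual diff), the
following userspace C sketch frees the array in the same place where the
pages are released. All sketch_* names are hypothetical, and malloc()
and free() stand in for the kernel's page and array handling. Making the
free helper safe to call twice is an assumption of this sketch, chosen
so that a later wait step calling it again would be harmless.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical model of an io_ctl; not the kernel struct. */
struct sketch_io_ctl {
	void **pages;
	int num_pages;
};

static int sketch_init(struct sketch_io_ctl *ctl, int num_pages)
{
	ctl->pages = calloc(num_pages, sizeof(*ctl->pages));
	if (!ctl->pages)
		return -1;
	ctl->num_pages = num_pages;
	for (int i = 0; i < num_pages; i++)
		ctl->pages[i] = malloc(64);
	return 0;
}

/* Release the pages the array points to. */
static void sketch_drop_pages(struct sketch_io_ctl *ctl)
{
	for (int i = 0; i < ctl->num_pages; i++)
		free(ctl->pages[i]);
}

/* Free the array and clear the pointer; calling it again is a no-op. */
static void sketch_free(struct sketch_io_ctl *ctl)
{
	free(ctl->pages);
	ctl->pages = NULL;
	ctl->num_pages = 0;
}

static int sketch_write_out(struct sketch_io_ctl *ctl, int num_pages)
{
	if (sketch_init(ctl, num_pages))
		return -1;
	/* ... fill the pages and submit them for writeback ... */
	sketch_drop_pages(ctl);
	sketch_free(ctl);	/* the fix: free the array right after releasing the pages */
	return 0;
}

/* Abort-path cleanup: returns 1 if the array is still allocated. */
static int sketch_abort_cleanup(const struct sketch_io_ctl *ctl)
{
	return ctl->pages != NULL;
}

/* Write out, then abort without any wait step. */
int sketch_leaks_on_abort(void)
{
	struct sketch_io_ctl ctl = { 0 };

	if (sketch_write_out(&ctl, 8))
		return -1;
	return sketch_abort_cleanup(&ctl);
}
```

With the free moved into the write-out step, the abort path no longer
depends on a wait step having run, and no stale page pointers remain
reachable through the array.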

This issue can often be reproduced with test case generic/475 from fstests.
kmemleak detects it and reports the following trace:

unreferenced object 0xffff9bbf009fa600 (size 512):
  comm "fsstress", pid 38807, jiffies 4298504428 (age 22.028s)
  hex dump (first 32 bytes):
    00 a0 7c 4d 3d ed ff ff 40 a0 7c 4d 3d ed ff ff  ..|M=...@.|M=...
    80 a0 7c 4d 3d ed ff ff c0 a0 7c 4d 3d ed ff ff  ..|M=.....|M=...
  backtrace:
    [<00000000f4b5cfe2>] __kmalloc+0x1a8/0x3e0
    [<0000000028665e7f>] io_ctl_init+0xa7/0x120 [btrfs]
    [<00000000a1f95b2d>] __btrfs_write_out_cache+0x86/0x4a0 [btrfs]
    [<00000000207ea1b0>] btrfs_write_out_cache+0x7f/0xf0 [btrfs]
    [<00000000af21f534>] btrfs_start_dirty_block_groups+0x27b/0x580 [btrfs]
    [<00000000c3c23d44>] btrfs_commit_transaction+0xa6f/0xe70 [btrfs]
    [<000000009588930c>] create_subvol+0x581/0x9a0 [btrfs]
    [<000000009ef2fd7f>] btrfs_mksubvol+0x3fb/0x4a0 [btrfs]
    [<00000000474e5187>] __btrfs_ioctl_snap_create+0x119/0x1a0 [btrfs]
    [<00000000708ee349>] btrfs_ioctl_snap_create_v2+0xb0/0xf0 [btrfs]
    [<00000000ea60106f>] btrfs_ioctl+0x12c/0x3130 [btrfs]
    [<000000005c923d6d>] __x64_sys_ioctl+0x83/0xb0
    [<0000000043ace2c9>] do_syscall_64+0x33/0x80
    [<00000000904efbce>] entry_SYSCALL_64_after_hwframe+0x44/0xa9

CC: stable@vger.kernel.org # 4.9+
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/btrfs/disk-io.c          |    1 +
 fs/btrfs/free-space-cache.c |    2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

Message ID	20200901150938.689852779@linuxfoundation.org
State	Superseded
Headers
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>, stable@vger.kernel.org, Josef Bacik <josef@toxicpanda.com>, Filipe Manana <fdmanana@suse.com>, David Sterba <dsterba@suse.com>
Subject: [PATCH 4.19 084/125] btrfs: fix space cache memory leak after transaction abort
Date: Tue, 1 Sep 2020 17:10:39 +0200
In-Reply-To: <20200901150934.576210879@linuxfoundation.org>