[5.12,038/127] btrfs: zoned: fix parallel compressed writes

Message ID: 20210524152336.139215075@linuxfoundation.org
State: New

Headers

From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org, David Sterba <dsterba@suse.com>,
	Johannes Thumshirn <johannes.thumshirn@wdc.com>
Subject: [PATCH 5.12 038/127] btrfs: zoned: fix parallel compressed writes
Date: Mon, 24 May 2021 17:25:55 +0200
Message-Id: <20210524152336.139215075@linuxfoundation.org>
In-Reply-To: <20210524152334.857620285@linuxfoundation.org>
References: <20210524152334.857620285@linuxfoundation.org>
User-Agent: quilt/0.66
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Precedence: bulk

Commit Message

Greg KH May 24, 2021, 3:25 p.m. UTC

From: Johannes Thumshirn <johannes.thumshirn@wdc.com>

commit 764c7c9a464b68f7c6a5a9ec0b923176a05e8e8f upstream.

When multiple processes write data to the same block group on a
compressed zoned filesystem, the underlying device could report I/O
errors and data corruption is possible.

This happens because on a zoned file system, compressed data writes
were sent to the device via a REQ_OP_WRITE instead of a
REQ_OP_ZONE_APPEND operation. But with REQ_OP_WRITE and parallel
submission it cannot be guaranteed that the data is always submitted
aligned to the underlying zone's write pointer.

The change to using REQ_OP_ZONE_APPEND instead of REQ_OP_WRITE on a
zoned filesystem is non-intrusive on a regular file system or when
submitting to a conventional zone on a zoned filesystem, as it is
guarded by btrfs_use_zone_append().

Reported-by: David Sterba <dsterba@suse.com>
Fixes: 9d294a685fbc ("btrfs: zoned: enable to mount ZONED incompat flag")
CC: stable@vger.kernel.org # 5.12.x: e380adfc213a13: btrfs: zoned: pass start block to btrfs_use_zone_append
CC: stable@vger.kernel.org # 5.12.x
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 fs/btrfs/compression.c |   42 ++++++++++++++++++++++++++++++++++++++----
 1 file changed, 38 insertions(+), 4 deletions(-)
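
The write pointer problem described in the commit message can be illustrated with a small user-space model. This is only a sketch under stated assumptions: struct zone_model, zone_write(), zone_append() and demo_reordered_submission() are hypothetical names invented for this illustration and are not kernel APIs. The model treats a sequential zone as a single write pointer, rejects a regular write that does not start at that pointer, and lets an append pick its own location.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical model of one sequential-write-required zone. */
struct zone_model {
	uint64_t wp;       /* write pointer: next writable offset in the zone */
	uint64_t capacity; /* zone size in bytes */
};

/* Models a regular write: accepted only if it starts at the write pointer. */
static bool zone_write(struct zone_model *z, uint64_t offset, uint64_t len)
{
	if (offset != z->wp || len > z->capacity - z->wp)
		return false; /* a real device would report an I/O error here */
	z->wp += len;
	return true;
}

/* Models zone append: the device chooses the location and reports it back. */
static bool zone_append(struct zone_model *z, uint64_t len, uint64_t *written_at)
{
	assert(written_at != NULL);
	if (len > z->capacity - z->wp)
		return false;
	*written_at = z->wp;
	z->wp += len;
	return true;
}

/* Two writers were assigned offsets 0 and 4096, but the second one arrives first. */
static void demo_reordered_submission(void)
{
	struct zone_model z = { .wp = 0, .capacity = 1 << 20 };
	uint64_t at;

	/* Regular writes: the early arrival is not aligned to the write pointer. */
	printf("write @4096: %s\n", zone_write(&z, 4096, 4096) ? "ok" : "error");
	printf("write @0:    %s\n", zone_write(&z, 0, 4096) ? "ok" : "error");

	/* Appends: arrival order does not matter, only the final location differs. */
	z.wp = 0;
	if (zone_append(&z, 4096, &at))
		printf("append landed at %llu\n", (unsigned long long)at);
	if (zone_append(&z, 4096, &at))
		printf("append landed at %llu\n", (unsigned long long)at);
}
```

In this model an append can land somewhere other than the offset the submitter planned for, so the submitter has to learn the final location after completion. That appears to be the role of the btrfs_record_physical_zoned() call the patch adds to the end_io handler.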

Comments

David Sterba May 25, 2021, noon UTC | #1

On Mon, May 24, 2021 at 05:25:55PM +0200, Greg Kroah-Hartman wrote:
> From: Johannes Thumshirn <johannes.thumshirn@wdc.com>
>
> commit 764c7c9a464b68f7c6a5a9ec0b923176a05e8e8f upstream.
>
> When multiple processes write data to the same block group on a
> compressed zoned filesystem, the underlying device could report I/O
> errors and data corruption is possible.
>
> This happens because on a zoned file system, compressed data writes
> were sent to the device via a REQ_OP_WRITE instead of a
> REQ_OP_ZONE_APPEND operation. But with REQ_OP_WRITE and parallel
> submission it cannot be guaranteed that the data is always submitted
> aligned to the underlying zone's write pointer.
>
> The change to using REQ_OP_ZONE_APPEND instead of REQ_OP_WRITE on a
> zoned filesystem is non-intrusive on a regular file system or when
> submitting to a conventional zone on a zoned filesystem, as it is
> guarded by btrfs_use_zone_append().
>
> Reported-by: David Sterba <dsterba@suse.com>
> Fixes: 9d294a685fbc ("btrfs: zoned: enable to mount ZONED incompat flag")
> CC: stable@vger.kernel.org # 5.12.x: e380adfc213a13: btrfs: zoned: pass start block to btrfs_use_zone_append
> CC: stable@vger.kernel.org # 5.12.x
> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
> Signed-off-by: David Sterba <dsterba@suse.com>
> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


We found a bug in this patch, please drop it from 5.12 queue.

Greg KH May 25, 2021, 12:20 p.m. UTC | #2

On Tue, May 25, 2021 at 02:00:54PM +0200, David Sterba wrote:
> On Mon, May 24, 2021 at 05:25:55PM +0200, Greg Kroah-Hartman wrote:
> > From: Johannes Thumshirn <johannes.thumshirn@wdc.com>
> >
> > commit 764c7c9a464b68f7c6a5a9ec0b923176a05e8e8f upstream.
> >
> > When multiple processes write data to the same block group on a
> > compressed zoned filesystem, the underlying device could report I/O
> > errors and data corruption is possible.
> >
> > This happens because on a zoned file system, compressed data writes
> > were sent to the device via a REQ_OP_WRITE instead of a
> > REQ_OP_ZONE_APPEND operation. But with REQ_OP_WRITE and parallel
> > submission it cannot be guaranteed that the data is always submitted
> > aligned to the underlying zone's write pointer.
> >
> > The change to using REQ_OP_ZONE_APPEND instead of REQ_OP_WRITE on a
> > zoned filesystem is non-intrusive on a regular file system or when
> > submitting to a conventional zone on a zoned filesystem, as it is
> > guarded by btrfs_use_zone_append().
> >
> > Reported-by: David Sterba <dsterba@suse.com>
> > Fixes: 9d294a685fbc ("btrfs: zoned: enable to mount ZONED incompat flag")
> > CC: stable@vger.kernel.org # 5.12.x: e380adfc213a13: btrfs: zoned: pass start block to btrfs_use_zone_append
> > CC: stable@vger.kernel.org # 5.12.x
> > Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
> > Signed-off-by: David Sterba <dsterba@suse.com>
> > Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
>
> We found a bug in this patch, please drop it from 5.12 queue.


This one, and the previous one, now dropped.

thanks,

greg k-h

--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -28,6 +28,7 @@ 
 #include "compression.h"
 #include "extent_io.h"
 #include "extent_map.h"
+#include "zoned.h"
 
 static const char* const btrfs_compress_types[] = { "", "zlib", "lzo", "zstd" };
 
@@ -349,6 +350,7 @@  static void end_compressed_bio_write(str
 	 */
 	inode = cb->inode;
 	cb->compressed_pages[0]->mapping = cb->inode->i_mapping;
+	btrfs_record_physical_zoned(inode, cb->start, bio);
 	btrfs_writepage_endio_finish_ordered(cb->compressed_pages[0],
 			cb->start, cb->start + cb->len - 1,
 			bio->bi_status == BLK_STS_OK);
@@ -401,6 +403,8 @@  blk_status_t btrfs_submit_compressed_wri
 	u64 first_byte = disk_start;
 	blk_status_t ret;
 	int skip_sum = inode->flags & BTRFS_INODE_NODATASUM;
+	const bool use_append = btrfs_use_zone_append(inode, disk_start);
+	const unsigned int bio_op = use_append ? REQ_OP_ZONE_APPEND : REQ_OP_WRITE;
 
 	WARN_ON(!PAGE_ALIGNED(start));
 	cb = kmalloc(compressed_bio_size(fs_info, compressed_len), GFP_NOFS);
@@ -418,10 +422,31 @@  blk_status_t btrfs_submit_compressed_wri
 	cb->nr_pages = nr_pages;
 
 	bio = btrfs_bio_alloc(first_byte);
-	bio->bi_opf = REQ_OP_WRITE | write_flags;
+	bio->bi_opf = bio_op | write_flags;
 	bio->bi_private = cb;
 	bio->bi_end_io = end_compressed_bio_write;
 
+	if (use_append) {
+		struct extent_map *em;
+		struct map_lookup *map;
+		struct block_device *bdev;
+
+		em = btrfs_get_chunk_map(fs_info, disk_start, PAGE_SIZE);
+		if (IS_ERR(em)) {
+			kfree(cb);
+			bio_put(bio);
+			return BLK_STS_NOTSUPP;
+		}
+
+		map = em->map_lookup;
+		/* We only support single profile for now */
+		ASSERT(map->num_stripes == 1);
+		bdev = map->stripes[0].dev->bdev;
+
+		bio_set_dev(bio, bdev);
+		free_extent_map(em);
+	}
+
 	if (blkcg_css) {
 		bio->bi_opf |= REQ_CGROUP_PUNT;
 		kthread_associate_blkcg(blkcg_css);
@@ -432,6 +457,7 @@  blk_status_t btrfs_submit_compressed_wri
 	bytes_left = compressed_len;
 	for (pg_index = 0; pg_index < cb->nr_pages; pg_index++) {
 		int submit = 0;
+		int len;
 
 		page = compressed_pages[pg_index];
 		page->mapping = inode->vfs_inode.i_mapping;
@@ -439,9 +465,13 @@  blk_status_t btrfs_submit_compressed_wri
 			submit = btrfs_bio_fits_in_stripe(page, PAGE_SIZE, bio,
 							  0);
 
+		if (pg_index == 0 && use_append)
+			len = bio_add_zone_append_page(bio, page, PAGE_SIZE, 0);
+		else
+			len = bio_add_page(bio, page, PAGE_SIZE, 0);
+
 		page->mapping = NULL;
-		if (submit || bio_add_page(bio, page, PAGE_SIZE, 0) <
-		    PAGE_SIZE) {
+		if (submit || len < PAGE_SIZE) {
 			/*
 			 * inc the count before we submit the bio so
 			 * we know the end IO handler won't happen before
@@ -465,11 +495,15 @@  blk_status_t btrfs_submit_compressed_wri
 			}
 
 			bio = btrfs_bio_alloc(first_byte);
-			bio->bi_opf = REQ_OP_WRITE | write_flags;
+			bio->bi_opf = bio_op | write_flags;
 			bio->bi_private = cb;
 			bio->bi_end_io = end_compressed_bio_write;
 			if (blkcg_css)
 				bio->bi_opf |= REQ_CGROUP_PUNT;
+			/*
+			 * Use bio_add_page() to ensure the bio has at least one
+			 * page.
+			 */
 			bio_add_page(bio, page, PAGE_SIZE, 0);
 		}
 		if (bytes_left < PAGE_SIZE) {
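
The operation selection at the top of btrfs_submit_compressed_write() in the diff above can be modeled in user space as follows. This is a rough sketch, assuming only the three cases named in the commit message (regular filesystem, conventional zone on a zoned filesystem, sequential zone on a zoned filesystem). The names model_target, model_use_zone_append(), model_pick_op() and model_self_check() are hypothetical, and the real btrfs_use_zone_append() may check further conditions that this model ignores.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for REQ_OP_WRITE and REQ_OP_ZONE_APPEND. */
enum model_op { MODEL_OP_WRITE, MODEL_OP_ZONE_APPEND };

/* Hypothetical description of where a write is headed. */
struct model_target {
	bool fs_is_zoned;        /* filesystem runs in zoned mode */
	bool zone_is_sequential; /* target zone requires sequential writes */
};

/* Simplified guard: append only on a zoned filesystem and a sequential zone. */
static bool model_use_zone_append(const struct model_target *t)
{
	return t->fs_is_zoned && t->zone_is_sequential;
}

/* Mirrors the shape of "use_append ? REQ_OP_ZONE_APPEND : REQ_OP_WRITE". */
static enum model_op model_pick_op(const struct model_target *t)
{
	return model_use_zone_append(t) ? MODEL_OP_ZONE_APPEND : MODEL_OP_WRITE;
}

/* Exercises the three cases named in the commit message. */
static void model_self_check(void)
{
	const struct model_target regular = { .fs_is_zoned = false, .zone_is_sequential = false };
	const struct model_target conventional = { .fs_is_zoned = true, .zone_is_sequential = false };
	const struct model_target sequential = { .fs_is_zoned = true, .zone_is_sequential = true };

	assert(model_pick_op(&regular) == MODEL_OP_WRITE);
	assert(model_pick_op(&conventional) == MODEL_OP_WRITE);
	assert(model_pick_op(&sequential) == MODEL_OP_ZONE_APPEND);
}
```

Deciding once and reusing the result for every bio the function allocates keeps the two allocation sites consistent, which is what the bio_op variable does in the patch.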
