diff mbox series

xfs: trim IO to found COW exent limit

Message ID e7fe7225-4f2b-d13e-bb4b-c7db68f63124@redhat.com
State New
Headers show
Series xfs: trim IO to found COW exent limit | expand

Commit Message

Eric Sandeen Sept. 23, 2020, 10:35 p.m. UTC
A bug existed in the XFS reflink code between v5.1 and v5.5 in which
the mapping for a COW IO was not trimmed to the mapping of the COW
extent that was found.  This resulted in a too-short copy, and
corruption of other files which shared the original extent.

(This happened only when extent size hints were set, which bypasses
delalloc and led to this code path.)

This was (inadvertently) fixed upstream with

36adcbace24e "xfs: fill out the srcmap in iomap_begin"

and related patches which moved lots of this functionality to
the iomap subsystem.

Hence, this is a -stable only patch, targeted to fix this
corruption vector without other major code changes.

Fixes: 78f0cc9d55cb ("xfs: don't use delalloc extents for COW on files with extsize hints")
Cc: <stable@vger.kernel.org> # 5.4.x
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
---

I've tested this with a targeted reproducer (in next email) as well as
with xfstests.

Stable folk, not sure how to send a "stable only" patch, or if that's even
valid.  Assuming you're willing to accept it, I would still like to have
some formal Reviewed-by's from the xfs developer community before it gets
merged.

Big thanks to Darrick & Dave for letting me whine about this bug and
offering suggestions for testing and ultimately, a patch to test.

Comments

Eric Sandeen Sept. 23, 2020, 10:37 p.m. UTC | #1
Here's a reproducer for the bug.

Requires sufficient privs to mount a loopback fs.

======

#!/bin/bash

mkdir -p mnt
umount mnt &>/dev/null

# Create 8g fs image
mkfs.xfs -f -dfile,name=fsfile.img,size=8g &>/dev/null
mount -o loop fsfile.img mnt

# Make all files w/ 1m hints; create original 2m file
xfs_io -c "extsize 1048576" mnt
xfs_io -c "cowextsize 1048576" mnt

echo "Create file mnt/b"
xfs_io -f -c "pwrite -S 0x0 0 2m" -c fsync mnt/b &>/dev/null

# Make a reflinked copy
echo "Reflink copy from mnt/b to mnt/a"
cp --reflink=always mnt/b mnt/a

echo "Contents of mnt/b"
hexdump -C mnt/b

# Cycle mount to get stuff out of cache
umount mnt
mount -o loop fsfile.img mnt

# Create a 1m-hinted IO at offset 0, then
# do another IO that overlaps but extends past the 1m hint
echo "Write to mnt/a"
xfs_io -c "pwrite -S 0xa 0k -b 4k 4k" \
       -c "pwrite -S 0xa 4k -b 1m 1m" \
       mnt/a &>/dev/null

xfs_io -c fsync mnt/a

echo "Contents of mnt/b now:"
hexdump -C mnt/b

umount mnt
Darrick J. Wong Sept. 23, 2020, 10:58 p.m. UTC | #2
On Wed, Sep 23, 2020 at 05:35:44PM -0500, Eric Sandeen wrote:
> A bug existed in the XFS reflink code between v5.1 and v5.5 in which

> the mapping for a COW IO was not trimmed to the mapping of the COW

> extent that was found.  This resulted in a too-short copy, and

> corruption of other files which shared the original extent.

> 

> (This happened only when extent size hints were set, which bypasses

> delalloc and led to this code path.)

> 

> This was (inadvertently) fixed upstream with

> 

> 36adcbace24e "xfs: fill out the srcmap in iomap_begin"

> 

> and related patches which moved lots of this functionality to

> the iomap subsystem.

> 

> Hence, this is a -stable only patch, targeted to fix this

> corruption vector without other major code changes.

> 

> Fixes: 78f0cc9d55cb ("xfs: don't use delalloc extents for COW on files with extsize hints")

> Cc: <stable@vger.kernel.org> # 5.4.x

> Signed-off-by: Eric Sandeen <sandeen@redhat.com>


Looks good to me,
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>


(Note: Someone please fix the typo "exent" in the subject line.)

--D

> ---

> 

> I've tested this with a targeted reproducer (in next email) as well as

> with xfstests.

> 

> Stable folk, not sure how to send a "stable only" patch, or if that's even

> valid.  Assuming you're willing to accept it, I would still like to have

> some formal Reviewed-by's from the xfs developer community before it gets

> merged.

> 

> Big thanks to Darrick & Dave for letting me whine about this bug and

> offering suggestions for testing and ultimately, a patch to test.

> 

> diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c

> index 06b9e0aacf54..3289d0f4bb03 100644

> --- a/fs/xfs/xfs_iomap.c

> +++ b/fs/xfs/xfs_iomap.c

> @@ -1002,9 +1002,15 @@ xfs_file_iomap_begin(

>  		 * I/O, which must be block aligned, we need to report the

>  		 * newly allocated address.  If the data fork has a hole, copy

>  		 * the COW fork mapping to avoid allocating to the data fork.

> +		 *

> +		 * Otherwise, ensure that the imap range does not extend past

> +		 * the range allocated/found in cmap.

>  		 */

>  		if (directio || imap.br_startblock == HOLESTARTBLOCK)

>  			imap = cmap;

> +		else

> +			xfs_trim_extent(&imap, cmap.br_startoff,

> +					cmap.br_blockcount);

>  

>  		end_fsb = imap.br_startoff + imap.br_blockcount;

>  		length = XFS_FSB_TO_B(mp, end_fsb) - offset;

>
Darrick J. Wong Sept. 23, 2020, 10:59 p.m. UTC | #3
On Wed, Sep 23, 2020 at 05:37:39PM -0500, Eric Sandeen wrote:
> Here's a reproducer for the bug.

> 

> Requires sufficient privs to mount a loopback fs.


Please turn this into an fstest case, but otherwise this looks
reasonable.

--D

> 

> ======

> 

> #!/bin/bash

> 

> mkdir -p mnt

> umount mnt &>/dev/null

> 

> # Create 8g fs image

> mkfs.xfs -f -dfile,name=fsfile.img,size=8g &>/dev/null

> mount -o loop fsfile.img mnt

> 

> # Make all files w/ 1m hints; create original 2m file

> xfs_io -c "extsize 1048576" mnt

> xfs_io -c "cowextsize 1048576" mnt

> 

> echo "Create file mnt/b"

> xfs_io -f -c "pwrite -S 0x0 0 2m" -c fsync mnt/b &>/dev/null

> 

> # Make a reflinked copy

> echo "Reflink copy from mnt/b to mnt/a"

> cp --reflink=always mnt/b mnt/a

> 

> echo "Contents of mnt/b"

> hexdump -C mnt/b

> 

> # Cycle mount to get stuff out of cache

> umount mnt

> mount -o loop fsfile.img mnt

> 

> # Create a 1m-hinted IO at offset 0, then

> # do another IO that overlaps but extends past the 1m hint

> echo "Write to mnt/a"

> xfs_io -c "pwrite -S 0xa 0k -b 4k 4k" \

>        -c "pwrite -S 0xa 4k -b 1m 1m" \

>        mnt/a &>/dev/null

> 

> xfs_io -c fsync mnt/a

> 

> echo "Contents of mnt/b now:"

> hexdump -C mnt/b

> 

> umount mnt

>
Christoph Hellwig Sept. 24, 2020, 4:03 a.m. UTC | #4
On Wed, Sep 23, 2020 at 05:35:44PM -0500, Eric Sandeen wrote:
> A bug existed in the XFS reflink code between v5.1 and v5.5 in which

> the mapping for a COW IO was not trimmed to the mapping of the COW

> extent that was found.  This resulted in a too-short copy, and

> corruption of other files which shared the original extent.

> 

> (This happened only when extent size hints were set, which bypasses

> delalloc and led to this code path.)

> 

> This was (inadvertently) fixed upstream with

> 

> 36adcbace24e "xfs: fill out the srcmap in iomap_begin"

> 

> and related patches which moved lots of this functionality to

> the iomap subsystem.

> 

> Hence, this is a -stable only patch, targeted to fix this

> corruption vector without other major code changes.

> 

> Fixes: 78f0cc9d55cb ("xfs: don't use delalloc extents for COW on files with extsize hints")

> Cc: <stable@vger.kernel.org> # 5.4.x

> Signed-off-by: Eric Sandeen <sandeen@redhat.com>


Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>


and as Darrick said we'll want to wire up the reproducer for xfstests.
diff mbox series

Patch

diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 06b9e0aacf54..3289d0f4bb03 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -1002,9 +1002,15 @@  xfs_file_iomap_begin(
 		 * I/O, which must be block aligned, we need to report the
 		 * newly allocated address.  If the data fork has a hole, copy
 		 * the COW fork mapping to avoid allocating to the data fork.
+		 *
+		 * Otherwise, ensure that the imap range does not extend past
+		 * the range allocated/found in cmap.
 		 */
 		if (directio || imap.br_startblock == HOLESTARTBLOCK)
 			imap = cmap;
+		else
+			xfs_trim_extent(&imap, cmap.br_startoff,
+					cmap.br_blockcount);
 
 		end_fsb = imap.br_startoff + imap.br_blockcount;
 		length = XFS_FSB_TO_B(mp, end_fsb) - offset;