diff mbox series

[v2,01/12] fs: fix lazytime expiration handling in __writeback_single_inode()

Message ID 20210109075903.208222-2-ebiggers@kernel.org
State Superseded
Headers show
Series [v2,01/12] fs: fix lazytime expiration handling in __writeback_single_inode() | expand

Commit Message

Eric Biggers Jan. 9, 2021, 7:58 a.m. UTC
From: Eric Biggers <ebiggers@google.com>

When lazytime is enabled and an inode is being written due to its
in-memory updated timestamps having expired, either due to a sync() or
syncfs() system call or due to dirtytime_expire_interval having elapsed,
the VFS needs to inform the filesystem so that the filesystem can copy
the inode's timestamps out to the on-disk data structures.

This is done by __writeback_single_inode() calling
mark_inode_dirty_sync(), which then calls ->dirty_inode(I_DIRTY_SYNC).

However, this occurs after __writeback_single_inode() has already
cleared the dirty flags from ->i_state.  This causes two bugs:

- mark_inode_dirty_sync() redirties the inode, causing it to remain
  dirty.  This wastefully causes the inode to be written twice.  But
  more importantly, it breaks cases where sync_filesystem() is expected
  to clean dirty inodes.  This includes the FS_IOC_REMOVE_ENCRYPTION_KEY
  ioctl (as reported at
  https://lore.kernel.org/r/20200306004555.GB225345@gmail.com), as well
  as possibly filesystem freezing (freeze_super()).

- Since ->i_state doesn't contain I_DIRTY_TIME when ->dirty_inode() is
  called from __writeback_single_inode() for lazytime expiration,
  xfs_fs_dirty_inode() ignores the notification.  (XFS only cares about
  lazytime expirations, and it assumes that I_DIRTY_TIME will contain
  i_state during those.)  Therefore, lazy timestamps aren't persisted by
  sync(), syncfs(), or dirtytime_expire_interval on XFS.

Fix this by moving the call to mark_inode_dirty_sync() to earlier in
__writeback_single_inode(), before the dirty flags are cleared from
i_state.  This makes filesystems be properly notified of the timestamp
expiration, and it avoids incorrectly redirtying the inode.

This fixes xfstest generic/580 (which tests
FS_IOC_REMOVE_ENCRYPTION_KEY) when run on ext4 or f2fs with lazytime
enabled.  It also fixes the new lazytime xfstest I've proposed, which
reproduces the above-mentioned XFS bug
(https://lore.kernel.org/r/20210105005818.92978-1-ebiggers@kernel.org).

Alternatively, we could call ->dirty_inode(I_DIRTY_SYNC) directly.  But
due to the introduction of I_SYNC_QUEUED, mark_inode_dirty_sync() is the
right thing to do because mark_inode_dirty_sync() now knows not to move
the inode to a writeback list if it is currently queued for sync.

Fixes: 0ae45f63d4ef ("vfs: add support for a lazytime mount option")
Cc: stable@vger.kernel.org
Depends-on: 5afced3bf281 ("writeback: Avoid skipping inode writeback")
Suggested-by: Jan Kara <jack@suse.cz>
Signed-off-by: Eric Biggers <ebiggers@google.com>
---
 fs/fs-writeback.c | 24 +++++++++++++-----------
 1 file changed, 13 insertions(+), 11 deletions(-)

Comments

Christoph Hellwig Jan. 11, 2021, 10:48 a.m. UTC | #1
Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>
Jan Kara Jan. 11, 2021, 2:46 p.m. UTC | #2
On Fri 08-01-21 23:58:52, Eric Biggers wrote:
> From: Eric Biggers <ebiggers@google.com>

> 

> When lazytime is enabled and an inode is being written due to its

> in-memory updated timestamps having expired, either due to a sync() or

> syncfs() system call or due to dirtytime_expire_interval having elapsed,

> the VFS needs to inform the filesystem so that the filesystem can copy

> the inode's timestamps out to the on-disk data structures.

> 

> This is done by __writeback_single_inode() calling

> mark_inode_dirty_sync(), which then calls ->dirty_inode(I_DIRTY_SYNC).

> 

> However, this occurs after __writeback_single_inode() has already

> cleared the dirty flags from ->i_state.  This causes two bugs:

> 

> - mark_inode_dirty_sync() redirties the inode, causing it to remain

>   dirty.  This wastefully causes the inode to be written twice.  But

>   more importantly, it breaks cases where sync_filesystem() is expected

>   to clean dirty inodes.  This includes the FS_IOC_REMOVE_ENCRYPTION_KEY

>   ioctl (as reported at

>   https://lore.kernel.org/r/20200306004555.GB225345@gmail.com), as well

>   as possibly filesystem freezing (freeze_super()).

> 

> - Since ->i_state doesn't contain I_DIRTY_TIME when ->dirty_inode() is

>   called from __writeback_single_inode() for lazytime expiration,

>   xfs_fs_dirty_inode() ignores the notification.  (XFS only cares about

>   lazytime expirations, and it assumes that I_DIRTY_TIME will contain

>   i_state during those.)  Therefore, lazy timestamps aren't persisted by

>   sync(), syncfs(), or dirtytime_expire_interval on XFS.

> 

> Fix this by moving the call to mark_inode_dirty_sync() to earlier in

> __writeback_single_inode(), before the dirty flags are cleared from

> i_state.  This makes filesystems be properly notified of the timestamp

> expiration, and it avoids incorrectly redirtying the inode.

> 

> This fixes xfstest generic/580 (which tests

> FS_IOC_REMOVE_ENCRYPTION_KEY) when run on ext4 or f2fs with lazytime

> enabled.  It also fixes the new lazytime xfstest I've proposed, which

> reproduces the above-mentioned XFS bug

> (https://lore.kernel.org/r/20210105005818.92978-1-ebiggers@kernel.org).

> 

> Alternatively, we could call ->dirty_inode(I_DIRTY_SYNC) directly.  But

> due to the introduction of I_SYNC_QUEUED, mark_inode_dirty_sync() is the

> right thing to do because mark_inode_dirty_sync() now knows not to move

> the inode to a writeback list if it is currently queued for sync.

> 

> Fixes: 0ae45f63d4ef ("vfs: add support for a lazytime mount option")

> Cc: stable@vger.kernel.org

> Depends-on: 5afced3bf281 ("writeback: Avoid skipping inode writeback")

> Suggested-by: Jan Kara <jack@suse.cz>

> Signed-off-by: Eric Biggers <ebiggers@google.com>


Thanks for writing this fix! It looks good to me. You can add:

Reviewed-by: Jan Kara <jack@suse.cz>


								Honza

> ---

>  fs/fs-writeback.c | 24 +++++++++++++-----------

>  1 file changed, 13 insertions(+), 11 deletions(-)

> 

> diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c

> index acfb55834af23..c41cb887eb7d3 100644

> --- a/fs/fs-writeback.c

> +++ b/fs/fs-writeback.c

> @@ -1474,21 +1474,25 @@ __writeback_single_inode(struct inode *inode, struct writeback_control *wbc)

>  	}

>  

>  	/*

> -	 * Some filesystems may redirty the inode during the writeback

> -	 * due to delalloc, clear dirty metadata flags right before

> -	 * write_inode()

> +	 * If the inode has dirty timestamps and we need to write them, call

> +	 * mark_inode_dirty_sync() to notify the filesystem about it and to

> +	 * change I_DIRTY_TIME into I_DIRTY_SYNC.

>  	 */

> -	spin_lock(&inode->i_lock);

> -

> -	dirty = inode->i_state & I_DIRTY;

>  	if ((inode->i_state & I_DIRTY_TIME) &&

> -	    ((dirty & I_DIRTY_INODE) ||

> -	     wbc->sync_mode == WB_SYNC_ALL || wbc->for_sync ||

> +	    (wbc->sync_mode == WB_SYNC_ALL || wbc->for_sync ||

>  	     time_after(jiffies, inode->dirtied_time_when +

>  			dirtytime_expire_interval * HZ))) {

> -		dirty |= I_DIRTY_TIME;

>  		trace_writeback_lazytime(inode);

> +		mark_inode_dirty_sync(inode);

>  	}

> +

> +	/*

> +	 * Some filesystems may redirty the inode during the writeback

> +	 * due to delalloc, clear dirty metadata flags right before

> +	 * write_inode()

> +	 */

> +	spin_lock(&inode->i_lock);

> +	dirty = inode->i_state & I_DIRTY;

>  	inode->i_state &= ~dirty;

>  

>  	/*

> @@ -1509,8 +1513,6 @@ __writeback_single_inode(struct inode *inode, struct writeback_control *wbc)

>  

>  	spin_unlock(&inode->i_lock);

>  

> -	if (dirty & I_DIRTY_TIME)

> -		mark_inode_dirty_sync(inode);

>  	/* Don't write the inode if only I_DIRTY_PAGES was set */

>  	if (dirty & ~I_DIRTY_PAGES) {

>  		int err = write_inode(inode, wbc);

> -- 

> 2.30.0

> 

-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR
diff mbox series

Patch

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index acfb55834af23..c41cb887eb7d3 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -1474,21 +1474,25 @@  __writeback_single_inode(struct inode *inode, struct writeback_control *wbc)
 	}
 
 	/*
-	 * Some filesystems may redirty the inode during the writeback
-	 * due to delalloc, clear dirty metadata flags right before
-	 * write_inode()
+	 * If the inode has dirty timestamps and we need to write them, call
+	 * mark_inode_dirty_sync() to notify the filesystem about it and to
+	 * change I_DIRTY_TIME into I_DIRTY_SYNC.
 	 */
-	spin_lock(&inode->i_lock);
-
-	dirty = inode->i_state & I_DIRTY;
 	if ((inode->i_state & I_DIRTY_TIME) &&
-	    ((dirty & I_DIRTY_INODE) ||
-	     wbc->sync_mode == WB_SYNC_ALL || wbc->for_sync ||
+	    (wbc->sync_mode == WB_SYNC_ALL || wbc->for_sync ||
 	     time_after(jiffies, inode->dirtied_time_when +
 			dirtytime_expire_interval * HZ))) {
-		dirty |= I_DIRTY_TIME;
 		trace_writeback_lazytime(inode);
+		mark_inode_dirty_sync(inode);
 	}
+
+	/*
+	 * Some filesystems may redirty the inode during the writeback
+	 * due to delalloc, clear dirty metadata flags right before
+	 * write_inode()
+	 */
+	spin_lock(&inode->i_lock);
+	dirty = inode->i_state & I_DIRTY;
 	inode->i_state &= ~dirty;
 
 	/*
@@ -1509,8 +1513,6 @@  __writeback_single_inode(struct inode *inode, struct writeback_control *wbc)
 
 	spin_unlock(&inode->i_lock);
 
-	if (dirty & I_DIRTY_TIME)
-		mark_inode_dirty_sync(inode);
 	/* Don't write the inode if only I_DIRTY_PAGES was set */
 	if (dirty & ~I_DIRTY_PAGES) {
 		int err = write_inode(inode, wbc);