9p: Fix DIO read through netfs

Message ID 1229195.1723211769@warthog.procyon.org.uk
State New

Commit Message

David Howells Aug. 9, 2024, 1:56 p.m. UTC
From: Dominique Martinet <asmadeus@codewreck.org>

9p: Fix DIO read through netfs

If a program is watching a file on a 9p mount, it won't see any change in
size if the file being exported by the server is changed directly in the
source filesystem, presumably because 9p doesn't have change notifications,
and because netfs skips the reads if the file is empty.

Fix this by attempting to read the full size specified when a DIO read is
requested (such as when 9p is operating in unbuffered mode) and dealing
with a short read if the EOF was less than the expected read.
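
To illustrate the sizing rule described above, here is a minimal user-space
sketch in C. It is not the netfslib code: clamp_read_len(), enum read_origin
and the parameter names are hypothetical stand-ins for the subrequest length
handling, and returning 0 for a buffered read that starts at or past the EOF
is a simplification made by this model.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the request origin; not the kernel enum. */
enum read_origin { ORIGIN_BUFFERED, ORIGIN_DIO };

/*
 * Model of how long a read subrequest may be.  Buffered reads are trimmed
 * to the known file size; DIO/unbuffered reads are not, so the server gets
 * to decide where the EOF really is.  Both honour rsize when it is nonzero.
 */
static size_t clamp_read_len(enum read_origin origin,
			     unsigned long long start, size_t len,
			     unsigned long long i_size, size_t rsize)
{
	if (origin != ORIGIN_DIO) {
		if (start >= i_size)
			return 0;	/* model choice: nothing to read */
		if (len > i_size - start)
			len = (size_t)(i_size - start);
	}
	if (rsize && len > rsize)
		len = rsize;
	assert(rsize == 0 || len <= rsize);	/* postcondition of the clamp */
	return len;
}
```

With a cached size of 0, clamp_read_len(ORIGIN_BUFFERED, 0, 4096, 0, 0)
yields 0, whereas clamp_read_len(ORIGIN_DIO, 0, 4096, 0, 0) still asks for
4096 bytes and leaves any short read to be handled on completion.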

To make this work, filesystems using netfslib must not set
NETFS_SREQ_CLEAR_TAIL if performing a DIO read where that read hit the EOF.
I don't want to mandatorily clear this flag in netfslib for DIO because,
say, ceph might make a read from an object that is not completely filled,
but does not reside at the end of file - and so we need to clear the
excess.
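
The effect of the flag can be sketched with a small user-space model in C.
This assumes that a set NETFS_SREQ_CLEAR_TAIL causes the unread remainder of
the buffer to be zero-filled and counted as transferred; struct model_subreq
and both helpers are hypothetical and only mirror the per-filesystem checks
in this patch, not the kernel structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of a completed subrequest; not the kernel structure. */
struct model_subreq {
	unsigned char *buf;
	size_t len;		/* bytes requested */
	size_t transferred;	/* bytes the server actually returned */
	bool clear_tail;	/* models NETFS_SREQ_CLEAR_TAIL */
	bool is_dio;		/* models origin == NETFS_DIO_READ */
};

/* The filesystem only requests tail clearing for non-DIO reads. */
static void model_set_clear_tail(struct model_subreq *s)
{
	if (!s->is_dio)
		s->clear_tail = true;
}

/* Returns the byte count that would be reported to the caller. */
static size_t model_complete(struct model_subreq *s)
{
	assert(s->transferred <= s->len);
	if (s->transferred < s->len && s->clear_tail) {
		/* Zero the unread tail and report the full length. */
		memset(s->buf + s->transferred, 0, s->len - s->transferred);
		s->transferred = s->len;
	}
	return s->transferred;
}
```

In this model a buffered read of 8 bytes that gets 3 from the server reports
8 with a zeroed tail, while the same read issued as DIO reports 3, which is
what lets the caller see the real EOF.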

This can be tested by watching an empty file over 9p within a VM (such as
in the ktest framework):

        while true; do read content; if [ -n "$content" ]; then echo $content; break; fi; done < /host/tmp/foo

then writing something into the empty file.  The watcher should immediately
display the file content and break out of the loop.  Without this fix, it
remains in the loop indefinitely.
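
For anyone who would rather drive the check from a program, the following is
a rough C equivalent of the shell loop. It is only a sketch: watch_file(),
the retry count and the 10ms delay are hypothetical choices, and unlike the
shell loop it gives up after max_tries empty reads instead of spinning
forever.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <time.h>
#include <unistd.h>

/*
 * Open path once and poll the same descriptor until read() returns data
 * or max_tries attempts have seen EOF.  Returns the number of bytes read,
 * 0 if the file stayed empty, or -1 on error.
 */
static ssize_t watch_file(const char *path, char *buf, size_t cap,
			  int max_tries)
{
	struct timespec delay = { .tv_sec = 0, .tv_nsec = 10 * 1000 * 1000 };
	ssize_t n = 0;
	int fd;

	assert(cap > 0);
	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;
	for (int i = 0; i < max_tries; i++) {
		n = read(fd, buf, cap);
		if (n != 0)
			break;		/* data arrived, or an error */
		nanosleep(&delay, NULL);
	}
	close(fd);
	return n;
}
```

Pointed at the empty file on the 9p mount, this should return a positive
count once something is written on the host if the fix is applied, and 0
after max_tries attempts otherwise.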

Fixes: 80105ed2fd27 ("9p: Use netfslib read/write_iter")
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218916
Written-by: Dominique Martinet <asmadeus@codewreck.org>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Eric Van Hensbergen <ericvh@kernel.org>
cc: Latchesar Ionkov <lucho@ionkov.net>
cc: Christian Schoenebeck <linux_oss@crudebyte.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: Ilya Dryomov <idryomov@gmail.com>
cc: Steve French <sfrench@samba.org>
cc: Paulo Alcantara <pc@manguebit.com>
cc: Trond Myklebust <trond.myklebust@hammerspace.com>
cc: v9fs@lists.linux.dev
cc: linux-afs@lists.infradead.org
cc: ceph-devel@vger.kernel.org
cc: linux-cifs@vger.kernel.org
cc: linux-nfs@vger.kernel.org
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
---
 fs/9p/vfs_addr.c     |    3 ++-
 fs/afs/file.c        |    3 ++-
 fs/ceph/addr.c       |    6 ++++--
 fs/netfs/io.c        |   17 +++++++++++------
 fs/nfs/fscache.c     |    3 ++-
 fs/smb/client/file.c |    3 ++-
 6 files changed, 23 insertions(+), 12 deletions(-)

Comments

Dominique Martinet Aug. 10, 2024, 3:36 a.m. UTC | #1
David Howells wrote on Fri, Aug 09, 2024 at 02:56:09PM +0100:
> From: Dominique Martinet <asmadeus@codewreck.org>
> 
> 9p: Fix DIO read through netfs

nitpick: not sure how that ended up here, but this duplicates the subject
(the commit message ends up with this line twice)

> If a program is watching a file on a 9p mount, it won't see any change in
> size if the file being exported by the server is changed directly in the
> source filesystem, presumably because 9p doesn't have change notifications,
> and because netfs skips the reads if the file is empty.
> 
> Fix this by attempting to read the full size specified when a DIO read is
> requested (such as when 9p is operating in unbuffered mode) and dealing
> with a short read if the EOF was less than the expected read.
> 
> To make this work, filesystems using netfslib must not set
> NETFS_SREQ_CLEAR_TAIL if performing a DIO read where that read hit the EOF.
> I don't want to mandatorily clear this flag in netfslib for DIO because,
> say, ceph might make a read from an object that is not completely filled,
> but does not reside at the end of file - and so we need to clear the
> excess.
> 
> This can be tested by watching an empty file over 9p within a VM (such as
> in the ktest framework):
> 
>         while true; do read content; if [ -n "$content" ]; then echo $content; break; fi; done < /host/tmp/foo

(This is basically the same thing, but useful if one wants to control the
read timing for more precise/verbose debugging:
  exec 3< /host/tmp/foo
  read -u 3 content && echo "$content"
  (repeat as appropriate)
  exec 3<&-
)

> then writing something into the empty file.  The watcher should immediately
> display the file content and break out of the loop.  Without this fix, it
> remains in the loop indefinitely.
> 
> Fixes: 80105ed2fd27 ("9p: Use netfslib read/write_iter")
> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218916
> Written-by: Dominique Martinet <asmadeus@codewreck.org>

Thanks for adding extra comments & fixing other filesystems.

I've checked that this covers every place that sets NETFS_SREQ_CLEAR_TAIL,
so it hopefully shouldn't have further side effects. This sounds good to me:

Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>

> Signed-off-by: David Howells <dhowells@redhat.com>
> cc: Eric Van Hensbergen <ericvh@kernel.org>
> cc: Latchesar Ionkov <lucho@ionkov.net>
> cc: Christian Schoenebeck <linux_oss@crudebyte.com>
> cc: Marc Dionne <marc.dionne@auristor.com>
> cc: Ilya Dryomov <idryomov@gmail.com>
> cc: Steve French <sfrench@samba.org>
> cc: Paulo Alcantara <pc@manguebit.com>
> cc: Trond Myklebust <trond.myklebust@hammerspace.com>
> cc: v9fs@lists.linux.dev
> cc: linux-afs@lists.infradead.org
> cc: ceph-devel@vger.kernel.org
> cc: linux-cifs@vger.kernel.org
> cc: linux-nfs@vger.kernel.org
> cc: netfs@lists.linux.dev
> cc: linux-fsdevel@vger.kernel.org
> ---
>  fs/9p/vfs_addr.c     |    3 ++-
>  fs/afs/file.c        |    3 ++-
>  fs/ceph/addr.c       |    6 ++++--
>  fs/netfs/io.c        |   17 +++++++++++------
>  fs/nfs/fscache.c     |    3 ++-
>  fs/smb/client/file.c |    3 ++-
>  6 files changed, 23 insertions(+), 12 deletions(-)
> 
> diff --git a/fs/9p/vfs_addr.c b/fs/9p/vfs_addr.c
> index a97ceb105cd8..24fdc74caeba 100644
> --- a/fs/9p/vfs_addr.c
> +++ b/fs/9p/vfs_addr.c
> @@ -75,7 +75,8 @@ static void v9fs_issue_read(struct netfs_io_subrequest *subreq)
>  
>  	/* if we just extended the file size, any portion not in
>  	 * cache won't be on server and is zeroes */
> -	__set_bit(NETFS_SREQ_CLEAR_TAIL, &subreq->flags);
> +	if (subreq->rreq->origin != NETFS_DIO_READ)
> +		__set_bit(NETFS_SREQ_CLEAR_TAIL, &subreq->flags);
>  
>  	netfs_subreq_terminated(subreq, err ?: total, false);
>  }
> diff --git a/fs/afs/file.c b/fs/afs/file.c
> index c3f0c45ae9a9..ec1be0091fdb 100644
> --- a/fs/afs/file.c
> +++ b/fs/afs/file.c
> @@ -242,7 +242,8 @@ static void afs_fetch_data_notify(struct afs_operation *op)
>  
>  	req->error = error;
>  	if (subreq) {
> -		__set_bit(NETFS_SREQ_CLEAR_TAIL, &subreq->flags);
> +		if (subreq->rreq->origin != NETFS_DIO_READ)
> +			__set_bit(NETFS_SREQ_CLEAR_TAIL, &subreq->flags);
>  		netfs_subreq_terminated(subreq, error ?: req->actual_len, false);
>  		req->subreq = NULL;
>  	} else if (req->done) {
> diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
> index cc0a2240de98..c4744a02db75 100644
> --- a/fs/ceph/addr.c
> +++ b/fs/ceph/addr.c
> @@ -246,7 +246,8 @@ static void finish_netfs_read(struct ceph_osd_request *req)
>  	if (err >= 0) {
>  		if (sparse && err > 0)
>  			err = ceph_sparse_ext_map_end(op);
> -		if (err < subreq->len)
> +		if (err < subreq->len &&
> +		    subreq->rreq->origin != NETFS_DIO_READ)
>  			__set_bit(NETFS_SREQ_CLEAR_TAIL, &subreq->flags);
>  		if (IS_ENCRYPTED(inode) && err > 0) {
>  			err = ceph_fscrypt_decrypt_extents(inode,
> @@ -282,7 +283,8 @@ static bool ceph_netfs_issue_op_inline(struct netfs_io_subrequest *subreq)
>  	size_t len;
>  	int mode;
>  
> -	__set_bit(NETFS_SREQ_CLEAR_TAIL, &subreq->flags);
> +	if (rreq->origin != NETFS_DIO_READ)
> +		__set_bit(NETFS_SREQ_CLEAR_TAIL, &subreq->flags);
>  	__clear_bit(NETFS_SREQ_COPY_TO_CACHE, &subreq->flags);
>  
>  	if (subreq->start >= inode->i_size)
> diff --git a/fs/netfs/io.c b/fs/netfs/io.c
> index c179a1c73fa7..5367caf3fa28 100644
> --- a/fs/netfs/io.c
> +++ b/fs/netfs/io.c
> @@ -530,7 +530,8 @@ void netfs_subreq_terminated(struct netfs_io_subrequest *subreq,
>  
>  	if (transferred_or_error == 0) {
>  		if (__test_and_set_bit(NETFS_SREQ_NO_PROGRESS, &subreq->flags)) {
> -			subreq->error = -ENODATA;
> +			if (rreq->origin != NETFS_DIO_READ)
> +				subreq->error = -ENODATA;
>  			goto failed;
>  		}
>  	} else {
> @@ -601,9 +602,14 @@ netfs_rreq_prepare_read(struct netfs_io_request *rreq,
>  			}
>  			if (subreq->len > ictx->zero_point - subreq->start)
>  				subreq->len = ictx->zero_point - subreq->start;
> +
> +			/* We limit buffered reads to the EOF, but let the
> +			 * server deal with larger-than-EOF DIO/unbuffered
> +			 * reads.
> +			 */
> +			if (subreq->len > rreq->i_size - subreq->start)
> +				subreq->len = rreq->i_size - subreq->start;
>  		}
> -		if (subreq->len > rreq->i_size - subreq->start)
> -			subreq->len = rreq->i_size - subreq->start;
>  		if (rreq->rsize && subreq->len > rreq->rsize)
>  			subreq->len = rreq->rsize;
>  
> @@ -739,11 +745,10 @@ int netfs_begin_read(struct netfs_io_request *rreq, bool sync)
>  	do {
>  		_debug("submit %llx + %llx >= %llx",
>  		       rreq->start, rreq->submitted, rreq->i_size);
> -		if (rreq->origin == NETFS_DIO_READ &&
> -		    rreq->start + rreq->submitted >= rreq->i_size)
> -			break;
>  		if (!netfs_rreq_submit_slice(rreq, &io_iter))
>  			break;
> +		if (test_bit(NETFS_SREQ_NO_PROGRESS, &rreq->flags))
> +			break;
>  		if (test_bit(NETFS_RREQ_BLOCKED, &rreq->flags) &&
>  		    test_bit(NETFS_RREQ_NONBLOCK, &rreq->flags))
>  			break;
> diff --git a/fs/nfs/fscache.c b/fs/nfs/fscache.c
> index bf29a65c5027..7a558dea75c4 100644
> --- a/fs/nfs/fscache.c
> +++ b/fs/nfs/fscache.c
> @@ -363,7 +363,8 @@ void nfs_netfs_read_completion(struct nfs_pgio_header *hdr)
>  		return;
>  
>  	sreq = netfs->sreq;
> -	if (test_bit(NFS_IOHDR_EOF, &hdr->flags))
> +	if (test_bit(NFS_IOHDR_EOF, &hdr->flags) &&
> +	    sreq->rreq->origin != NETFS_DIO_READ)
>  		__set_bit(NETFS_SREQ_CLEAR_TAIL, &sreq->flags);
>  
>  	if (hdr->error)
> diff --git a/fs/smb/client/file.c b/fs/smb/client/file.c
> index b2405dd4d4d4..3f3842e7b44a 100644
> --- a/fs/smb/client/file.c
> +++ b/fs/smb/client/file.c
> @@ -217,7 +217,8 @@ static void cifs_req_issue_read(struct netfs_io_subrequest *subreq)
>  			goto out;
>  	}
>  
> -	__set_bit(NETFS_SREQ_CLEAR_TAIL, &subreq->flags);
> +	if (subreq->rreq->origin != NETFS_DIO_READ)
> +		__set_bit(NETFS_SREQ_CLEAR_TAIL, &subreq->flags);
>  
>  	rc = rdata->server->ops->async_readv(rdata);
>  out:
>
Patch

diff --git a/fs/9p/vfs_addr.c b/fs/9p/vfs_addr.c
index a97ceb105cd8..24fdc74caeba 100644
--- a/fs/9p/vfs_addr.c
+++ b/fs/9p/vfs_addr.c
@@ -75,7 +75,8 @@  static void v9fs_issue_read(struct netfs_io_subrequest *subreq)
 
 	/* if we just extended the file size, any portion not in
 	 * cache won't be on server and is zeroes */
-	__set_bit(NETFS_SREQ_CLEAR_TAIL, &subreq->flags);
+	if (subreq->rreq->origin != NETFS_DIO_READ)
+		__set_bit(NETFS_SREQ_CLEAR_TAIL, &subreq->flags);
 
 	netfs_subreq_terminated(subreq, err ?: total, false);
 }
diff --git a/fs/afs/file.c b/fs/afs/file.c
index c3f0c45ae9a9..ec1be0091fdb 100644
--- a/fs/afs/file.c
+++ b/fs/afs/file.c
@@ -242,7 +242,8 @@  static void afs_fetch_data_notify(struct afs_operation *op)
 
 	req->error = error;
 	if (subreq) {
-		__set_bit(NETFS_SREQ_CLEAR_TAIL, &subreq->flags);
+		if (subreq->rreq->origin != NETFS_DIO_READ)
+			__set_bit(NETFS_SREQ_CLEAR_TAIL, &subreq->flags);
 		netfs_subreq_terminated(subreq, error ?: req->actual_len, false);
 		req->subreq = NULL;
 	} else if (req->done) {
diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
index cc0a2240de98..c4744a02db75 100644
--- a/fs/ceph/addr.c
+++ b/fs/ceph/addr.c
@@ -246,7 +246,8 @@  static void finish_netfs_read(struct ceph_osd_request *req)
 	if (err >= 0) {
 		if (sparse && err > 0)
 			err = ceph_sparse_ext_map_end(op);
-		if (err < subreq->len)
+		if (err < subreq->len &&
+		    subreq->rreq->origin != NETFS_DIO_READ)
 			__set_bit(NETFS_SREQ_CLEAR_TAIL, &subreq->flags);
 		if (IS_ENCRYPTED(inode) && err > 0) {
 			err = ceph_fscrypt_decrypt_extents(inode,
@@ -282,7 +283,8 @@  static bool ceph_netfs_issue_op_inline(struct netfs_io_subrequest *subreq)
 	size_t len;
 	int mode;
 
-	__set_bit(NETFS_SREQ_CLEAR_TAIL, &subreq->flags);
+	if (rreq->origin != NETFS_DIO_READ)
+		__set_bit(NETFS_SREQ_CLEAR_TAIL, &subreq->flags);
 	__clear_bit(NETFS_SREQ_COPY_TO_CACHE, &subreq->flags);
 
 	if (subreq->start >= inode->i_size)
diff --git a/fs/netfs/io.c b/fs/netfs/io.c
index c179a1c73fa7..5367caf3fa28 100644
--- a/fs/netfs/io.c
+++ b/fs/netfs/io.c
@@ -530,7 +530,8 @@  void netfs_subreq_terminated(struct netfs_io_subrequest *subreq,
 
 	if (transferred_or_error == 0) {
 		if (__test_and_set_bit(NETFS_SREQ_NO_PROGRESS, &subreq->flags)) {
-			subreq->error = -ENODATA;
+			if (rreq->origin != NETFS_DIO_READ)
+				subreq->error = -ENODATA;
 			goto failed;
 		}
 	} else {
@@ -601,9 +602,14 @@  netfs_rreq_prepare_read(struct netfs_io_request *rreq,
 			}
 			if (subreq->len > ictx->zero_point - subreq->start)
 				subreq->len = ictx->zero_point - subreq->start;
+
+			/* We limit buffered reads to the EOF, but let the
+			 * server deal with larger-than-EOF DIO/unbuffered
+			 * reads.
+			 */
+			if (subreq->len > rreq->i_size - subreq->start)
+				subreq->len = rreq->i_size - subreq->start;
 		}
-		if (subreq->len > rreq->i_size - subreq->start)
-			subreq->len = rreq->i_size - subreq->start;
 		if (rreq->rsize && subreq->len > rreq->rsize)
 			subreq->len = rreq->rsize;
 
@@ -739,11 +745,10 @@  int netfs_begin_read(struct netfs_io_request *rreq, bool sync)
 	do {
 		_debug("submit %llx + %llx >= %llx",
 		       rreq->start, rreq->submitted, rreq->i_size);
-		if (rreq->origin == NETFS_DIO_READ &&
-		    rreq->start + rreq->submitted >= rreq->i_size)
-			break;
 		if (!netfs_rreq_submit_slice(rreq, &io_iter))
 			break;
+		if (test_bit(NETFS_SREQ_NO_PROGRESS, &rreq->flags))
+			break;
 		if (test_bit(NETFS_RREQ_BLOCKED, &rreq->flags) &&
 		    test_bit(NETFS_RREQ_NONBLOCK, &rreq->flags))
 			break;
diff --git a/fs/nfs/fscache.c b/fs/nfs/fscache.c
index bf29a65c5027..7a558dea75c4 100644
--- a/fs/nfs/fscache.c
+++ b/fs/nfs/fscache.c
@@ -363,7 +363,8 @@  void nfs_netfs_read_completion(struct nfs_pgio_header *hdr)
 		return;
 
 	sreq = netfs->sreq;
-	if (test_bit(NFS_IOHDR_EOF, &hdr->flags))
+	if (test_bit(NFS_IOHDR_EOF, &hdr->flags) &&
+	    sreq->rreq->origin != NETFS_DIO_READ)
 		__set_bit(NETFS_SREQ_CLEAR_TAIL, &sreq->flags);
 
 	if (hdr->error)
diff --git a/fs/smb/client/file.c b/fs/smb/client/file.c
index b2405dd4d4d4..3f3842e7b44a 100644
--- a/fs/smb/client/file.c
+++ b/fs/smb/client/file.c
@@ -217,7 +217,8 @@  static void cifs_req_issue_read(struct netfs_io_subrequest *subreq)
 			goto out;
 	}
 
-	__set_bit(NETFS_SREQ_CLEAR_TAIL, &subreq->flags);
+	if (subreq->rreq->origin != NETFS_DIO_READ)
+		__set_bit(NETFS_SREQ_CLEAR_TAIL, &subreq->flags);
 
 	rc = rdata->server->ops->async_readv(rdata);
 out: