[RFC,v2] ceph: ceph: fix out-of-bound array access when doing a file read

Message ID 20240905135700.16394-1-luis.henriques@linux.dev
State New

Commit Message

Luis Henriques Sept. 5, 2024, 1:57 p.m. UTC
__ceph_sync_read() does not correctly handle reads when the inode size is
zero.  It is easy to hit a NULL pointer dereference by continuously reading
a file while, on another client, we keep truncating and writing new data
into it.

The NULL pointer dereference happens when the inode size is zero but the
read op returns some data (ceph_osdc_wait_request()).  This will lead to
'left' being set to a huge value due to the overflow in:

	left = i_size - off;

and, in the loop that follows, the pages[] array being accessed beyond
num_pages.
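
The wraparound described above can be reproduced outside the kernel.  The
following is a minimal userspace sketch, not the kernel code: the helper
name bytes_left() is hypothetical, and the 64-bit unsigned types used for
'i_size' and 'off' are an assumption about how __ceph_sync_read() declares
them.

```c
#include <stdint.h>

/*
 * Userspace stand-in for the computation of 'left' quoted in the commit
 * message.  Hypothetical helper; the kernel computes this inline.
 */
static uint64_t bytes_left(uint64_t i_size, uint64_t off, int64_t ret)
{
	if (ret <= 0)
		return 0;
	if (off + (uint64_t)ret > i_size)
		return i_size - off;	/* unsigned: wraps when off > i_size */
	return (uint64_t)ret;
}
```

Under these assumptions, bytes_left(0, 8192, 4096) evaluates to
18446744073709543424 (2^64 - 8192), whereas bytes_left(0, 0, 4096) is 0.
In this sketch the wrap therefore needs 'off' to be past the truncated
'i_size'.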

This patch fixes the issue simply by checking the inode size and returning
if it is zero, even if there was data from the read op.

Link: https://tracker.ceph.com/issues/67524
Fixes: 1065da21e5df ("ceph: stop copying to iter at EOF on sync reads")
Signed-off-by: Luis Henriques (SUSE) <luis.henriques@linux.dev>
---
 fs/ceph/file.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

Comments

Xiubo Li Sept. 6, 2024, 11:17 a.m. UTC | #1
On 9/5/24 21:57, Luis Henriques (SUSE) wrote:
> __ceph_sync_read() does not correctly handle reads when the inode size is
> zero.  It is easy to hit a NULL pointer dereference by continuously reading
> a file while, on another client, we keep truncating and writing new data
> into it.
>
> The NULL pointer dereference happens when the inode size is zero but the
> read op returns some data (ceph_osdc_wait_request()).  This will lead to
> 'left' being set to a huge value due to the overflow in:
>
> 	left = i_size - off;
>
> and, in the loop that follows, the pages[] array being accessed beyond
> num_pages.
>
> This patch fixes the issue simply by checking the inode size and returning
> if it is zero, even if there was data from the read op.
>
> Link: https://tracker.ceph.com/issues/67524
> Fixes: 1065da21e5df ("ceph: stop copying to iter at EOF on sync reads")
> Signed-off-by: Luis Henriques (SUSE) <luis.henriques@linux.dev>
> ---
>   fs/ceph/file.c | 5 ++++-
>   1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/fs/ceph/file.c b/fs/ceph/file.c
> index 4b8d59ebda00..41d4eac128bb 100644
> --- a/fs/ceph/file.c
> +++ b/fs/ceph/file.c
> @@ -1066,7 +1066,7 @@ ssize_t __ceph_sync_read(struct inode *inode, loff_t *ki_pos,
>   	if (ceph_inode_is_shutdown(inode))
>   		return -EIO;
>   
> -	if (!len)
> +	if (!len || !i_size)
>   		return 0;
>   	/*
>   	 * flush any page cache pages in this range.  this
> @@ -1154,6 +1154,9 @@ ssize_t __ceph_sync_read(struct inode *inode, loff_t *ki_pos,
>   		doutc(cl, "%llu~%llu got %zd i_size %llu%s\n", off, len,
>   		      ret, i_size, (more ? " MORE" : ""));
>   
> +		if (i_size == 0)
> +			ret = 0;
> +
>   		/* Fix it to go to end of extent map */
>   		if (sparse && ret >= 0)
>   			ret = ceph_sparse_ext_map_end(op);
>
Hi Luis,

BTW, in the following code:

		idx = 0;
		if (ret <= 0)
			left = 0;
		else if (off + ret > i_size)
			left = i_size - off;
		else
			left = ret;

'ret' should be larger than 0 here, right?

If so, why don't we check and fix it in the 'else if' branch instead?

Because currently the read path code won't exit directly; it keeps
retrying the read if it finds that the real content length is longer than
the local 'i_size'.
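
As a rough illustration of what a check in the 'else if' branch could look
like, here is a userspace sketch.  It is not a tested or proposed patch:
clamp_left() is a hypothetical helper (the kernel computes 'left' inline),
and the fixed-width types are assumptions.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: compute 'left' without the unsigned wrap. */
static size_t clamp_left(uint64_t i_size, uint64_t off, int64_t ret)
{
	if (ret <= 0)
		return 0;
	if (off >= i_size)
		return 0;	/* nothing before EOF, so nothing to copy */
	if (off + (uint64_t)ret > i_size)
		return (size_t)(i_size - off);	/* safe: off < i_size here */
	return (size_t)ret;
}
```

This only removes the wrap.  It does not address how the read path should
behave when the local 'i_size' is stale.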

Again, I am afraid your current fix will break the MIX filelock semantics.

Thanks

- Xiubo
Luis Henriques Sept. 6, 2024, 11:30 a.m. UTC | #2
On Fri, Sep 06 2024, Xiubo Li wrote:

> Hi Luis,
>
> BTW, in the following code:
>
> 		idx = 0;
> 		if (ret <= 0)
> 			left = 0;
> 		else if (off + ret > i_size)
> 			left = i_size - off;
> 		else
> 			left = ret;
>
> 'ret' should be larger than 0 here, right?

Right.  (Which means we read something from the file.)

> If so, why don't we check and fix it in the 'else if' branch instead?

Yes, and then we'll have:

	left = i_size - off;

and, because 'i_size' is 0, 'left' will be set to 0xffffffffff...
And the loop that follows:

	while (left > 0) {
		...
	}

will keep looping until we get a NULL pointer.  Have you tried the
reproducer?
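
To put a number on how far such a loop would run, here is a userspace
sketch.  It assumes, without the loop body being shown here, that each
iteration consumes at most one page and then advances to the next pages[]
slot.  slots_touched() and SKETCH_PAGE_SIZE are hypothetical names, and the
4096-byte page size is an assumption.

```c
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumed page size for this sketch */

/*
 * Number of pages[] slots a copy loop would index while consuming 'left'
 * bytes, starting 'page_off' bytes into the first page and taking at most
 * one page per iteration.  Requires page_off < SKETCH_PAGE_SIZE.
 */
static uint64_t slots_touched(uint64_t left, uint64_t page_off)
{
	uint64_t first, rest;

	if (left == 0)
		return 0;
	first = SKETCH_PAGE_SIZE - page_off;	/* room left in first page */
	if (left <= first)
		return 1;
	rest = left - first;
	return 1 + rest / SKETCH_PAGE_SIZE + (rest % SKETCH_PAGE_SIZE != 0);
}
```

With the wrapped value from a read at offset 8192 (left = 2^64 - 8192),
slots_touched() returns 4503599627370494, far more than any real pages[]
array holds, so under these assumptions the loop runs off the end long
before 'left' reaches zero.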

Cheers,
Xiubo Li Sept. 6, 2024, 12:48 p.m. UTC | #3
On 9/6/24 19:30, Luis Henriques wrote:
> On Fri, Sep 06 2024, Xiubo Li wrote:
>
>> Hi Luis,
>>
>> BTW, in the following code:
>>
>> 		idx = 0;
>> 		if (ret <= 0)
>> 			left = 0;
>> 		else if (off + ret > i_size)
>> 			left = i_size - off;
>> 		else
>> 			left = ret;
>>
>> 'ret' should be larger than 0 here, right?
> Right.  (Which means we read something from the file.)
>
>> If so, why don't we check and fix it in the 'else if' branch instead?
> Yes, and then we'll have:
>
> 	left = i_size - off;
>
> and, because 'i_size' is 0, 'left' will be set to 0xffffffffff...
> And the loop that follows:
>
> 	while (left > 0) {
> 		...
> 	}
>
> will keep looping until we get a NULL pointer.  Have you tried the
> reproducer?

Hi Luis,

Not yet.  I haven't had a chance to do that recently, for the reason you
already know.

Thanks

- Xiubo


> Cheers,
Patch

diff --git a/fs/ceph/file.c b/fs/ceph/file.c
index 4b8d59ebda00..41d4eac128bb 100644
--- a/fs/ceph/file.c
+++ b/fs/ceph/file.c
@@ -1066,7 +1066,7 @@  ssize_t __ceph_sync_read(struct inode *inode, loff_t *ki_pos,
 	if (ceph_inode_is_shutdown(inode))
 		return -EIO;
 
-	if (!len)
+	if (!len || !i_size)
 		return 0;
 	/*
 	 * flush any page cache pages in this range.  this
@@ -1154,6 +1154,9 @@  ssize_t __ceph_sync_read(struct inode *inode, loff_t *ki_pos,
 		doutc(cl, "%llu~%llu got %zd i_size %llu%s\n", off, len,
 		      ret, i_size, (more ? " MORE" : ""));
 
+		if (i_size == 0)
+			ret = 0;
+
 		/* Fix it to go to end of extent map */
 		if (sparse && ret >= 0)
 			ret = ceph_sparse_ext_map_end(op);