Message ID | 20210206203648.609650-1-arjunroy.kdev@gmail.com |
---|---|
State | Superseded |
Headers | show |
Series | [net-next,v2] tcp: Explicitly mark reserved field in tcp_zerocopy_receive args. | expand |
On 2/6/21 1:36 PM, Arjun Roy wrote: > From: Arjun Roy <arjunroy@google.com> > > Explicitly define reserved field and require it to be 0-valued. > > Fixes: 7eeba1706eba ("tcp: Add receive timestamp support for receive zerocopy.") > Signed-off-by: Arjun Roy <arjunroy@google.com> > Signed-off-by: Eric Dumazet <edumazet@google.com> > Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com> > Suggested-by: David Ahern <dsahern@gmail.com> > Suggested-by: Leon Romanovsky <leon@kernel.org> > Suggested-by: Jakub Kicinski <kuba@kernel.org> > --- > include/uapi/linux/tcp.h | 2 +- > net/ipv4/tcp.c | 2 ++ > 2 files changed, 3 insertions(+), 1 deletion(-) > > diff --git a/include/uapi/linux/tcp.h b/include/uapi/linux/tcp.h > index 42fc5a640df4..8fc09e8638b3 100644 > --- a/include/uapi/linux/tcp.h > +++ b/include/uapi/linux/tcp.h > @@ -357,6 +357,6 @@ struct tcp_zerocopy_receive { > __u64 msg_control; /* ancillary data */ > __u64 msg_controllen; > __u32 msg_flags; > - /* __u32 hole; Next we must add >1 u32 otherwise length checks fail. */ > + __u32 reserved; /* set to 0 for now */ > }; > #endif /* _UAPI_LINUX_TCP_H */ > diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c > index e1a17c6b473c..c8469c579ed8 100644 > --- a/net/ipv4/tcp.c > +++ b/net/ipv4/tcp.c > @@ -4159,6 +4159,8 @@ static int do_tcp_getsockopt(struct sock *sk, int level, > } > if (copy_from_user(&zc, optval, len)) > return -EFAULT; > + if (zc.reserved) > + return -EINVAL; > lock_sock(sk); > err = tcp_zerocopy_receive(sk, &zc, &tss); > release_sock(sk); > The 'switch (len)' statement needs to be updated now that 'len' is not going to end on the 'msg_flags' boundary? But then, how did that work before if there was 4 byte padding? Maybe I am missing something here. You currently have: switch (len) { case offsetofend(struct tcp_zerocopy_receive, msg_flags): which should == 60, yet the struct size is 64 with 4-bytes of padding. A user doing int optlen = sizeof(struct tcp_zerocopy_receive); getsockopt(...., &optlen) would pass in a value of 64, right?
On Sun, 7 Feb 2021 10:26:54 +0200 Leon Romanovsky wrote: > On Sat, Feb 06, 2021 at 03:28:28PM -0800, Jakub Kicinski wrote: > > On Sat, 6 Feb 2021 12:36:48 -0800 Arjun Roy wrote: > > > From: Arjun Roy <arjunroy@google.com> > > > > > > Explicitly define reserved field and require it to be 0-valued. > > > > > diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c > > > index e1a17c6b473c..c8469c579ed8 100644 > > > --- a/net/ipv4/tcp.c > > > +++ b/net/ipv4/tcp.c > > > @@ -4159,6 +4159,8 @@ static int do_tcp_getsockopt(struct sock *sk, int level, > > > } > > > if (copy_from_user(&zc, optval, len)) > > > return -EFAULT; > > > + if (zc.reserved) > > > + return -EINVAL; > > > lock_sock(sk); > > > err = tcp_zerocopy_receive(sk, &zc, &tss); > > > release_sock(sk); > > > > I was expecting we'd also throw in a check_zeroed_user(). > > Either we can check if the buffer is zeroed all the way, > > or we can't and we shouldn't validate reserved either > > > > check_zeroed_user(optval + offsetof(reserved), > > len - offsetof(reserved)) > > ? > > There is a check that len is not larger than zs and users can't give > large buffer. > > I would say that is pretty safe to write "if (zc.reserved)". Which check? There's a check which truncates (writes back to user space len = min(len, sizeof(zc)). Application can still pass garbage beyond sizeof(zc) and syscall may start failing in the future if sizeof(zc) changes.
On 2/8/21 11:41 AM, Jakub Kicinski wrote: > On Sun, 7 Feb 2021 10:26:54 +0200 Leon Romanovsky wrote: >> On Sat, Feb 06, 2021 at 03:28:28PM -0800, Jakub Kicinski wrote: >>> On Sat, 6 Feb 2021 12:36:48 -0800 Arjun Roy wrote: >>>> From: Arjun Roy <arjunroy@google.com> >>>> >>>> Explicitly define reserved field and require it to be 0-valued. >>> >>>> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c >>>> index e1a17c6b473c..c8469c579ed8 100644 >>>> --- a/net/ipv4/tcp.c >>>> +++ b/net/ipv4/tcp.c >>>> @@ -4159,6 +4159,8 @@ static int do_tcp_getsockopt(struct sock *sk, int level, >>>> } >>>> if (copy_from_user(&zc, optval, len)) >>>> return -EFAULT; >>>> + if (zc.reserved) >>>> + return -EINVAL; >>>> lock_sock(sk); >>>> err = tcp_zerocopy_receive(sk, &zc, &tss); >>>> release_sock(sk); >>> >>> I was expecting we'd also throw in a check_zeroed_user(). >>> Either we can check if the buffer is zeroed all the way, >>> or we can't and we shouldn't validate reserved either >>> >>> check_zeroed_user(optval + offsetof(reserved), >>> len - offsetof(reserved)) >>> ? >> >> There is a check that len is not larger than zs and users can't give >> large buffer. >> >> I would say that is pretty safe to write "if (zc.reserved)". > > Which check? There's a check which truncates (writes back to user space > len = min(len, sizeof(zc)). Application can still pass garbage beyond > sizeof(zc) and syscall may start failing in the future if sizeof(zc) > changes. > That would be the case for new userspace on old kernel. Extending the check to the end of the struct would guarantee new userspace can not ask for something that the running kernel does not understand.
On Mon, 8 Feb 2021 19:24:05 -0700 David Ahern wrote: > On 2/8/21 11:41 AM, Jakub Kicinski wrote: > > On Sun, 7 Feb 2021 10:26:54 +0200 Leon Romanovsky wrote: > >> There is a check that len is not larger than zs and users can't give > >> large buffer. > >> > >> I would say that is pretty safe to write "if (zc.reserved)". > > > > Which check? There's a check which truncates (writes back to user space > > len = min(len, sizeof(zc)). Application can still pass garbage beyond > > sizeof(zc) and syscall may start failing in the future if sizeof(zc) > > changes. > > That would be the case for new userspace on old kernel. Extending the > check to the end of the struct would guarantee new userspace can not ask > for something that the running kernel does not understand. Indeed, so we're agreeing that check_zeroed_user() is needed before original optlen from user space gets truncated?
On 2/8/21 7:53 PM, Jakub Kicinski wrote: > On Mon, 8 Feb 2021 19:24:05 -0700 David Ahern wrote: >> On 2/8/21 11:41 AM, Jakub Kicinski wrote: >>> On Sun, 7 Feb 2021 10:26:54 +0200 Leon Romanovsky wrote: >>>> There is a check that len is not larger than zs and users can't give >>>> large buffer. >>>> >>>> I would say that is pretty safe to write "if (zc.reserved)". >>> >>> Which check? There's a check which truncates (writes back to user space >>> len = min(len, sizeof(zc)). Application can still pass garbage beyond >>> sizeof(zc) and syscall may start failing in the future if sizeof(zc) >>> changes. >> >> That would be the case for new userspace on old kernel. Extending the >> check to the end of the struct would guarantee new userspace can not ask >> for something that the running kernel does not understand. > > Indeed, so we're agreeing that check_zeroed_user() is needed before > original optlen from user space gets truncated? > I thought so, but maybe not. To think through this ... If current kernel understands a struct of size N, it can only copy that amount from user to kernel. Anything beyond is ignored in these multiplexed uAPIs, and that is where the new userspace on old kernel falls. Known value checks can only be done up to size N. In this case, the reserved field is at the end of the known struct size, so checking just the field is fine. Going beyond the reserved field has implications for extensions to the API which should be handled when those extensions are added. So, in short I think the "if (zc.reserved)" is correct as Leon noted.
On Mon, Feb 08, 2021 at 10:41:43AM -0800, Jakub Kicinski wrote: > On Sun, 7 Feb 2021 10:26:54 +0200 Leon Romanovsky wrote: > > On Sat, Feb 06, 2021 at 03:28:28PM -0800, Jakub Kicinski wrote: > > > On Sat, 6 Feb 2021 12:36:48 -0800 Arjun Roy wrote: > > > > From: Arjun Roy <arjunroy@google.com> > > > > > > > > Explicitly define reserved field and require it to be 0-valued. > > > > > > > diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c > > > > index e1a17c6b473c..c8469c579ed8 100644 > > > > --- a/net/ipv4/tcp.c > > > > +++ b/net/ipv4/tcp.c > > > > @@ -4159,6 +4159,8 @@ static int do_tcp_getsockopt(struct sock *sk, int level, > > > > } > > > > if (copy_from_user(&zc, optval, len)) > > > > return -EFAULT; > > > > + if (zc.reserved) > > > > + return -EINVAL; > > > > lock_sock(sk); > > > > err = tcp_zerocopy_receive(sk, &zc, &tss); > > > > release_sock(sk); > > > > > > I was expecting we'd also throw in a check_zeroed_user(). > > > Either we can check if the buffer is zeroed all the way, > > > or we can't and we shouldn't validate reserved either > > > > > > check_zeroed_user(optval + offsetof(reserved), > > > len - offsetof(reserved)) > > > ? > > > > There is a check that len is not larger than zs and users can't give > > large buffer. > > > > I would say that is pretty safe to write "if (zc.reserved)". > > Which check? There's a check which truncates (writes back to user space > len = min(len, sizeof(zc)). Application can still pass garbage beyond > sizeof(zc) and syscall may start failing in the future if sizeof(zc) > changes. At least in my tree, we have the length check: 4155 if (len > sizeof(zc)) { 4156 len = sizeof(zc); 4157 if (put_user(len, optlen)) 4158 return -EFAULT; 4159 } Ad David wrote below, the "if (zc.reserved)" is enough. We have following options: 1. Old kernel that have sizeof(sz) upto .reserved and old userspace -> len <= sizeof(sz) - works correctly. 2. Old kernel that have sizeof(sz) upto .reserved and new userspace that sends larger struct -> "f (len > sizeof(zc))" will return -EFAULT 3. New kernel that have sizeof(sz) beyond reserved and old userspace -> any new added field to struct sz should be checked and anyway it is the same as item 1. 4. New kernel and new userspace -> standard flow. Thanks
On Mon, Feb 08, 2021 at 08:20:29PM -0700, David Ahern wrote: > On 2/8/21 7:53 PM, Jakub Kicinski wrote: > > On Mon, 8 Feb 2021 19:24:05 -0700 David Ahern wrote: > >> On 2/8/21 11:41 AM, Jakub Kicinski wrote: > >>> On Sun, 7 Feb 2021 10:26:54 +0200 Leon Romanovsky wrote: > >>>> There is a check that len is not larger than zs and users can't give > >>>> large buffer. > >>>> > >>>> I would say that is pretty safe to write "if (zc.reserved)". > >>> > >>> Which check? There's a check which truncates (writes back to user space > >>> len = min(len, sizeof(zc)). Application can still pass garbage beyond > >>> sizeof(zc) and syscall may start failing in the future if sizeof(zc) > >>> changes. > >> > >> That would be the case for new userspace on old kernel. Extending the > >> check to the end of the struct would guarantee new userspace can not ask > >> for something that the running kernel does not understand. > > > > Indeed, so we're agreeing that check_zeroed_user() is needed before > > original optlen from user space gets truncated? > > > > I thought so, but maybe not. To think through this ... > > If current kernel understands a struct of size N, it can only copy that > amount from user to kernel. Anything beyond is ignored in these > multiplexed uAPIs, and that is where the new userspace on old kernel falls. > > Known value checks can only be done up to size N. In this case, the > reserved field is at the end of the known struct size, so checking just > the field is fine. Going beyond the reserved field has implications for > extensions to the API which should be handled when those extensions are > added. It is handled. https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/tree/net/ipv4/tcp.c#n4155 if (len > sizeof(zc)) { len = sizeof(zc); if (put_user(len, optlen)) return -EFAULT; } Thanks > > So, in short I think the "if (zc.reserved)" is correct as Leon noted.
On Mon, 8 Feb 2021 20:20:29 -0700 David Ahern wrote: > On 2/8/21 7:53 PM, Jakub Kicinski wrote: > > On Mon, 8 Feb 2021 19:24:05 -0700 David Ahern wrote: > >> That would be the case for new userspace on old kernel. Extending the > >> check to the end of the struct would guarantee new userspace can not ask > >> for something that the running kernel does not understand. > > > > Indeed, so we're agreeing that check_zeroed_user() is needed before > > original optlen from user space gets truncated? > > I thought so, but maybe not. To think through this ... > > If current kernel understands a struct of size N, it can only copy that > amount from user to kernel. Anything beyond is ignored in these > multiplexed uAPIs, and that is where the new userspace on old kernel falls. > > Known value checks can only be done up to size N. In this case, the > reserved field is at the end of the known struct size, so checking just > the field is fine. Going beyond the reserved field has implications for > extensions to the API which should be handled when those extensions are > added. Let me try one last time. There is no check in the kernels that len <= N. User can pass any length _already_. check_zeroed_user() forces the values beyond the structure length to be known (0) rather than anything. It can only avoid breakages in the future. > So, in short I think the "if (zc.reserved)" is correct as Leon noted. If it's correct to check some arbitrary part of the buffer is zeroed it should be correct to check the entire tail is zeroed.
On Tue, 9 Feb 2021 08:15:11 +0200 Leon Romanovsky wrote: > At least in my tree, we have the length check: > 4155 if (len > sizeof(zc)) { > 4156 len = sizeof(zc); > 4157 if (put_user(len, optlen)) > 4158 return -EFAULT; > 4159 } > > > Ad David wrote below, the "if (zc.reserved)" is enough. > > We have following options: > 1. Old kernel that have sizeof(sz) upto .reserved and old userspace > -> len <= sizeof(sz) - works correctly. > 2. Old kernel that have sizeof(sz) upto .reserved and new userspace that > sends larger struct -> "f (len > sizeof(zc))" will return -EFAULT Based on the code you quoted? I don't see how. Maybe I need a vacation. put_user() just copies len back to user space after truncation. > 3. New kernel that have sizeof(sz) beyond reserved and old userspace > -> any new added field to struct sz should be checked and anyway it is the same as item 1. > 4. New kernel and new userspace > -> standard flow.
On Tue, Feb 09, 2021 at 08:59:38AM -0800, Jakub Kicinski wrote: > On Tue, 9 Feb 2021 08:15:11 +0200 Leon Romanovsky wrote: > > At least in my tree, we have the length check: > > 4155 if (len > sizeof(zc)) { > > 4156 len = sizeof(zc); > > 4157 if (put_user(len, optlen)) > > 4158 return -EFAULT; > > 4159 } > > > > > > Ad David wrote below, the "if (zc.reserved)" is enough. > > > > We have following options: > > 1. Old kernel that have sizeof(sz) upto .reserved and old userspace > > -> len <= sizeof(sz) - works correctly. > > 2. Old kernel that have sizeof(sz) upto .reserved and new userspace that > > sends larger struct -> "f (len > sizeof(zc))" will return -EFAULT > > Based on the code you quoted? I don't see how. Maybe I need a vacation. > put_user() just copies len back to user space after truncation. It is me who needs to go to vacation, you are right it doesn't return for lengths larger than sizeof(zc). > > > 3. New kernel that have sizeof(sz) beyond reserved and old userspace > > -> any new added field to struct sz should be checked and anyway it is the same as item 1. > > 4. New kernel and new userspace > > -> standard flow.
On Tue, Feb 9, 2021 at 8:59 AM Jakub Kicinski <kuba@kernel.org> wrote: > > On Mon, 8 Feb 2021 20:20:29 -0700 David Ahern wrote: > > On 2/8/21 7:53 PM, Jakub Kicinski wrote: > > > On Mon, 8 Feb 2021 19:24:05 -0700 David Ahern wrote: > > >> That would be the case for new userspace on old kernel. Extending the > > >> check to the end of the struct would guarantee new userspace can not ask > > >> for something that the running kernel does not understand. > > > > > > Indeed, so we're agreeing that check_zeroed_user() is needed before > > > original optlen from user space gets truncated? > > > > I thought so, but maybe not. To think through this ... > > > > If current kernel understands a struct of size N, it can only copy that > > amount from user to kernel. Anything beyond is ignored in these > > multiplexed uAPIs, and that is where the new userspace on old kernel falls. > > > > Known value checks can only be done up to size N. In this case, the > > reserved field is at the end of the known struct size, so checking just > > the field is fine. Going beyond the reserved field has implications for > > extensions to the API which should be handled when those extensions are > > added. > > Let me try one last time. > > There is no check in the kernels that len <= N. User can pass any > length _already_. check_zeroed_user() forces the values beyond the > structure length to be known (0) rather than anything. It can only > avoid breakages in the future. > > > So, in short I think the "if (zc.reserved)" is correct as Leon noted. > > If it's correct to check some arbitrary part of the buffer is zeroed > it should be correct to check the entire tail is zeroed. So, coming back to the thread, I think the following appears to be the current thoughts: 1. It is requested that, on the kernel as it stands today, fields beyond zc.msg_flags (including zc.reserved, the only such field as of this patch) are zero'd out. So a new userspace asking to do specific things would fail on this old kernel with EINVAL. Old userspace would work on old or new kernels. New of course works on new kernels. 2. If it's correct to check some arbitrary field (zc.reserved) to be 0, then it should be fine to check this for all future fields >= reserved in the struct. So some advanced userspace down the line doesn't get confused. Strictly speaking, I'm not convinced this is necessary - eg. 64 bytes struct right now, suppose userspace of the future gives us 96 bytes of which the last 32 are non-zero for some feature or the other. We, in the here and now kernel, truncate that length to 64 (as in we only copy to kernel those first 64 bytes) and set the returned length to 64. The understanding being, any (future, past or present) userspace consults the output value; and considers anything byte >= the returned len to be untouched by the kernel executing the call (ie. garbage, unacted upon). So, how would this work for old+new userspace on old+new kernel? A) old+old, new+new: sizes match, no issue B) new kernel, old userspace: That's not an issue. We have the switch(len) statement for that. C) old kernel, new userspace: that's the 96 vs. 64 B example above - new userspace would see that the kernel only operated on 64 B and treat the last 32 B as garbage/unacted on. In this case, we would not give EINVAL on case C, as we would if we returned EINVAL on a check_zeroed_user() case for fields past zc.reserved. We'd do a zerocopy operating on just the features we know about, and communicate to the user that we only acted on features up until this byte offset. Now, given this is the case, we still have the padding confusion with zc.reserved and the current struct size, so we have to force it to 0 as we are doing. But I think we don't need to go beyond this so far. Thus, my personal preference is to not have the check_zeroed_user() check. But if the consensus demands it, then it's an easy enough fix. What are your thoughts? Thanks, -Arjun
On 2/9/21 4:46 PM, Arjun Roy wrote: > On Tue, Feb 9, 2021 at 8:59 AM Jakub Kicinski <kuba@kernel.org> wrote: >> >> On Mon, 8 Feb 2021 20:20:29 -0700 David Ahern wrote: >>> On 2/8/21 7:53 PM, Jakub Kicinski wrote: >>>> On Mon, 8 Feb 2021 19:24:05 -0700 David Ahern wrote: >>>>> That would be the case for new userspace on old kernel. Extending the >>>>> check to the end of the struct would guarantee new userspace can not ask >>>>> for something that the running kernel does not understand. >>>> >>>> Indeed, so we're agreeing that check_zeroed_user() is needed before >>>> original optlen from user space gets truncated? >>> >>> I thought so, but maybe not. To think through this ... >>> >>> If current kernel understands a struct of size N, it can only copy that >>> amount from user to kernel. Anything beyond is ignored in these >>> multiplexed uAPIs, and that is where the new userspace on old kernel falls. >>> >>> Known value checks can only be done up to size N. In this case, the >>> reserved field is at the end of the known struct size, so checking just >>> the field is fine. Going beyond the reserved field has implications for >>> extensions to the API which should be handled when those extensions are >>> added. >> >> Let me try one last time. >> >> There is no check in the kernels that len <= N. User can pass any >> length _already_. check_zeroed_user() forces the values beyond the >> structure length to be known (0) rather than anything. It can only >> avoid breakages in the future. >> >>> So, in short I think the "if (zc.reserved)" is correct as Leon noted. >> >> If it's correct to check some arbitrary part of the buffer is zeroed >> it should be correct to check the entire tail is zeroed. > > So, coming back to the thread, I think the following appears to be the > current thoughts: > > 1. It is requested that, on the kernel as it stands today, fields > beyond zc.msg_flags (including zc.reserved, the only such field as of > this patch) are zero'd out. So a new userspace asking to do specific > things would fail on this old kernel with EINVAL. Old userspace would > work on old or new kernels. New of course works on new kernels. > 2. If it's correct to check some arbitrary field (zc.reserved) to be > 0, then it should be fine to check this for all future fields >= > reserved in the struct. So some advanced userspace down the line > doesn't get confused. > > Strictly speaking, I'm not convinced this is necessary - eg. 64 bytes > struct right now, suppose userspace of the future gives us 96 bytes of > which the last 32 are non-zero for some feature or the other. We, in > the here and now kernel, truncate that length to 64 (as in we only > copy to kernel those first 64 bytes) and set the returned length to > 64. The understanding being, any (future, past or present) userspace > consults the output value; and considers anything byte >= the returned > len to be untouched by the kernel executing the call (ie. garbage, > unacted upon). > > So, how would this work for old+new userspace on old+new kernel? > > A) old+old, new+new: sizes match, no issue > B) new kernel, old userspace: That's not an issue. We have the > switch(len) statement for that. > C) old kernel, new userspace: that's the 96 vs. 64 B example above - > new userspace would see that the kernel only operated on 64 B and > treat the last 32 B as garbage/unacted on. > > In this case, we would not give EINVAL on case C, as we would if we > returned EINVAL on a check_zeroed_user() case for fields past > zc.reserved. We'd do a zerocopy operating on just the features we know > about, and communicate to the user that we only acted on features up > until this byte offset. > > Now, given this is the case, we still have the padding confusion with > zc.reserved and the current struct size, so we have to force it to 0 > as we are doing. But I think we don't need to go beyond this so far. > > Thus, my personal preference is to not have the check_zeroed_user() > check. But if the consensus demands it, then it's an easy enough fix. > What are your thoughts? > bpf uses check_zeroed_user to make sure extensions to its structs are compatible, so yes, this is required. Also, you need to address legitimate msg_flags as I mentioned in another response.
On Tue, Feb 9, 2021 at 8:35 PM David Ahern <dsahern@gmail.com> wrote: > > On 2/9/21 4:46 PM, Arjun Roy wrote: > > On Tue, Feb 9, 2021 at 8:59 AM Jakub Kicinski <kuba@kernel.org> wrote: > >> > >> On Mon, 8 Feb 2021 20:20:29 -0700 David Ahern wrote: > >>> On 2/8/21 7:53 PM, Jakub Kicinski wrote: > >>>> On Mon, 8 Feb 2021 19:24:05 -0700 David Ahern wrote: > >>>>> That would be the case for new userspace on old kernel. Extending the > >>>>> check to the end of the struct would guarantee new userspace can not ask > >>>>> for something that the running kernel does not understand. > >>>> > >>>> Indeed, so we're agreeing that check_zeroed_user() is needed before > >>>> original optlen from user space gets truncated? > >>> > >>> I thought so, but maybe not. To think through this ... > >>> > >>> If current kernel understands a struct of size N, it can only copy that > >>> amount from user to kernel. Anything beyond is ignored in these > >>> multiplexed uAPIs, and that is where the new userspace on old kernel falls. > >>> > >>> Known value checks can only be done up to size N. In this case, the > >>> reserved field is at the end of the known struct size, so checking just > >>> the field is fine. Going beyond the reserved field has implications for > >>> extensions to the API which should be handled when those extensions are > >>> added. > >> > >> Let me try one last time. > >> > >> There is no check in the kernels that len <= N. User can pass any > >> length _already_. check_zeroed_user() forces the values beyond the > >> structure length to be known (0) rather than anything. It can only > >> avoid breakages in the future. > >> > >>> So, in short I think the "if (zc.reserved)" is correct as Leon noted. > >> > >> If it's correct to check some arbitrary part of the buffer is zeroed > >> it should be correct to check the entire tail is zeroed. > > > > So, coming back to the thread, I think the following appears to be the > > current thoughts: > > > > 1. It is requested that, on the kernel as it stands today, fields > > beyond zc.msg_flags (including zc.reserved, the only such field as of > > this patch) are zero'd out. So a new userspace asking to do specific > > things would fail on this old kernel with EINVAL. Old userspace would > > work on old or new kernels. New of course works on new kernels. > > 2. If it's correct to check some arbitrary field (zc.reserved) to be > > 0, then it should be fine to check this for all future fields >= > > reserved in the struct. So some advanced userspace down the line > > doesn't get confused. > > > > Strictly speaking, I'm not convinced this is necessary - eg. 64 bytes > > struct right now, suppose userspace of the future gives us 96 bytes of > > which the last 32 are non-zero for some feature or the other. We, in > > the here and now kernel, truncate that length to 64 (as in we only > > copy to kernel those first 64 bytes) and set the returned length to > > 64. The understanding being, any (future, past or present) userspace > > consults the output value; and considers anything byte >= the returned > > len to be untouched by the kernel executing the call (ie. garbage, > > unacted upon). > > > > So, how would this work for old+new userspace on old+new kernel? > > > > A) old+old, new+new: sizes match, no issue > > B) new kernel, old userspace: That's not an issue. We have the > > switch(len) statement for that. > > C) old kernel, new userspace: that's the 96 vs. 64 B example above - > > new userspace would see that the kernel only operated on 64 B and > > treat the last 32 B as garbage/unacted on. > > > > In this case, we would not give EINVAL on case C, as we would if we > > returned EINVAL on a check_zeroed_user() case for fields past > > zc.reserved. We'd do a zerocopy operating on just the features we know > > about, and communicate to the user that we only acted on features up > > until this byte offset. > > > > Now, given this is the case, we still have the padding confusion with > > zc.reserved and the current struct size, so we have to force it to 0 > > as we are doing. But I think we don't need to go beyond this so far. > > > > Thus, my personal preference is to not have the check_zeroed_user() > > check. But if the consensus demands it, then it's an easy enough fix. > > What are your thoughts? > > > > bpf uses check_zeroed_user to make sure extensions to its structs are > compatible, so yes, this is required. Very well; I shall send out a v3 patch with this. > > Also, you need to address legitimate msg_flags as I mentioned in another > response. I meant to respond to this earlier but forgot. v3 will address this as well. -Arjun
diff --git a/include/uapi/linux/tcp.h b/include/uapi/linux/tcp.h index 42fc5a640df4..8fc09e8638b3 100644 --- a/include/uapi/linux/tcp.h +++ b/include/uapi/linux/tcp.h @@ -357,6 +357,6 @@ struct tcp_zerocopy_receive { __u64 msg_control; /* ancillary data */ __u64 msg_controllen; __u32 msg_flags; - /* __u32 hole; Next we must add >1 u32 otherwise length checks fail. */ + __u32 reserved; /* set to 0 for now */ }; #endif /* _UAPI_LINUX_TCP_H */ diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c index e1a17c6b473c..c8469c579ed8 100644 --- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -4159,6 +4159,8 @@ static int do_tcp_getsockopt(struct sock *sk, int level, } if (copy_from_user(&zc, optval, len)) return -EFAULT; + if (zc.reserved) + return -EINVAL; lock_sock(sk); err = tcp_zerocopy_receive(sk, &zc, &tss); release_sock(sk);