Message ID | 20210726153621.2658658-1-gregkh@linuxfoundation.org |
---|---|
State | New |
Series | [net] af_unix: fix garbage collect vs. MSG_PEEK |
On Mon, Jul 26, 2021 at 05:36:21PM +0200, Greg Kroah-Hartman wrote:
> From: Miklos Szeredi <mszeredi@redhat.com>
>
> Gc assumes that in-flight sockets that don't have an external ref can't

I think this commit log could be expanded. I had to really study things
to even begin to understand what was going on. I assume "Gc" here means
specifically unix_gc()?

> gain one while unix_gc_lock is held. That is true because
> unix_notinflight() will be called before detaching fds, which takes
> unix_gc_lock.

In reading the code, I *think* what is being protected by unix_gc_lock
is user->unix_inflight, u->inflight, unix_tot_inflight, and
gc_inflight_list?

I note that unix_tot_inflight isn't an atomic but is read outside of
locking by unix_release_sock() and wait_for_unix_gc(), which seems wrong
(or at least inefficient).

But regardless, are the "external references" the f_count (i.e.
get_file() of u->sk.sk_socket->file) being changed by scm_fp_dup() and
read by unix_gc() (i.e. file_count())?

It seems the test in unix_gc() is for making sure f_count isn't out of
sync with u->inflight (is this the corresponding "internal" reference?):

	total_refs = file_count(u->sk.sk_socket->file);
	inflight_refs = atomic_long_read(&u->inflight);

	BUG_ON(inflight_refs < 1);
	BUG_ON(total_refs < inflight_refs);
	if (total_refs == inflight_refs) {

> Only MSG_PEEK was somehow overlooked. That one also clones the fds, also
> keeping them in the skb. But through MSG_PEEK an external reference can
> definitely be gained without ever touching unix_gc_lock.

The idea appears to be that all scm_fp_dup() callers need to refresh the
u->inflight counts, which is what unix_attach_fds() and unix_detach_fds()
do. Why is lock/unlock sufficient for unix_peek_fds()?

I assume the rationale is that MSG_PEEK uses a temporary scm, which
only gets fput() clean-up on destroy ("inflight" is neither incremented
nor decremented at any point in the scm lifetime).

But I don't see why any of this helps.

unix_attach_fds():
	fget(), spin_lock(), inflight++, spin_unlock()
unix_detach_fds():
	spin_lock(), inflight--, spin_unlock(), fput()
unix_peek_fds():
	fget(), spin_lock(), spin_unlock()
unix_gc():
	spin_lock(), "total_refs == inflight_refs" to hitlist,
	spin_unlock(), free hitlist skbs

Doesn't this mean total_refs and inflight_refs can still get out of
sync? What keeps an skb from being "visible" to unix_peek_fds() between
the unix_gc() spin_unlock() and the unix_peek_fds() fget()?

A: unix_gc():
	spin_lock()
	find "total_refs == inflight_refs", add to hitlist
	spin_unlock()
B: unix_peek_fds():
	fget()
A: unix_gc():
	walk hitlist and free(skb)
B: unix_peek_fds():
	*use freed skb*

I feel like I must be missing something since the above race would
appear to exist even for unix_attach_fds()/unix_detach_fds():

A: unix_gc():
	spin_lock()
	find "total_refs == inflight_refs", add to hitlist
	spin_unlock()
B: unix_attach_fds():
	fget()
A: unix_gc():
	walk hitlist and free(skb)
B: unix_attach_fds():
	*use freed skb*

I'm assuming I'm missing a top-level usage count on skb that is held by
callers, which means the skb isn't actually freed by unix_gc(). But I
return to not understanding why adding the lock/unlock helps.

What are the expected locking semantics here?

-Kees

>
> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
> Cc: <stable@vger.kernel.org>
> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> ---
>  net/unix/af_unix.c | 16 ++++++++++++++--
>  1 file changed, 14 insertions(+), 2 deletions(-)
>
> Note, this is a resend of this old submission that somehow fell through
> the cracks:
>	https://lore.kernel.org/netdev/CAOssrKcfncAYsQWkfLGFgoOxAQJVT2hYVWdBA6Cw7hhO8RJ_wQ@mail.gmail.com/
> and was never submitted "properly" and this issue never seemed to get
> resolved properly.
>
> I've cleaned it up and made the change much smaller and localized to
> only one file. I kept Miklos's authorship as he did the hard work on
> this, I just removed lines and fixed a formatting issue :)
>
>
> diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
> index 23c92ad15c61..cdea997aa5bf 100644
> --- a/net/unix/af_unix.c
> +++ b/net/unix/af_unix.c
> @@ -1526,6 +1526,18 @@ static int unix_getname(struct socket *sock, struct sockaddr *uaddr, int peer)
>  	return err;
>  }
>
> +static void unix_peek_fds(struct scm_cookie *scm, struct sk_buff *skb)
> +{
> +	scm->fp = scm_fp_dup(UNIXCB(skb).fp);
> +
> +	/* During garbage collection it is assumed that in-flight sockets don't
> +	 * get a new external reference.  So we need to wait until current run
> +	 * finishes.
> +	 */
> +	spin_lock(&unix_gc_lock);
> +	spin_unlock(&unix_gc_lock);
> +}
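The refcount comparison quoted from unix_gc() in the message above can be
modelled in userspace to make the question concrete. The following is only a
toy sketch: struct toy_sock, toy_is_gc_candidate() and toy_peek() are
hypothetical names, not kernel code, and the model covers only the first
candidate filter, not the rest of the collector.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for one unix socket; not a kernel structure. */
struct toy_sock {
	long total_refs;    /* models file_count(): all references to the file */
	long inflight_refs; /* models u->inflight: references held by queued fds */
};

/* Mirrors the quoted test: a socket is only a candidate when every
 * reference to it comes from an in-flight message.
 */
static bool toy_is_gc_candidate(const struct toy_sock *s)
{
	assert(s->inflight_refs >= 1);             /* BUG_ON(inflight_refs < 1) */
	assert(s->total_refs >= s->inflight_refs); /* BUG_ON(total_refs < inflight_refs) */
	return s->total_refs == s->inflight_refs;
}

/* Models scm_fp_dup() on the MSG_PEEK path: a file reference is taken,
 * but the in-flight count is left alone.
 */
static void toy_peek(struct toy_sock *s)
{
	s->total_refs++;
}
```

In this model a socket whose only reference is in flight is a candidate, and
a single toy_peek() makes the two counts differ, which is the "external
reference gained without touching unix_gc_lock" that the commit log describes.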
On Mon, Jul 26, 2021 at 9:27 PM Kees Cook <keescook@chromium.org> wrote:
>
> On Mon, Jul 26, 2021 at 05:36:21PM +0200, Greg Kroah-Hartman wrote:
> > From: Miklos Szeredi <mszeredi@redhat.com>
> >
> > Gc assumes that in-flight sockets that don't have an external ref can't
>
> I think this commit log could be expanded. I had to really study things
> to even begin to understand what was going on. I assume "Gc" here means
> specifically unix_gc()?

Yeah, the original description was not too good. Commit cbcf01128d0a
("af_unix: fix garbage collect vs MSG_PEEK") now in Linus' tree has a
much expanded description.

> I note that unix_tot_inflight isn't an atomic but is read outside of
> locking by unix_release_sock() and wait_for_unix_gc(), which seems wrong
> (or at least inefficient).

I don't think it matters in practice. Do you have specific worries?

> Doesn't this mean total_refs and inflight_refs can still get out of
> sync? What keeps an skb from being "visible" to unix_peek_fds() between
> the unix_gc() spin_unlock() and the unix_peek_fds() fget()?
>
> A: unix_gc():
>         spin_lock()
>         find "total_refs == inflight_refs", add to hitlist
>         spin_unlock()
> B: unix_peek_fds():
>         fget()
> A: unix_gc():
>         walk hitlist and free(skb)
> B: unix_peek_fds():
>         *use freed skb*
>
> I feel like I must be missing something since the above race would
> appear to exist even for unix_attach_fds()/unix_detach_fds():

What you are missing is that anything that could have been peeked must
not have been garbage collected. I.e. the garbage collection algorithm
will find that there's an external in-flight reference to the peeked
socket and so it will not add it to the hitlist.

Thanks,
Miklos
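The lock/unlock pair discussed in this thread is an empty critical section
used purely as a barrier: the peeker takes its reference first and then
passes through the lock, so it cannot return while a collector run that
holds the lock is still in progress. A minimal userspace sketch of that
pattern follows, assuming a pthread mutex in place of the kernel spinlock;
toy_gc_lock, toy_gc_run() and toy_peek() are hypothetical names, not the
kernel implementation.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical stand-in for unix_gc_lock. */
static pthread_mutex_t toy_gc_lock = PTHREAD_MUTEX_INITIALIZER;
static long toy_gc_runs; /* completed scans, protected by toy_gc_lock */

/* Collector: the scan that compares reference counts happens entirely
 * under the lock.
 */
void toy_gc_run(void)
{
	pthread_mutex_lock(&toy_gc_lock);
	/* ... candidate scan would go here ... */
	toy_gc_runs++;
	pthread_mutex_unlock(&toy_gc_lock);
}

/* Peeker: take the extra reference, then wait out any scan in progress.
 * Nothing is updated under the lock; acquiring it is the whole point.
 * Returns the reference count after the increment.
 */
long toy_peek(atomic_long *refs)
{
	long now = atomic_fetch_add(refs, 1) + 1; /* models scm_fp_dup() */

	pthread_mutex_lock(&toy_gc_lock);   /* blocks while a scan holds the lock */
	pthread_mutex_unlock(&toy_gc_lock);
	return now;
}
```

If toy_gc_run() and toy_peek() are called from different threads, a peek that
starts during a scan returns only after that scan has released the lock. The
sketch says nothing about what the collector does after unlocking, which is
the part of the question answered in the reply above.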
diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 23c92ad15c61..cdea997aa5bf 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -1526,6 +1526,18 @@ static int unix_getname(struct socket *sock, struct sockaddr *uaddr, int peer)
 	return err;
 }
 
+static void unix_peek_fds(struct scm_cookie *scm, struct sk_buff *skb)
+{
+	scm->fp = scm_fp_dup(UNIXCB(skb).fp);
+
+	/* During garbage collection it is assumed that in-flight sockets don't
+	 * get a new external reference.  So we need to wait until current run
+	 * finishes.
+	 */
+	spin_lock(&unix_gc_lock);
+	spin_unlock(&unix_gc_lock);
+}
+
 static int unix_scm_to_skb(struct scm_cookie *scm, struct sk_buff *skb, bool send_fds)
 {
 	int err = 0;
@@ -2175,7 +2187,7 @@ static int unix_dgram_recvmsg(struct socket *sock, struct msghdr *msg,
 		sk_peek_offset_fwd(sk, size);
 
 		if (UNIXCB(skb).fp)
-			scm.fp = scm_fp_dup(UNIXCB(skb).fp);
+			unix_peek_fds(&scm, skb);
 	}
 	err = (flags & MSG_TRUNC) ? skb->len - skip : size;
 
@@ -2418,7 +2430,7 @@ static int unix_stream_read_generic(struct unix_stream_read_state *state,
 			/* It is questionable, see note in unix_dgram_recvmsg.
 			 */
 			if (UNIXCB(skb).fp)
-				scm.fp = scm_fp_dup(UNIXCB(skb).fp);
+				unix_peek_fds(&scm, skb);
 
 			sk_peek_offset_fwd(sk, chunk);
 
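For readers who want to see the MSG_PEEK path from userspace, the sketch
below sends one file descriptor over an AF_UNIX socket as SCM_RIGHTS and
receives it with caller-chosen flags. It assumes a Linux system, where
peeking a message that carries fds hands the receiver its own duplicate
while the message stays queued. demo_send_fd() and demo_recv_fd() are
hypothetical helper names, not part of the patch.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Control buffer sized and aligned for exactly one fd. */
union demo_ctl {
	char buf[CMSG_SPACE(sizeof(int))];
	struct cmsghdr align;
};

/* Send one data byte with fd attached as SCM_RIGHTS. Returns 0 on success. */
static int demo_send_fd(int sock, int fd)
{
	char byte = 'x';
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union demo_ctl ctl;
	struct msghdr msg;
	struct cmsghdr *c;

	memset(&ctl, 0, sizeof(ctl));
	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctl.buf;
	msg.msg_controllen = sizeof(ctl.buf);

	c = CMSG_FIRSTHDR(&msg);
	c->cmsg_level = SOL_SOCKET;
	c->cmsg_type = SCM_RIGHTS;
	c->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(c), &fd, sizeof(int));

	return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one byte and one fd using the given recvmsg() flags
 * (0 or MSG_PEEK). Returns the received fd, or -1 on failure.
 */
static int demo_recv_fd(int sock, int flags)
{
	char byte;
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union demo_ctl ctl;
	struct msghdr msg;
	struct cmsghdr *c;
	int fd;

	memset(&ctl, 0, sizeof(ctl));
	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctl.buf;
	msg.msg_controllen = sizeof(ctl.buf);

	if (recvmsg(sock, &msg, flags) != 1)
		return -1;
	c = CMSG_FIRSTHDR(&msg);
	if (!c || c->cmsg_level != SOL_SOCKET || c->cmsg_type != SCM_RIGHTS)
		return -1;
	memcpy(&fd, CMSG_DATA(c), sizeof(int));
	return fd;
}
```

Calling demo_recv_fd() once with MSG_PEEK and once with 0 on the same queued
message should yield two distinct descriptors. The first corresponds to the
reference that unix_peek_fds() takes through scm_fp_dup() in the patch.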