From patchwork Mon Jun 29 20:26:21 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Davide Caratti X-Patchwork-Id: 216903 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.1 required=3.0 tests=DKIMWL_WL_HIGH, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH, MAILING_LIST_MULTI, MENTIONS_GIT_HOSTING, SIGNED_OFF_BY, SPF_HELO_NONE, SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id AB03BC433E0 for ; Mon, 29 Jun 2020 20:27:11 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 7551620656 for ; Mon, 29 Jun 2020 20:27:11 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="JILnGdUH" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2389122AbgF2U1J (ORCPT ); Mon, 29 Jun 2020 16:27:09 -0400 Received: from us-smtp-2.mimecast.com ([205.139.110.61]:57622 "EHLO us-smtp-delivery-1.mimecast.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S2389113AbgF2U1G (ORCPT ); Mon, 29 Jun 2020 16:27:06 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1593462425; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=SHdcTU88OBHqxaq6FPhno3ONFEavy+uwI066ETqm9AQ=; b=JILnGdUHkk5cikIJDDud3wpYu7rn0/g+5BMljV60QOzqyGBP4xWf+6ef5aXNKKGP0g1RWp kiwDske/LhtPVO/B1fPUCaTH9xzCt6NAmxv+bhR+n3nCo9Qp5ojKv9fij16aWKhrgP08t9 SFOSh7EhnvjV6f+kXpWQcjYsAX1e1rg= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-44-iAvh-b52MFWzV372mIXB6Q-1; Mon, 29 Jun 2020 16:26:50 -0400 X-MC-Unique: iAvh-b52MFWzV372mIXB6Q-1 Received: from smtp.corp.redhat.com (int-mx03.intmail.prod.int.phx2.redhat.com [10.5.11.13]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 8F87E1932481; Mon, 29 Jun 2020 20:26:48 +0000 (UTC) Received: from new-host-5.redhat.com (unknown [10.40.194.100]) by smtp.corp.redhat.com (Postfix) with ESMTP id 019A91C4; Mon, 29 Jun 2020 20:26:46 +0000 (UTC) From: Davide Caratti To: Mat Martineau , Matthieu Baerts , "David S. Miller" , Jakub Kicinski , netdev@vger.kernel.org, mptcp@lists.01.org Subject: [PATCH net-next 2/6] mptcp: fallback in case of simultaneous connect Date: Mon, 29 Jun 2020 22:26:21 +0200 Message-Id: In-Reply-To: References: MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.13 Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org when a MPTCP client tries to connect to itself, tcp_finish_connect() is never reached. Because of this, depending on the socket current state, multiple faulty behaviours can be observed: 1) a WARN_ON() in subflow_data_ready() is hit WARNING: CPU: 2 PID: 882 at net/mptcp/subflow.c:911 subflow_data_ready+0x18b/0x230 [...] CPU: 2 PID: 882 Comm: gh35 Not tainted 5.7.0+ #187 [...] RIP: 0010:subflow_data_ready+0x18b/0x230 [...] Call Trace: tcp_data_queue+0xd2f/0x4250 tcp_rcv_state_process+0xb1c/0x49d3 tcp_v4_do_rcv+0x2bc/0x790 __release_sock+0x153/0x2d0 release_sock+0x4f/0x170 mptcp_shutdown+0x167/0x4e0 __sys_shutdown+0xe6/0x180 __x64_sys_shutdown+0x50/0x70 do_syscall_64+0x9a/0x370 entry_SYSCALL_64_after_hwframe+0x44/0xa9 2) client is stuck forever in mptcp_sendmsg() because the socket is not TCP_ESTABLISHED crash> bt 4847 PID: 4847 TASK: ffff88814b2fb100 CPU: 1 COMMAND: "gh35" #0 [ffff8881376ff680] __schedule at ffffffff97248da4 #1 [ffff8881376ff778] schedule at ffffffff9724a34f #2 [ffff8881376ff7a0] schedule_timeout at ffffffff97252ba0 #3 [ffff8881376ff8a8] wait_woken at ffffffff958ab4ba #4 [ffff8881376ff940] sk_stream_wait_connect at ffffffff96c2d859 #5 [ffff8881376ffa28] mptcp_sendmsg at ffffffff97207fca #6 [ffff8881376ffbc0] sock_sendmsg at ffffffff96be1b5b #7 [ffff8881376ffbe8] sock_write_iter at ffffffff96be1daa #8 [ffff8881376ffce8] new_sync_write at ffffffff95e5cb52 #9 [ffff8881376ffe50] vfs_write at ffffffff95e6547f #10 [ffff8881376ffe90] ksys_write at ffffffff95e65d26 #11 [ffff8881376fff28] do_syscall_64 at ffffffff956088ba #12 [ffff8881376fff50] entry_SYSCALL_64_after_hwframe at ffffffff9740008c RIP: 00007f126f6956ed RSP: 00007ffc2a320278 RFLAGS: 00000217 RAX: ffffffffffffffda RBX: 0000000020000044 RCX: 00007f126f6956ed RDX: 0000000000000004 RSI: 00000000004007b8 RDI: 0000000000000003 RBP: 00007ffc2a3202a0 R8: 0000000000400720 R9: 0000000000400720 R10: 0000000000400720 R11: 0000000000000217 R12: 00000000004004b0 R13: 00007ffc2a320380 R14: 0000000000000000 R15: 0000000000000000 ORIG_RAX: 0000000000000001 CS: 0033 SS: 002b 3) tcpdump captures show that DSS is exchanged even when MP_CAPABLE handshake didn't complete. $ tcpdump -tnnr bad.pcap IP 127.0.0.1.20000 > 127.0.0.1.20000: Flags [S], seq 3208913911, win 65483, options [mss 65495,sackOK,TS val 3291706876 ecr 3291694721,nop,wscale 7,mptcp capable v1], length 0 IP 127.0.0.1.20000 > 127.0.0.1.20000: Flags [S.], seq 3208913911, ack 3208913912, win 65483, options [mss 65495,sackOK,TS val 3291706876 ecr 3291706876,nop,wscale 7,mptcp capable v1], length 0 IP 127.0.0.1.20000 > 127.0.0.1.20000: Flags [.], ack 1, win 512, options [nop,nop,TS val 3291706876 ecr 3291706876], length 0 IP 127.0.0.1.20000 > 127.0.0.1.20000: Flags [F.], seq 1, ack 1, win 512, options [nop,nop,TS val 3291707876 ecr 3291706876,mptcp dss fin seq 0 subseq 0 len 1,nop,nop], length 0 IP 127.0.0.1.20000 > 127.0.0.1.20000: Flags [.], ack 2, win 512, options [nop,nop,TS val 3291707876 ecr 3291707876], length 0 force a fallback to TCP in these cases, and adjust the main socket state to avoid hanging in mptcp_sendmsg(). Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/35 Reported-by: Christoph Paasch Suggested-by: Paolo Abeni Signed-off-by: Davide Caratti --- net/mptcp/protocol.h | 10 ++++++++++ net/mptcp/subflow.c | 10 ++++++++++ 2 files changed, 20 insertions(+) diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h index a709df659ae0..1d05d9841b5c 100644 --- a/net/mptcp/protocol.h +++ b/net/mptcp/protocol.h @@ -490,4 +490,14 @@ static inline void mptcp_do_fallback(struct sock *sk) #define pr_fallback(a) pr_debug("%s:fallback to TCP (msk=%p)", __func__, a) +static inline bool subflow_simultaneous_connect(struct sock *sk) +{ + struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(sk); + struct sock *parent = subflow->conn; + + return sk->sk_state == TCP_ESTABLISHED && + !mptcp_sk(parent)->pm.server_side && + !subflow->conn_finished; +} + #endif /* __MPTCP_PROTOCOL_H */ diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c index cb8a42ff4646..548f9e347ff5 100644 --- a/net/mptcp/subflow.c +++ b/net/mptcp/subflow.c @@ -1128,6 +1128,16 @@ static void subflow_state_change(struct sock *sk) __subflow_state_change(sk); + if (subflow_simultaneous_connect(sk)) { + mptcp_do_fallback(sk); + pr_fallback(mptcp_sk(parent)); + subflow->conn_finished = 1; + if (inet_sk_state_load(parent) == TCP_SYN_SENT) { + inet_sk_state_store(parent, TCP_ESTABLISHED); + parent->sk_state_change(parent); + } + } + /* as recvmsg() does not acquire the subflow socket for ssk selection * a fin packet carrying a DSS can be unnoticed if we don't trigger * the data available machinery here. From patchwork Mon Jun 29 20:26:22 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Davide Caratti X-Patchwork-Id: 216902 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.1 required=3.0 tests=DKIMWL_WL_HIGH, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH, MAILING_LIST_MULTI, SIGNED_OFF_BY, SPF_HELO_NONE, SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0034DC433DF for ; Mon, 29 Jun 2020 20:27:14 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id C827420656 for ; Mon, 29 Jun 2020 20:27:14 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="CSxsXTL0" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1731947AbgF2U1N (ORCPT ); Mon, 29 Jun 2020 16:27:13 -0400 Received: from us-smtp-delivery-1.mimecast.com ([207.211.31.120]:49380 "EHLO us-smtp-1.mimecast.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S2389106AbgF2U1D (ORCPT ); Mon, 29 Jun 2020 16:27:03 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1593462420; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=xZ0lrRYkxAl0RydCohzMlCIYdwH9VivECB57kRHXAX0=; b=CSxsXTL0QoGDQwwAKDSyazw77X4BZCExSxrbKzpQIlJl1F8kyaMDvFgkKo7byZ++7G+AfT kmJCvVSOc2lRXkTSkihSpSbFRnF/OtYawAXnuurv1hdMIcDjgtcG5yBQWD3ufIn7YbKi3E iJ9e3hxrV/KhRm1D6VVoKcsUmC/83+4= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-173-FTEP9M3WMN2p2MVq31UheQ-1; Mon, 29 Jun 2020 16:26:51 -0400 X-MC-Unique: FTEP9M3WMN2p2MVq31UheQ-1 Received: from smtp.corp.redhat.com (int-mx03.intmail.prod.int.phx2.redhat.com [10.5.11.13]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 88E0D107ACCA; Mon, 29 Jun 2020 20:26:50 +0000 (UTC) Received: from new-host-5.redhat.com (unknown [10.40.194.100]) by smtp.corp.redhat.com (Postfix) with ESMTP id EAEE11C4; Mon, 29 Jun 2020 20:26:48 +0000 (UTC) From: Davide Caratti To: Mat Martineau , Matthieu Baerts , "David S. Miller" , Jakub Kicinski , netdev@vger.kernel.org, mptcp@lists.01.org Subject: [PATCH net-next 3/6] mptcp: check for plain TCP sock at accept time Date: Mon, 29 Jun 2020 22:26:22 +0200 Message-Id: <821ad8ad1581b906b6706426c1ba0c3b837cf60d.1593461586.git.dcaratti@redhat.com> In-Reply-To: References: MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.13 Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org From: Paolo Abeni This cleanup the code a bit and avoid corrupted states on weird syscall sequence (accept(), connect()). Signed-off-by: Paolo Abeni Signed-off-by: Davide Caratti --- net/mptcp/protocol.c | 69 +++++--------------------------------------- 1 file changed, 7 insertions(+), 62 deletions(-) diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c index 84ae96be9837..dbeb6fe374f5 100644 --- a/net/mptcp/protocol.c +++ b/net/mptcp/protocol.c @@ -52,13 +52,10 @@ static struct socket *__mptcp_nmpc_socket(const struct mptcp_sock *msk) return msk->subflow; } -static struct socket *mptcp_is_tcpsk(struct sock *sk) +static bool mptcp_is_tcpsk(struct sock *sk) { struct socket *sock = sk->sk_socket; - if (sock->sk != sk) - return NULL; - if (unlikely(sk->sk_prot == &tcp_prot)) { /* we are being invoked after mptcp_accept() has * accepted a non-mp-capable flow: sk is a tcp_sk, @@ -68,27 +65,21 @@ static struct socket *mptcp_is_tcpsk(struct sock *sk) * bypass mptcp. */ sock->ops = &inet_stream_ops; - return sock; + return true; #if IS_ENABLED(CONFIG_MPTCP_IPV6) } else if (unlikely(sk->sk_prot == &tcpv6_prot)) { sock->ops = &inet6_stream_ops; - return sock; + return true; #endif } - return NULL; + return false; } static struct socket *__mptcp_tcp_fallback(struct mptcp_sock *msk) { - struct socket *sock; - sock_owned_by_me((const struct sock *)msk); - sock = mptcp_is_tcpsk((struct sock *)msk); - if (unlikely(sock)) - return sock; - if (likely(!__mptcp_check_fallback(msk))) return NULL; @@ -1466,7 +1457,6 @@ static struct sock *mptcp_accept(struct sock *sk, int flags, int *err, return NULL; pr_debug("msk=%p, subflow is mptcp=%d", msk, sk_is_mptcp(newsk)); - if (sk_is_mptcp(newsk)) { struct mptcp_subflow_context *subflow; struct sock *new_mptcp_sock; @@ -1821,42 +1811,6 @@ static int mptcp_stream_connect(struct socket *sock, struct sockaddr *uaddr, return err; } -static int mptcp_v4_getname(struct socket *sock, struct sockaddr *uaddr, - int peer) -{ - if (sock->sk->sk_prot == &tcp_prot) { - /* we are being invoked from __sys_accept4, after - * mptcp_accept() has just accepted a non-mp-capable - * flow: sk is a tcp_sk, not an mptcp one. - * - * Hand the socket over to tcp so all further socket ops - * bypass mptcp. - */ - sock->ops = &inet_stream_ops; - } - - return inet_getname(sock, uaddr, peer); -} - -#if IS_ENABLED(CONFIG_MPTCP_IPV6) -static int mptcp_v6_getname(struct socket *sock, struct sockaddr *uaddr, - int peer) -{ - if (sock->sk->sk_prot == &tcpv6_prot) { - /* we are being invoked from __sys_accept4 after - * mptcp_accept() has accepted a non-mp-capable - * subflow: sk is a tcp_sk, not mptcp. - * - * Hand the socket over to tcp so all further - * socket ops bypass mptcp. - */ - sock->ops = &inet6_stream_ops; - } - - return inet6_getname(sock, uaddr, peer); -} -#endif - static int mptcp_listen(struct socket *sock, int backlog) { struct mptcp_sock *msk = mptcp_sk(sock->sk); @@ -1885,15 +1839,6 @@ static int mptcp_listen(struct socket *sock, int backlog) return err; } -static bool is_tcp_proto(const struct proto *p) -{ -#if IS_ENABLED(CONFIG_MPTCP_IPV6) - return p == &tcp_prot || p == &tcpv6_prot; -#else - return p == &tcp_prot; -#endif -} - static int mptcp_stream_accept(struct socket *sock, struct socket *newsock, int flags, bool kern) { @@ -1915,7 +1860,7 @@ static int mptcp_stream_accept(struct socket *sock, struct socket *newsock, release_sock(sock->sk); err = ssock->ops->accept(sock, newsock, flags, kern); - if (err == 0 && !is_tcp_proto(newsock->sk->sk_prot)) { + if (err == 0 && !mptcp_is_tcpsk(newsock->sk)) { struct mptcp_sock *msk = mptcp_sk(newsock->sk); struct mptcp_subflow_context *subflow; @@ -2011,7 +1956,7 @@ static const struct proto_ops mptcp_stream_ops = { .connect = mptcp_stream_connect, .socketpair = sock_no_socketpair, .accept = mptcp_stream_accept, - .getname = mptcp_v4_getname, + .getname = inet_getname, .poll = mptcp_poll, .ioctl = inet_ioctl, .gettstamp = sock_gettstamp, @@ -2065,7 +2010,7 @@ static const struct proto_ops mptcp_v6_stream_ops = { .connect = mptcp_stream_connect, .socketpair = sock_no_socketpair, .accept = mptcp_stream_accept, - .getname = mptcp_v6_getname, + .getname = inet6_getname, .poll = mptcp_poll, .ioctl = inet6_ioctl, .gettstamp = sock_gettstamp,