From patchwork Thu Mar 26 20:46:28 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mat Martineau X-Patchwork-Id: 221768 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-9.8 required=3.0 tests=HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH, MAILING_LIST_MULTI, SIGNED_OFF_BY, SPF_HELO_NONE, SPF_PASS, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 53CBCC43331 for ; Thu, 26 Mar 2020 20:47:31 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 27AB42070A for ; Thu, 26 Mar 2020 20:47:31 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727867AbgCZUra (ORCPT ); Thu, 26 Mar 2020 16:47:30 -0400 Received: from mga05.intel.com ([192.55.52.43]:47906 "EHLO mga05.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727711AbgCZUrC (ORCPT ); Thu, 26 Mar 2020 16:47:02 -0400 IronPort-SDR: xFvkfM/gimuu4Va/ebNhRTfwKALiA9j9N4pnNAvGiYyfpO+gY6LdpK2D9ecQYCVJ8jRcwYmyGr vAJfaSeOxwjg== X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga007.fm.intel.com ([10.253.24.52]) by fmsmga105.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 26 Mar 2020 13:47:00 -0700 IronPort-SDR: NIIeUa5NRD2gMiyrlWymv9fwO2RIxMUL1HmOsQtYmyB6/GEHuWA7w3c1xluw89h19PdtTcMZsO TH8JRgtdf7eQ== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.72,309,1580803200"; d="scan'208";a="238911669" Received: from mjmartin-nuc02.mjmartin-nuc02 (HELO mjmartin-nuc02.sea.intel.com) ([10.252.133.119]) by fmsmga007.fm.intel.com with ESMTP; 26 Mar 2020 13:47:00 -0700 From: Mat Martineau To: netdev@vger.kernel.org Cc: Peter Krystad , eric.dumazet@gmail.com, Florian Westphal , Paolo Abeni , Mat Martineau Subject: [PATCH net-next v2 05/17] mptcp: Implement path manager interface commands Date: Thu, 26 Mar 2020 13:46:28 -0700 Message-Id: <20200326204640.67336-6-mathew.j.martineau@linux.intel.com> X-Mailer: git-send-email 2.26.0 In-Reply-To: <20200326204640.67336-1-mathew.j.martineau@linux.intel.com> References: <20200326204640.67336-1-mathew.j.martineau@linux.intel.com> MIME-Version: 1.0 Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org From: Peter Krystad Fill in more path manager functionality by adding a worker function and modifying the related stub functions to schedule the worker. Co-developed-by: Florian Westphal Signed-off-by: Florian Westphal Co-developed-by: Paolo Abeni Signed-off-by: Paolo Abeni Signed-off-by: Peter Krystad Signed-off-by: Mat Martineau --- net/mptcp/pm.c | 132 +++++++++++++++++++++++++++++++++++++++++-- net/mptcp/protocol.c | 1 + net/mptcp/protocol.h | 1 + 3 files changed, 129 insertions(+), 5 deletions(-) diff --git a/net/mptcp/pm.c b/net/mptcp/pm.c index ad837da0193d..3aedad58778c 100644 --- a/net/mptcp/pm.c +++ b/net/mptcp/pm.c @@ -15,7 +15,11 @@ static struct workqueue_struct *pm_wq; int mptcp_pm_announce_addr(struct mptcp_sock *msk, const struct mptcp_addr_info *addr) { - return -ENOTSUPP; + pr_debug("msk=%p, local_id=%d", msk, addr->id); + + msk->pm.local = *addr; + WRITE_ONCE(msk->pm.addr_signal, true); + return 0; } int mptcp_pm_remove_addr(struct mptcp_sock *msk, u8 local_id) @@ -41,13 +45,58 @@ void mptcp_pm_new_connection(struct mptcp_sock *msk, int server_side) bool mptcp_pm_allow_new_subflow(struct mptcp_sock *msk) { - pr_debug("msk=%p", msk); - return false; + struct mptcp_pm_data *pm = &msk->pm; + int ret; + + pr_debug("msk=%p subflows=%d max=%d allow=%d", msk, pm->subflows, + pm->subflows_max, READ_ONCE(pm->accept_subflow)); + + /* try to avoid acquiring the lock below */ + if (!READ_ONCE(pm->accept_subflow)) + return false; + + spin_lock_bh(&pm->lock); + ret = pm->subflows < pm->subflows_max; + if (ret && ++pm->subflows == pm->subflows_max) + WRITE_ONCE(pm->accept_subflow, false); + spin_unlock_bh(&pm->lock); + + return ret; +} + +/* return true if the new status bit is currently cleared, that is, this event + * can be server, eventually by an already scheduled work + */ +static bool mptcp_pm_schedule_work(struct mptcp_sock *msk, + enum mptcp_pm_status new_status) +{ + pr_debug("msk=%p status=%x new=%lx", msk, msk->pm.status, + BIT(new_status)); + if (msk->pm.status & BIT(new_status)) + return false; + + msk->pm.status |= BIT(new_status); + if (queue_work(pm_wq, &msk->pm.work)) + sock_hold((struct sock *)msk); + return true; } void mptcp_pm_fully_established(struct mptcp_sock *msk) { + struct mptcp_pm_data *pm = &msk->pm; + pr_debug("msk=%p", msk); + + /* try to avoid acquiring the lock below */ + if (!READ_ONCE(pm->work_pending)) + return; + + spin_lock_bh(&pm->lock); + + if (READ_ONCE(pm->work_pending)) + mptcp_pm_schedule_work(msk, MPTCP_PM_ESTABLISHED); + + spin_unlock_bh(&pm->lock); } void mptcp_pm_connection_closed(struct mptcp_sock *msk) @@ -58,7 +107,19 @@ void mptcp_pm_connection_closed(struct mptcp_sock *msk) void mptcp_pm_subflow_established(struct mptcp_sock *msk, struct mptcp_subflow_context *subflow) { + struct mptcp_pm_data *pm = &msk->pm; + pr_debug("msk=%p", msk); + + if (!READ_ONCE(pm->work_pending)) + return; + + spin_lock_bh(&pm->lock); + + if (READ_ONCE(pm->work_pending)) + mptcp_pm_schedule_work(msk, MPTCP_PM_SUBFLOW_ESTABLISHED); + + spin_unlock_bh(&pm->lock); } void mptcp_pm_subflow_closed(struct mptcp_sock *msk, u8 id) @@ -69,7 +130,23 @@ void mptcp_pm_subflow_closed(struct mptcp_sock *msk, u8 id) void mptcp_pm_add_addr_received(struct mptcp_sock *msk, const struct mptcp_addr_info *addr) { - pr_debug("msk=%p, remote_id=%d", msk, addr->id); + struct mptcp_pm_data *pm = &msk->pm; + + pr_debug("msk=%p remote_id=%d accept=%d", msk, addr->id, + READ_ONCE(pm->accept_addr)); + + /* avoid acquiring the lock if there is no room for fouther addresses */ + if (!READ_ONCE(pm->accept_addr)) + return; + + spin_lock_bh(&pm->lock); + + /* be sure there is something to signal re-checking under PM lock */ + if (READ_ONCE(pm->accept_addr) && + mptcp_pm_schedule_work(msk, MPTCP_PM_ADD_ADDR_RECEIVED)) + pm->remote = *addr; + + spin_unlock_bh(&pm->lock); } /* path manager helpers */ @@ -77,7 +154,24 @@ void mptcp_pm_add_addr_received(struct mptcp_sock *msk, bool mptcp_pm_addr_signal(struct mptcp_sock *msk, unsigned int remaining, struct mptcp_addr_info *saddr) { - return false; + int ret = false; + + spin_lock_bh(&msk->pm.lock); + + /* double check after the lock is acquired */ + if (!mptcp_pm_should_signal(msk)) + goto out_unlock; + + if (remaining < mptcp_add_addr_len(msk->pm.local.family)) + goto out_unlock; + + *saddr = msk->pm.local; + WRITE_ONCE(msk->pm.addr_signal, false); + ret = true; + +out_unlock: + spin_unlock_bh(&msk->pm.lock); + return ret; } int mptcp_pm_get_local_id(struct mptcp_sock *msk, struct sock_common *skc) @@ -87,6 +181,28 @@ int mptcp_pm_get_local_id(struct mptcp_sock *msk, struct sock_common *skc) static void pm_worker(struct work_struct *work) { + struct mptcp_pm_data *pm = container_of(work, struct mptcp_pm_data, + work); + struct mptcp_sock *msk = container_of(pm, struct mptcp_sock, pm); + struct sock *sk = (struct sock *)msk; + + lock_sock(sk); + spin_lock_bh(&msk->pm.lock); + + pr_debug("msk=%p status=%x", msk, pm->status); + if (pm->status & BIT(MPTCP_PM_ADD_ADDR_RECEIVED)) { + pm->status &= ~BIT(MPTCP_PM_ADD_ADDR_RECEIVED); + } + if (pm->status & BIT(MPTCP_PM_ESTABLISHED)) { + pm->status &= ~BIT(MPTCP_PM_ESTABLISHED); + } + if (pm->status & BIT(MPTCP_PM_SUBFLOW_ESTABLISHED)) { + pm->status &= ~BIT(MPTCP_PM_SUBFLOW_ESTABLISHED); + } + + spin_unlock_bh(&msk->pm.lock); + release_sock(sk); + sock_put(sk); } void mptcp_pm_data_init(struct mptcp_sock *msk) @@ -105,6 +221,12 @@ void mptcp_pm_data_init(struct mptcp_sock *msk) INIT_WORK(&msk->pm.work, pm_worker); } +void mptcp_pm_close(struct mptcp_sock *msk) +{ + if (cancel_work_sync(&msk->pm.work)) + sock_put((struct sock *)msk); +} + void mptcp_pm_init(void) { pm_wq = alloc_workqueue("pm_wq", WQ_UNBOUND | WQ_MEM_RECLAIM, 8); diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c index 3d84e0b83c99..5c4560287bd2 100644 --- a/net/mptcp/protocol.c +++ b/net/mptcp/protocol.c @@ -833,6 +833,7 @@ static void mptcp_close(struct sock *sk, long timeout) } mptcp_cancel_work(sk); + mptcp_pm_close(msk); __skb_queue_purge(&sk->sk_receive_queue); diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h index df134ac91274..209bdaa43dda 100644 --- a/net/mptcp/protocol.h +++ b/net/mptcp/protocol.h @@ -330,6 +330,7 @@ void mptcp_crypto_hmac_sha(u64 key1, u64 key2, u8 *msg, int len, void *hmac); void mptcp_pm_init(void); void mptcp_pm_data_init(struct mptcp_sock *msk); +void mptcp_pm_close(struct mptcp_sock *msk); void mptcp_pm_new_connection(struct mptcp_sock *msk, int server_side); void mptcp_pm_fully_established(struct mptcp_sock *msk); bool mptcp_pm_allow_new_subflow(struct mptcp_sock *msk); From patchwork Thu Mar 26 20:46:31 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mat Martineau X-Patchwork-Id: 221775 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-9.8 required=3.0 tests=HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH, MAILING_LIST_MULTI, SIGNED_OFF_BY, SPF_HELO_NONE, SPF_PASS, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id DF250C43331 for ; Thu, 26 Mar 2020 20:47:04 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id B9FDC2070A for ; Thu, 26 Mar 2020 20:47:04 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727787AbgCZUrE (ORCPT ); Thu, 26 Mar 2020 16:47:04 -0400 Received: from mga05.intel.com ([192.55.52.43]:47906 "EHLO mga05.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727724AbgCZUrD (ORCPT ); Thu, 26 Mar 2020 16:47:03 -0400 IronPort-SDR: Iq/YvNqAjVHrWcjykePyQizC+Ely/riHx2bUfnRERFmhjZ8Xh0oWE9GbSN14Y7hOXHCTM0A5SP Bp1hG+VS4JAQ== X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga007.fm.intel.com ([10.253.24.52]) by fmsmga105.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 26 Mar 2020 13:47:01 -0700 IronPort-SDR: F1287YxRBe/DM/BxA0byNhW4+GFS+1g68sDH3g30b050m7W9GQPEtEfmJPwmrzfkKMyNky0EMK i7eOu2rUoTow== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.72,309,1580803200"; d="scan'208";a="238911681" Received: from mjmartin-nuc02.mjmartin-nuc02 (HELO mjmartin-nuc02.sea.intel.com) ([10.252.133.119]) by fmsmga007.fm.intel.com with ESMTP; 26 Mar 2020 13:47:01 -0700 From: Mat Martineau To: netdev@vger.kernel.org Cc: Paolo Abeni , eric.dumazet@gmail.com, Mat Martineau Subject: [PATCH net-next v2 08/17] mptcp: introduce MPTCP retransmission timer Date: Thu, 26 Mar 2020 13:46:31 -0700 Message-Id: <20200326204640.67336-9-mathew.j.martineau@linux.intel.com> X-Mailer: git-send-email 2.26.0 In-Reply-To: <20200326204640.67336-1-mathew.j.martineau@linux.intel.com> References: <20200326204640.67336-1-mathew.j.martineau@linux.intel.com> MIME-Version: 1.0 Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org From: Paolo Abeni The timer will be used to schedule retransmission. It's frequency is based on the current subflow RTO estimation and is reset on every una_seq update The timer is clearer for good by __mptcp_clear_xmit() Also clean MPTCP rtx queue before each transmission. Signed-off-by: Paolo Abeni Signed-off-by: Mat Martineau --- net/mptcp/options.c | 4 +- net/mptcp/protocol.c | 89 +++++++++++++++++++++++++++++++++++++++++++- net/mptcp/protocol.h | 2 + 3 files changed, 93 insertions(+), 2 deletions(-) diff --git a/net/mptcp/options.c b/net/mptcp/options.c index b0ff8ad702a3..bd220ee4aac9 100644 --- a/net/mptcp/options.c +++ b/net/mptcp/options.c @@ -779,8 +779,10 @@ static void update_una(struct mptcp_sock *msk, snd_una = old_snd_una; old_snd_una = atomic64_cmpxchg(&msk->snd_una, snd_una, new_snd_una); - if (old_snd_una == snd_una) + if (old_snd_una == snd_una) { + mptcp_data_acked((struct sock *)msk); break; + } } } diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c index dae5d3152da4..35eae3fdcf04 100644 --- a/net/mptcp/protocol.c +++ b/net/mptcp/protocol.c @@ -251,6 +251,46 @@ static void __mptcp_flush_join_list(struct mptcp_sock *msk) spin_unlock_bh(&msk->join_list_lock); } +static void mptcp_set_timeout(const struct sock *sk, const struct sock *ssk) +{ + long tout = ssk && inet_csk(ssk)->icsk_pending ? + inet_csk(ssk)->icsk_timeout - jiffies : 0; + + if (tout <= 0) + tout = mptcp_sk(sk)->timer_ival; + mptcp_sk(sk)->timer_ival = tout > 0 ? tout : TCP_RTO_MIN; +} + +static bool mptcp_timer_pending(struct sock *sk) +{ + return timer_pending(&inet_csk(sk)->icsk_retransmit_timer); +} + +static void mptcp_reset_timer(struct sock *sk) +{ + struct inet_connection_sock *icsk = inet_csk(sk); + unsigned long tout; + + /* should never be called with mptcp level timer cleared */ + tout = READ_ONCE(mptcp_sk(sk)->timer_ival); + if (WARN_ON_ONCE(!tout)) + tout = TCP_RTO_MIN; + sk_reset_timer(sk, &icsk->icsk_retransmit_timer, jiffies + tout); +} + +void mptcp_data_acked(struct sock *sk) +{ + mptcp_reset_timer(sk); +} + +static void mptcp_stop_timer(struct sock *sk) +{ + struct inet_connection_sock *icsk = inet_csk(sk); + + sk_stop_timer(sk, &icsk->icsk_retransmit_timer); + mptcp_sk(sk)->timer_ival = 0; +} + static bool mptcp_ext_cache_refill(struct mptcp_sock *msk) { if (!msk->cached_ext) @@ -596,10 +636,15 @@ static int mptcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len) copied += ret; } + mptcp_set_timeout(sk, ssk); if (copied) { ret = copied; tcp_push(ssk, msg->msg_flags, mss_now, tcp_sk(ssk)->nonagle, size_goal); + + /* start the timer, if it's not pending */ + if (!mptcp_timer_pending(sk)) + mptcp_reset_timer(sk); } ssk_check_wmem(msk, ssk); @@ -787,6 +832,35 @@ static int mptcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, return copied; } +static void mptcp_retransmit_handler(struct sock *sk) +{ + struct mptcp_sock *msk = mptcp_sk(sk); + + if (atomic64_read(&msk->snd_una) == msk->write_seq) + mptcp_stop_timer(sk); + else + mptcp_reset_timer(sk); +} + +static void mptcp_retransmit_timer(struct timer_list *t) +{ + struct inet_connection_sock *icsk = from_timer(icsk, t, + icsk_retransmit_timer); + struct sock *sk = &icsk->icsk_inet.sk; + + bh_lock_sock(sk); + if (!sock_owned_by_user(sk)) { + mptcp_retransmit_handler(sk); + } else { + /* delegate our work to tcp_release_cb() */ + if (!test_and_set_bit(TCP_WRITE_TIMER_DEFERRED, + &sk->sk_tsq_flags)) + sock_hold(sk); + } + bh_unlock_sock(sk); + sock_put(sk); +} + /* subflow sockets can be either outgoing (connect) or incoming * (accept). * @@ -846,6 +920,9 @@ static int __mptcp_init_sock(struct sock *sk) mptcp_pm_data_init(msk); + /* re-use the csk retrans timer for MPTCP-level retrans */ + timer_setup(&msk->sk.icsk_retransmit_timer, mptcp_retransmit_timer, 0); + return 0; } @@ -867,6 +944,8 @@ static void __mptcp_clear_xmit(struct sock *sk) struct mptcp_sock *msk = mptcp_sk(sk); struct mptcp_data_frag *dtmp, *dfrag; + sk_stop_timer(sk, &msk->sk.icsk_retransmit_timer); + list_for_each_entry_safe(dfrag, dtmp, &msk->rtx_queue, list) dfrag_clear(dfrag); } @@ -1155,7 +1234,8 @@ static int mptcp_getsockopt(struct sock *sk, int level, int optname, return -EOPNOTSUPP; } -#define MPTCP_DEFERRED_ALL TCPF_DELACK_TIMER_DEFERRED +#define MPTCP_DEFERRED_ALL (TCPF_DELACK_TIMER_DEFERRED | \ + TCPF_WRITE_TIMER_DEFERRED) /* this is very alike tcp_release_cb() but we must handle differently a * different set of events @@ -1171,6 +1251,8 @@ static void mptcp_release_cb(struct sock *sk) nflags = flags & ~MPTCP_DEFERRED_ALL; } while (cmpxchg(&sk->sk_tsq_flags, flags, nflags) != flags); + sock_release_ownership(sk); + if (flags & TCPF_DELACK_TIMER_DEFERRED) { struct mptcp_sock *msk = mptcp_sk(sk); struct sock *ssk; @@ -1179,6 +1261,11 @@ static void mptcp_release_cb(struct sock *sk) if (!ssk || !schedule_work(&msk->work)) __sock_put(sk); } + + if (flags & TCPF_WRITE_TIMER_DEFERRED) { + mptcp_retransmit_handler(sk); + __sock_put(sk); + } } static int mptcp_get_port(struct sock *sk, unsigned short snum) diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h index a1fdb879259a..d222eea11922 100644 --- a/net/mptcp/protocol.h +++ b/net/mptcp/protocol.h @@ -157,6 +157,7 @@ struct mptcp_sock { u64 write_seq; u64 ack_seq; atomic64_t snd_una; + unsigned long timer_ival; u32 token; unsigned long flags; bool can_ack; @@ -326,6 +327,7 @@ void mptcp_get_options(const struct sk_buff *skb, void mptcp_finish_connect(struct sock *sk); void mptcp_data_ready(struct sock *sk, struct sock *ssk); bool mptcp_finish_join(struct sock *sk); +void mptcp_data_acked(struct sock *sk); int mptcp_token_new_request(struct request_sock *req); void mptcp_token_destroy_request(u32 token); From patchwork Thu Mar 26 20:46:33 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mat Martineau X-Patchwork-Id: 221769 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-9.8 required=3.0 tests=HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH, MAILING_LIST_MULTI, SIGNED_OFF_BY, SPF_HELO_NONE, SPF_PASS, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id EEFBEC2D0ED for ; Thu, 26 Mar 2020 20:47:26 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id CDA282070A for ; Thu, 26 Mar 2020 20:47:26 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727856AbgCZUr0 (ORCPT ); Thu, 26 Mar 2020 16:47:26 -0400 Received: from mga05.intel.com ([192.55.52.43]:47904 "EHLO mga05.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727738AbgCZUrD (ORCPT ); Thu, 26 Mar 2020 16:47:03 -0400 IronPort-SDR: a/zsGV6DdIO3vTnrgVXelyHcdxdgksFKTmWv8124rKL+b8M0Q89y05Yu3nzS+KTI7ki+0CHwG4 dp8MSXmRkQrA== X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga007.fm.intel.com ([10.253.24.52]) by fmsmga105.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 26 Mar 2020 13:47:02 -0700 IronPort-SDR: nEUzsDk4jQB9TAINkOvMYIcBucZpgzHE8pTF4WDRENxi8OQF0n9qU2rRAKM6DQNFOXW5QCS791 c/kU788XTzrg== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.72,309,1580803200"; d="scan'208";a="238911688" Received: from mjmartin-nuc02.mjmartin-nuc02 (HELO mjmartin-nuc02.sea.intel.com) ([10.252.133.119]) by fmsmga007.fm.intel.com with ESMTP; 26 Mar 2020 13:47:02 -0700 From: Mat Martineau To: netdev@vger.kernel.org Cc: Florian Westphal , eric.dumazet@gmail.com, Mat Martineau Subject: [PATCH net-next v2 10/17] mptcp: allow partial cleaning of rtx head dfrag Date: Thu, 26 Mar 2020 13:46:33 -0700 Message-Id: <20200326204640.67336-11-mathew.j.martineau@linux.intel.com> X-Mailer: git-send-email 2.26.0 In-Reply-To: <20200326204640.67336-1-mathew.j.martineau@linux.intel.com> References: <20200326204640.67336-1-mathew.j.martineau@linux.intel.com> MIME-Version: 1.0 Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org From: Florian Westphal After adding wmem accounting for the mptcp socket we could get into a situation where the mptcp socket can't transmit more data, and mptcp_clean_una doesn't reduce wmem even if snd_una has advanced because it currently will only remove entire dfrags. Allow advancing the dfrag head sequence and reduce wmem, even though this isn't correct (as we can't release the page). Because we will soon block on mptcp sk in case wmem is too large, call sk_stream_write_space() in case we reduced the backlog so userspace task blocked in sendmsg or poll will be woken up. This isn't an issue if the send buffer is large, but it is when SO_SNDBUF is used to reduce it to a lower value. Note we can still get a deadlock for low SO_SNDBUF values in case both sides of the connection write to the socket: both could be blocked due to wmem being too small -- and current mptcp stack will only increment mptcp ack_seq on recv. This doesn't happen with the selftest as it uses poll() and will always call recv if there is data to read. Signed-off-by: Florian Westphal Signed-off-by: Mat Martineau --- net/mptcp/protocol.c | 16 ++++++++++++++++ net/mptcp/protocol.h | 10 ++++++++++ 2 files changed, 26 insertions(+) diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c index 3a0c0a89a97d..4cc48abc4d9d 100644 --- a/net/mptcp/protocol.c +++ b/net/mptcp/protocol.c @@ -338,6 +338,7 @@ static inline bool mptcp_frag_can_collapse_to(const struct mptcp_sock *msk, static void dfrag_uncharge(struct sock *sk, int len) { sk_mem_uncharge(sk, len); + sk_wmem_queued_add(sk, -len); } static void dfrag_clear(struct sock *sk, struct mptcp_data_frag *dfrag) @@ -364,8 +365,23 @@ static void mptcp_clean_una(struct sock *sk) cleaned = true; } + dfrag = mptcp_rtx_head(sk); + if (dfrag && after64(snd_una, dfrag->data_seq)) { + u64 delta = dfrag->data_seq + dfrag->data_len - snd_una; + + dfrag->data_seq += delta; + dfrag->data_len -= delta; + + dfrag_uncharge(sk, delta); + cleaned = true; + } + if (cleaned) { sk_mem_reclaim_partial(sk); + + /* Only wake up writers if a subflow is ready */ + if (test_bit(MPTCP_SEND_SPACE, &msk->flags)) + sk_stream_write_space(sk); } } diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h index d222eea11922..f855c954a8ff 100644 --- a/net/mptcp/protocol.h +++ b/net/mptcp/protocol.h @@ -190,6 +190,16 @@ static inline struct mptcp_data_frag *mptcp_rtx_tail(const struct sock *sk) return list_last_entry(&msk->rtx_queue, struct mptcp_data_frag, list); } +static inline struct mptcp_data_frag *mptcp_rtx_head(const struct sock *sk) +{ + struct mptcp_sock *msk = mptcp_sk(sk); + + if (list_empty(&msk->rtx_queue)) + return NULL; + + return list_first_entry(&msk->rtx_queue, struct mptcp_data_frag, list); +} + struct mptcp_subflow_request_sock { struct tcp_request_sock sk; u16 mp_capable : 1, From patchwork Thu Mar 26 20:46:34 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mat Martineau X-Patchwork-Id: 221770 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-9.8 required=3.0 tests=HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH, MAILING_LIST_MULTI, SIGNED_OFF_BY, SPF_HELO_NONE, SPF_PASS, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id DA81FC2D0E9 for ; Thu, 26 Mar 2020 20:47:22 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id B8AAF2073E for ; Thu, 26 Mar 2020 20:47:22 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727851AbgCZUrV (ORCPT ); Thu, 26 Mar 2020 16:47:21 -0400 Received: from mga05.intel.com ([192.55.52.43]:47904 "EHLO mga05.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727752AbgCZUrD (ORCPT ); Thu, 26 Mar 2020 16:47:03 -0400 IronPort-SDR: +Q/vhWlpuD6LDraVoEWgY2c29tJfoeCeudv+lN52mPByA+MIVvZgKwLF4ThUR+HZSXoGyRB0sh NrzHQ5ln6IKA== X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga007.fm.intel.com ([10.253.24.52]) by fmsmga105.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 26 Mar 2020 13:47:02 -0700 IronPort-SDR: 2QtVMFJOZtpiYuyNxn1vOf9+UaALbE+N96LrzzKkLJVYxiloyZW/l2VBwn7++xo8nh6tZf+Yac qWC+9LPTUqJw== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.72,309,1580803200"; d="scan'208";a="238911691" Received: from mjmartin-nuc02.mjmartin-nuc02 (HELO mjmartin-nuc02.sea.intel.com) ([10.252.133.119]) by fmsmga007.fm.intel.com with ESMTP; 26 Mar 2020 13:47:02 -0700 From: Mat Martineau To: netdev@vger.kernel.org Cc: Paolo Abeni , eric.dumazet@gmail.com, Florian Westphal , Mat Martineau Subject: [PATCH net-next v2 11/17] mptcp: rework mptcp_sendmsg_frag to accept optional dfrag Date: Thu, 26 Mar 2020 13:46:34 -0700 Message-Id: <20200326204640.67336-12-mathew.j.martineau@linux.intel.com> X-Mailer: git-send-email 2.26.0 In-Reply-To: <20200326204640.67336-1-mathew.j.martineau@linux.intel.com> References: <20200326204640.67336-1-mathew.j.martineau@linux.intel.com> MIME-Version: 1.0 Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org From: Paolo Abeni This will simplify mptcp-level retransmission implementation in the next patch. If dfrag is provided by the caller, skip kernel space memory allocation and use data and metadata provided by the dfrag itself. Because a peer could ack data at TCP level but refrain from sending mptcp-level ACKs, we could grow the mptcp socket backlog indefinitely. We should thus block mptcp_sendmsg until the peer has acked some of the sent data. In order to be able to do so, increment the mptcp socket wmem_queued counter on memory allocation and decrement it when releasing the memory on mptcp-level ack reception. Because TCP performns sndbuf auto-tuning up to tcp_wmem_max[2], make this the mptcp sk_sndbuf limit. In the future we could add experiment with autotuning as TCP does in tcp_sndbuf_expand(). Co-developed-by: Florian Westphal Signed-off-by: Florian Westphal Signed-off-by: Paolo Abeni Signed-off-by: Mat Martineau --- net/mptcp/protocol.c | 119 ++++++++++++++++++++++++++----------------- 1 file changed, 72 insertions(+), 47 deletions(-) diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c index 4cc48abc4d9d..7bdf79f8787a 100644 --- a/net/mptcp/protocol.c +++ b/net/mptcp/protocol.c @@ -316,7 +316,7 @@ static struct sock *mptcp_subflow_recv_lookup(const struct mptcp_sock *msk) return NULL; } -static inline bool mptcp_skb_can_collapse_to(const struct mptcp_sock *msk, +static inline bool mptcp_skb_can_collapse_to(u64 write_seq, const struct sk_buff *skb, const struct mptcp_ext *mpext) { @@ -324,7 +324,7 @@ static inline bool mptcp_skb_can_collapse_to(const struct mptcp_sock *msk, return false; /* can collapse only if MPTCP level sequence is in order */ - return mpext && mpext->data_seq + mpext->data_len == msk->write_seq; + return mpext && mpext->data_seq + mpext->data_len == write_seq; } static inline bool mptcp_frag_can_collapse_to(const struct mptcp_sock *msk, @@ -417,23 +417,28 @@ mptcp_carve_data_frag(const struct mptcp_sock *msk, struct page_frag *pfrag, } static int mptcp_sendmsg_frag(struct sock *sk, struct sock *ssk, - struct msghdr *msg, long *timeo, int *pmss_now, + struct msghdr *msg, struct mptcp_data_frag *dfrag, + long *timeo, int *pmss_now, int *ps_goal) { int mss_now, avail_size, size_goal, offset, ret, frag_truesize = 0; bool dfrag_collapsed, can_collapse = false; struct mptcp_sock *msk = mptcp_sk(sk); struct mptcp_ext *mpext = NULL; - struct mptcp_data_frag *dfrag; + bool retransmission = !!dfrag; struct sk_buff *skb, *tail; struct page_frag *pfrag; + struct page *page; + u64 *write_seq; size_t psize; /* use the mptcp page cache so that we can easily move the data * from one substream to another, but do per subflow memory accounting + * Note: pfrag is used only !retransmission, but the compiler if + * fooled into a warning if we don't init here */ pfrag = sk_page_frag(sk); - while (!mptcp_page_frag_refill(ssk, pfrag) || + while ((!retransmission && !mptcp_page_frag_refill(ssk, pfrag)) || !mptcp_ext_cache_refill(msk)) { ret = sk_stream_wait_memory(ssk, timeo); if (ret) @@ -447,6 +452,13 @@ static int mptcp_sendmsg_frag(struct sock *sk, struct sock *ssk, if (unlikely(__mptcp_needs_tcp_fallback(msk))) return 0; } + if (!retransmission) { + write_seq = &msk->write_seq; + page = pfrag->page; + } else { + write_seq = &dfrag->data_seq; + page = dfrag->page; + } /* compute copy limit */ mss_now = tcp_send_mss(ssk, &size_goal, msg->msg_flags); @@ -464,63 +476,74 @@ static int mptcp_sendmsg_frag(struct sock *sk, struct sock *ssk, * SSN association set here */ can_collapse = (size_goal - skb->len > 0) && - mptcp_skb_can_collapse_to(msk, skb, mpext); + mptcp_skb_can_collapse_to(*write_seq, skb, mpext); if (!can_collapse) TCP_SKB_CB(skb)->eor = 1; else avail_size = size_goal - skb->len; } - /* reuse tail pfrag, if possible, or carve a new one from the page - * allocator - */ - dfrag = mptcp_rtx_tail(sk); - offset = pfrag->offset; - dfrag_collapsed = mptcp_frag_can_collapse_to(msk, pfrag, dfrag); - if (!dfrag_collapsed) { - dfrag = mptcp_carve_data_frag(msk, pfrag, offset); + if (!retransmission) { + /* reuse tail pfrag, if possible, or carve a new one from the + * page allocator + */ + dfrag = mptcp_rtx_tail(sk); + offset = pfrag->offset; + dfrag_collapsed = mptcp_frag_can_collapse_to(msk, pfrag, dfrag); + if (!dfrag_collapsed) { + dfrag = mptcp_carve_data_frag(msk, pfrag, offset); + offset = dfrag->offset; + frag_truesize = dfrag->overhead; + } + psize = min_t(size_t, pfrag->size - offset, avail_size); + + /* Copy to page */ + pr_debug("left=%zu", msg_data_left(msg)); + psize = copy_page_from_iter(pfrag->page, offset, + min_t(size_t, msg_data_left(msg), + psize), + &msg->msg_iter); + pr_debug("left=%zu", msg_data_left(msg)); + if (!psize) + return -EINVAL; + + if (!sk_wmem_schedule(sk, psize + dfrag->overhead)) + return -ENOMEM; + } else { offset = dfrag->offset; - frag_truesize = dfrag->overhead; + psize = min_t(size_t, dfrag->data_len, avail_size); } - psize = min_t(size_t, pfrag->size - offset, avail_size); - - /* Copy to page */ - pr_debug("left=%zu", msg_data_left(msg)); - psize = copy_page_from_iter(pfrag->page, offset, - min_t(size_t, msg_data_left(msg), psize), - &msg->msg_iter); - pr_debug("left=%zu", msg_data_left(msg)); - if (!psize) - return -EINVAL; - - if (!sk_wmem_schedule(sk, psize + dfrag->overhead)) - return -ENOMEM; /* tell the TCP stack to delay the push so that we can safely * access the skb after the sendpages call */ - ret = do_tcp_sendpages(ssk, pfrag->page, offset, psize, + ret = do_tcp_sendpages(ssk, page, offset, psize, msg->msg_flags | MSG_SENDPAGE_NOTLAST); if (ret <= 0) return ret; frag_truesize += ret; - if (unlikely(ret < psize)) - iov_iter_revert(&msg->msg_iter, psize - ret); + if (!retransmission) { + if (unlikely(ret < psize)) + iov_iter_revert(&msg->msg_iter, psize - ret); - /* send successful, keep track of sent data for mptcp-level - * retransmission - */ - dfrag->data_len += ret; - if (!dfrag_collapsed) { - get_page(dfrag->page); - list_add_tail(&dfrag->list, &msk->rtx_queue); - } + /* send successful, keep track of sent data for mptcp-level + * retransmission + */ + dfrag->data_len += ret; + if (!dfrag_collapsed) { + get_page(dfrag->page); + list_add_tail(&dfrag->list, &msk->rtx_queue); + sk_wmem_queued_add(sk, frag_truesize); + } else { + sk_wmem_queued_add(sk, ret); + } - /* charge data on mptcp rtx queue to the master socket - * Note: we charge such data both to sk and ssk - */ - sk->sk_forward_alloc -= frag_truesize; + /* charge data on mptcp rtx queue to the master socket + * Note: we charge such data both to sk and ssk + */ + sk->sk_forward_alloc -= frag_truesize; + } /* if the tail skb extension is still the cached one, collapsing * really happened. Note: we can't check for 'same skb' as the sk_buff @@ -539,7 +562,7 @@ static int mptcp_sendmsg_frag(struct sock *sk, struct sock *ssk, msk->cached_ext = NULL; memset(mpext, 0, sizeof(*mpext)); - mpext->data_seq = msk->write_seq; + mpext->data_seq = *write_seq; mpext->subflow_seq = mptcp_subflow_ctx(ssk)->rel_write_seq; mpext->data_len = ret; mpext->use_map = 1; @@ -550,8 +573,9 @@ static int mptcp_sendmsg_frag(struct sock *sk, struct sock *ssk, mpext->dsn64); out: - pfrag->offset += frag_truesize; - msk->write_seq += ret; + if (!retransmission) + pfrag->offset += frag_truesize; + *write_seq += ret; mptcp_subflow_ctx(ssk)->rel_write_seq += ret; return ret; @@ -663,7 +687,7 @@ static int mptcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len) lock_sock(ssk); while (msg_data_left(msg)) { - ret = mptcp_sendmsg_frag(sk, ssk, msg, &timeo, &mss_now, + ret = mptcp_sendmsg_frag(sk, ssk, msg, NULL, &timeo, &mss_now, &size_goal); if (ret < 0) break; @@ -974,6 +998,7 @@ static int mptcp_init_sock(struct sock *sk) return ret; sk_sockets_allocated_inc(sk); + sk->sk_sndbuf = sock_net(sk)->ipv4.sysctl_tcp_wmem[2]; if (!mptcp_is_enabled(sock_net(sk))) return -ENOPROTOOPT; From patchwork Thu Mar 26 20:46:35 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mat Martineau X-Patchwork-Id: 221771 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-9.8 required=3.0 tests=HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH, MAILING_LIST_MULTI, SIGNED_OFF_BY, SPF_HELO_NONE, SPF_PASS, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 292F4C2D0ED for ; Thu, 26 Mar 2020 20:47:19 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 0914B2070A for ; Thu, 26 Mar 2020 20:47:19 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727846AbgCZUrR (ORCPT ); Thu, 26 Mar 2020 16:47:17 -0400 Received: from mga05.intel.com ([192.55.52.43]:47909 "EHLO mga05.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727763AbgCZUrE (ORCPT ); Thu, 26 Mar 2020 16:47:04 -0400 IronPort-SDR: pPHLIwGjVzALQQ6djoD49/bF9IowKFEOxv0x1ouaHvC7Pk7hmBtfGb1a17QtQjQ60ZKUP2psCQ DugLnP7a3IMA== X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga007.fm.intel.com ([10.253.24.52]) by fmsmga105.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 26 Mar 2020 13:47:03 -0700 IronPort-SDR: reD+oAAqF170CPvWM/w+FVb/pb23+wgrdhsWeYPkdmgV8UQFRdDD08l4BjEMppqdfKG5OtcLc6 Gnh89PiKuMlA== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.72,309,1580803200"; d="scan'208";a="238911693" Received: from mjmartin-nuc02.mjmartin-nuc02 (HELO mjmartin-nuc02.sea.intel.com) ([10.252.133.119]) by fmsmga007.fm.intel.com with ESMTP; 26 Mar 2020 13:47:03 -0700 From: Mat Martineau To: netdev@vger.kernel.org Cc: Paolo Abeni , eric.dumazet@gmail.com, Mat Martineau Subject: [PATCH net-next v2 12/17] mptcp: implement and use MPTCP-level retransmission Date: Thu, 26 Mar 2020 13:46:35 -0700 Message-Id: <20200326204640.67336-13-mathew.j.martineau@linux.intel.com> X-Mailer: git-send-email 2.26.0 In-Reply-To: <20200326204640.67336-1-mathew.j.martineau@linux.intel.com> References: <20200326204640.67336-1-mathew.j.martineau@linux.intel.com> MIME-Version: 1.0 Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org From: Paolo Abeni On timeout event, schedule a work queue to do the retransmission. Retransmission code closely resembles the sendmsg() implementation and re-uses mptcp_sendmsg_frag, providing a dummy msghdr - for flags' sake - and peeking the relevant dfrag from the rtx head. Signed-off-by: Paolo Abeni Signed-off-by: Mat Martineau --- net/mptcp/protocol.c | 98 ++++++++++++++++++++++++++++++++++++++++++-- net/mptcp/protocol.h | 1 + 2 files changed, 95 insertions(+), 4 deletions(-) diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c index 7bdf79f8787a..e72c7322637f 100644 --- a/net/mptcp/protocol.c +++ b/net/mptcp/protocol.c @@ -283,6 +283,10 @@ static void mptcp_reset_timer(struct sock *sk) void mptcp_data_acked(struct sock *sk) { mptcp_reset_timer(sk); + + if (!sk_stream_is_writeable(sk) && + schedule_work(&mptcp_sk(sk)->work)) + sock_hold(sk); } static void mptcp_stop_timer(struct sock *sk) @@ -900,10 +904,13 @@ static void mptcp_retransmit_handler(struct sock *sk) { struct mptcp_sock *msk = mptcp_sk(sk); - if (atomic64_read(&msk->snd_una) == msk->write_seq) + if (atomic64_read(&msk->snd_una) == msk->write_seq) { mptcp_stop_timer(sk); - else - mptcp_reset_timer(sk); + } else { + set_bit(MPTCP_WORK_RTX, &msk->flags); + if (schedule_work(&msk->work)) + sock_hold(sk); + } } static void mptcp_retransmit_timer(struct timer_list *t) @@ -925,6 +932,37 @@ static void mptcp_retransmit_timer(struct timer_list *t) sock_put(sk); } +/* Find an idle subflow. Return NULL if there is unacked data at tcp + * level. + * + * A backup subflow is returned only if that is the only kind available. + */ +static struct sock *mptcp_subflow_get_retrans(const struct mptcp_sock *msk) +{ + struct mptcp_subflow_context *subflow; + struct sock *backup = NULL; + + sock_owned_by_me((const struct sock *)msk); + + mptcp_for_each_subflow(msk, subflow) { + struct sock *ssk = mptcp_subflow_tcp_sock(subflow); + + /* still data outstanding at TCP level? Don't retransmit. */ + if (!tcp_write_queue_empty(ssk)) + return NULL; + + if (subflow->backup) { + if (!backup) + backup = ssk; + continue; + } + + return ssk; + } + + return backup; +} + /* subflow sockets can be either outgoing (connect) or incoming * (accept). * @@ -958,11 +996,62 @@ static unsigned int mptcp_sync_mss(struct sock *sk, u32 pmtu) static void mptcp_worker(struct work_struct *work) { struct mptcp_sock *msk = container_of(work, struct mptcp_sock, work); - struct sock *sk = &msk->sk.icsk_inet.sk; + struct sock *ssk, *sk = &msk->sk.icsk_inet.sk; + int orig_len, orig_offset, ret, mss_now = 0, size_goal = 0; + struct mptcp_data_frag *dfrag; + u64 orig_write_seq; + size_t copied = 0; + struct msghdr msg; + long timeo = 0; lock_sock(sk); + mptcp_clean_una(sk); __mptcp_flush_join_list(msk); __mptcp_move_skbs(msk); + + if (!test_and_clear_bit(MPTCP_WORK_RTX, &msk->flags)) + goto unlock; + + dfrag = mptcp_rtx_head(sk); + if (!dfrag) + goto unlock; + + ssk = mptcp_subflow_get_retrans(msk); + if (!ssk) + goto reset_unlock; + + lock_sock(ssk); + + msg.msg_flags = MSG_DONTWAIT; + orig_len = dfrag->data_len; + orig_offset = dfrag->offset; + orig_write_seq = dfrag->data_seq; + while (dfrag->data_len > 0) { + ret = mptcp_sendmsg_frag(sk, ssk, &msg, dfrag, &timeo, &mss_now, + &size_goal); + if (ret < 0) + break; + + copied += ret; + dfrag->data_len -= ret; + dfrag->offset += ret; + } + if (copied) + tcp_push(ssk, msg.msg_flags, mss_now, tcp_sk(ssk)->nonagle, + size_goal); + + dfrag->data_seq = orig_write_seq; + dfrag->offset = orig_offset; + dfrag->data_len = orig_len; + + mptcp_set_timeout(sk, ssk); + release_sock(ssk); + +reset_unlock: + if (!mptcp_timer_pending(sk)) + mptcp_reset_timer(sk); + +unlock: release_sock(sk); sock_put(sk); } @@ -1124,6 +1213,7 @@ static int mptcp_disconnect(struct sock *sk, int flags) lock_sock(sk); __mptcp_clear_xmit(sk); release_sock(sk); + mptcp_cancel_work(sk); return tcp_disconnect(sk, flags); } diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h index f855c954a8ff..e9d4a852c7f1 100644 --- a/net/mptcp/protocol.h +++ b/net/mptcp/protocol.h @@ -88,6 +88,7 @@ /* MPTCP socket flags */ #define MPTCP_DATA_READY 0 #define MPTCP_SEND_SPACE 1 +#define MPTCP_WORK_RTX 2 static inline __be32 mptcp_option(u8 subopt, u8 len, u8 nib, u8 field) { From patchwork Thu Mar 26 20:46:37 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mat Martineau X-Patchwork-Id: 221772 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-9.8 required=3.0 tests=HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH, MAILING_LIST_MULTI, SIGNED_OFF_BY, SPF_HELO_NONE, SPF_PASS, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id BC7C2C2D0E5 for ; Thu, 26 Mar 2020 20:47:15 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 924A020722 for ; Thu, 26 Mar 2020 20:47:15 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727841AbgCZUrO (ORCPT ); Thu, 26 Mar 2020 16:47:14 -0400 Received: from mga05.intel.com ([192.55.52.43]:47909 "EHLO mga05.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727791AbgCZUrE (ORCPT ); Thu, 26 Mar 2020 16:47:04 -0400 IronPort-SDR: 8X8J2jHfsGYvtj9GVbe8GB+jcRiJJbgVJyyo4LRwdroPQwDbMsmAb6ZShH6d4PIowBNA4qHIeG e+MVkVmlSuvQ== X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga007.fm.intel.com ([10.253.24.52]) by fmsmga105.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 26 Mar 2020 13:47:03 -0700 IronPort-SDR: u4xwnR4BCNX5BHh7JMAnGzro1x/J6bPCNDiO8XGf1Odu3aKIqXyNgCqBYozCRlTi2FBmz/D7Zn R3RrsZsgQ5Hg== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.72,309,1580803200"; d="scan'208";a="238911698" Received: from mjmartin-nuc02.mjmartin-nuc02 (HELO mjmartin-nuc02.sea.intel.com) ([10.252.133.119]) by fmsmga007.fm.intel.com with ESMTP; 26 Mar 2020 13:47:03 -0700 From: Mat Martineau To: netdev@vger.kernel.org Cc: Florian Westphal , eric.dumazet@gmail.com, Paolo Abeni , Mat Martineau Subject: [PATCH net-next v2 14/17] mptcp: add and use MIB counter infrastructure Date: Thu, 26 Mar 2020 13:46:37 -0700 Message-Id: <20200326204640.67336-15-mathew.j.martineau@linux.intel.com> X-Mailer: git-send-email 2.26.0 In-Reply-To: <20200326204640.67336-1-mathew.j.martineau@linux.intel.com> References: <20200326204640.67336-1-mathew.j.martineau@linux.intel.com> MIME-Version: 1.0 Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org From: Florian Westphal Exported via same /proc file as the Linux TCP MIB counters, so "netstat -s" or "nstat" will show them automatically. The MPTCP MIB counters are allocated in a distinct pcpu area in order to avoid bloating/wasting TCP pcpu memory. Counters are allocated once the first MPTCP socket is created in a network namespace and free'd on exit. If no sockets have been allocated, all-zero mptcp counters are shown. The MIB counter list is taken from the multipath-tcp.org kernel, but only a few counters have been picked up so far. The counter list can be increased at any time later on. Co-developed-by: Paolo Abeni Signed-off-by: Paolo Abeni Signed-off-by: Florian Westphal Signed-off-by: Mat Martineau --- include/net/mptcp.h | 4 +++ include/net/netns/mib.h | 3 ++ net/ipv4/af_inet.c | 4 +++ net/ipv4/proc.c | 2 ++ net/mptcp/Makefile | 2 +- net/mptcp/mib.c | 69 +++++++++++++++++++++++++++++++++++++++++ net/mptcp/mib.h | 40 ++++++++++++++++++++++++ net/mptcp/protocol.c | 30 +++++++++++++----- net/mptcp/subflow.c | 33 ++++++++++++++++---- 9 files changed, 172 insertions(+), 15 deletions(-) create mode 100644 net/mptcp/mib.c create mode 100644 net/mptcp/mib.h diff --git a/include/net/mptcp.h b/include/net/mptcp.h index b648fa20eec8..0e7c5471010b 100644 --- a/include/net/mptcp.h +++ b/include/net/mptcp.h @@ -12,6 +12,8 @@ #include #include +struct seq_file; + /* MPTCP sk_buff extension data */ struct mptcp_ext { u64 data_ack; @@ -123,6 +125,7 @@ static inline bool mptcp_skb_can_collapse(const struct sk_buff *to, bool mptcp_sk_is_subflow(const struct sock *sk); +void mptcp_seq_show(struct seq_file *seq); #else static inline void mptcp_init(void) @@ -194,6 +197,7 @@ static inline bool mptcp_sk_is_subflow(const struct sock *sk) return false; } +static inline void mptcp_seq_show(struct seq_file *seq) { } #endif /* CONFIG_MPTCP */ #if IS_ENABLED(CONFIG_MPTCP_IPV6) diff --git a/include/net/netns/mib.h b/include/net/netns/mib.h index b5fdb108d602..59b2c3a3db42 100644 --- a/include/net/netns/mib.h +++ b/include/net/netns/mib.h @@ -27,6 +27,9 @@ struct netns_mib { #if IS_ENABLED(CONFIG_TLS) DEFINE_SNMP_STAT(struct linux_tls_mib, tls_statistics); #endif +#ifdef CONFIG_MPTCP + DEFINE_SNMP_STAT(struct mptcp_mib, mptcp_statistics); +#endif }; #endif diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c index bd7b4e92e07f..cf58e29cf746 100644 --- a/net/ipv4/af_inet.c +++ b/net/ipv4/af_inet.c @@ -1793,6 +1793,10 @@ static __net_exit void ipv4_mib_exit_net(struct net *net) free_percpu(net->mib.net_statistics); free_percpu(net->mib.ip_statistics); free_percpu(net->mib.tcp_statistics); +#ifdef CONFIG_MPTCP + /* allocated on demand, see mptcp_init_sock() */ + free_percpu(net->mib.mptcp_statistics); +#endif } static __net_initdata struct pernet_operations ipv4_mib_ops = { diff --git a/net/ipv4/proc.c b/net/ipv4/proc.c index 2580303249e2..75545a829a2b 100644 --- a/net/ipv4/proc.c +++ b/net/ipv4/proc.c @@ -32,6 +32,7 @@ #include #include #include +#include #include #include #include @@ -485,6 +486,7 @@ static int netstat_seq_show(struct seq_file *seq, void *v) offsetof(struct ipstats_mib, syncp))); seq_putc(seq, '\n'); + mptcp_seq_show(seq); return 0; } diff --git a/net/mptcp/Makefile b/net/mptcp/Makefile index 54494cf5bec0..faebe8ec9f73 100644 --- a/net/mptcp/Makefile +++ b/net/mptcp/Makefile @@ -1,4 +1,4 @@ # SPDX-License-Identifier: GPL-2.0 obj-$(CONFIG_MPTCP) += mptcp.o -mptcp-y := protocol.o subflow.o options.o token.o crypto.o ctrl.o pm.o diag.o +mptcp-y := protocol.o subflow.o options.o token.o crypto.o ctrl.o pm.o diag.o mib.o diff --git a/net/mptcp/mib.c b/net/mptcp/mib.c new file mode 100644 index 000000000000..0a6a15f3456d --- /dev/null +++ b/net/mptcp/mib.c @@ -0,0 +1,69 @@ +// SPDX-License-Identifier: GPL-2.0-or-later + +#include +#include +#include +#include +#include + +#include "mib.h" + +static const struct snmp_mib mptcp_snmp_list[] = { + SNMP_MIB_ITEM("MPCapableSYNRX", MPTCP_MIB_MPCAPABLEPASSIVE), + SNMP_MIB_ITEM("MPCapableACKRX", MPTCP_MIB_MPCAPABLEPASSIVEACK), + SNMP_MIB_ITEM("MPCapableFallbackACK", MPTCP_MIB_MPCAPABLEPASSIVEFALLBACK), + SNMP_MIB_ITEM("MPCapableFallbackSYNACK", MPTCP_MIB_MPCAPABLEACTIVEFALLBACK), + SNMP_MIB_ITEM("MPTCPRetrans", MPTCP_MIB_RETRANSSEGS), + SNMP_MIB_ITEM("MPJoinNoTokenFound", MPTCP_MIB_JOINNOTOKEN), + SNMP_MIB_ITEM("MPJoinSynRx", MPTCP_MIB_JOINSYNRX), + SNMP_MIB_ITEM("MPJoinSynAckRx", MPTCP_MIB_JOINSYNACKRX), + SNMP_MIB_ITEM("MPJoinSynAckHMacFailure", MPTCP_MIB_JOINSYNACKMAC), + SNMP_MIB_ITEM("MPJoinAckRx", MPTCP_MIB_JOINACKRX), + SNMP_MIB_ITEM("MPJoinAckHMacFailure", MPTCP_MIB_JOINACKMAC), + SNMP_MIB_ITEM("DSSNotMatching", MPTCP_MIB_DSSNOMATCH), + SNMP_MIB_ITEM("InfiniteMapRx", MPTCP_MIB_INFINITEMAPRX), + SNMP_MIB_SENTINEL +}; + +/* mptcp_mib_alloc - allocate percpu mib counters + * + * These are allocated when the first mptcp socket is created so + * we do not waste percpu memory if mptcp isn't in use. + */ +bool mptcp_mib_alloc(struct net *net) +{ + struct mptcp_mib __percpu *mib = alloc_percpu(struct mptcp_mib); + + if (!mib) + return false; + + if (cmpxchg(&net->mib.mptcp_statistics, NULL, mib)) + free_percpu(mib); + + return true; +} + +void mptcp_seq_show(struct seq_file *seq) +{ + struct net *net = seq->private; + int i; + + seq_puts(seq, "MPTcpExt:"); + for (i = 0; mptcp_snmp_list[i].name; i++) + seq_printf(seq, " %s", mptcp_snmp_list[i].name); + + seq_puts(seq, "\nMPTcpExt:"); + + if (!net->mib.mptcp_statistics) { + for (i = 0; mptcp_snmp_list[i].name; i++) + seq_puts(seq, " 0"); + + return; + } + + for (i = 0; mptcp_snmp_list[i].name; i++) + seq_printf(seq, " %lu", + snmp_fold_field(net->mib.mptcp_statistics, + mptcp_snmp_list[i].entry)); + seq_putc(seq, '\n'); +} diff --git a/net/mptcp/mib.h b/net/mptcp/mib.h new file mode 100644 index 000000000000..d7de340fc997 --- /dev/null +++ b/net/mptcp/mib.h @@ -0,0 +1,40 @@ +/* SPDX-License-Identifier: GPL-2.0-or-later */ + +enum linux_mptcp_mib_field { + MPTCP_MIB_NUM = 0, + MPTCP_MIB_MPCAPABLEPASSIVE, /* Received SYN with MP_CAPABLE */ + MPTCP_MIB_MPCAPABLEPASSIVEACK, /* Received third ACK with MP_CAPABLE */ + MPTCP_MIB_MPCAPABLEPASSIVEFALLBACK,/* Server-side fallback during 3-way handshake */ + MPTCP_MIB_MPCAPABLEACTIVEFALLBACK, /* Client-side fallback during 3-way handshake */ + MPTCP_MIB_RETRANSSEGS, /* Segments retransmitted at the MPTCP-level */ + MPTCP_MIB_JOINNOTOKEN, /* Received MP_JOIN but the token was not found */ + MPTCP_MIB_JOINSYNRX, /* Received a SYN + MP_JOIN */ + MPTCP_MIB_JOINSYNACKRX, /* Received a SYN/ACK + MP_JOIN */ + MPTCP_MIB_JOINSYNACKMAC, /* HMAC was wrong on SYN/ACK + MP_JOIN */ + MPTCP_MIB_JOINACKRX, /* Received an ACK + MP_JOIN */ + MPTCP_MIB_JOINACKMAC, /* HMAC was wrong on ACK + MP_JOIN */ + MPTCP_MIB_DSSNOMATCH, /* Received a new mapping that did not match the previous one */ + MPTCP_MIB_INFINITEMAPRX, /* Received an infinite mapping */ + __MPTCP_MIB_MAX +}; + +#define LINUX_MIB_MPTCP_MAX __MPTCP_MIB_MAX +struct mptcp_mib { + unsigned long mibs[LINUX_MIB_MPTCP_MAX]; +}; + +static inline void MPTCP_INC_STATS(struct net *net, + enum linux_mptcp_mib_field field) +{ + if (likely(net->mib.mptcp_statistics)) + SNMP_INC_STATS(net->mib.mptcp_statistics, field); +} + +static inline void __MPTCP_INC_STATS(struct net *net, + enum linux_mptcp_mib_field field) +{ + if (likely(net->mib.mptcp_statistics)) + __SNMP_INC_STATS(net->mib.mptcp_statistics, field); +} + +bool mptcp_mib_alloc(struct net *net); diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c index e72c7322637f..9d22dcea3a05 100644 --- a/net/mptcp/protocol.c +++ b/net/mptcp/protocol.c @@ -21,6 +21,7 @@ #endif #include #include "protocol.h" +#include "mib.h" #define MPTCP_SAME_STATE TCP_MAX_STATES @@ -1032,6 +1033,7 @@ static void mptcp_worker(struct work_struct *work) if (ret < 0) break; + MPTCP_INC_STATS(sock_net(sk), MPTCP_MIB_RETRANSSEGS); copied += ret; dfrag->data_len -= ret; dfrag->offset += ret; @@ -1081,17 +1083,22 @@ static int __mptcp_init_sock(struct sock *sk) static int mptcp_init_sock(struct sock *sk) { - int ret = __mptcp_init_sock(sk); + struct net *net = sock_net(sk); + int ret; + if (!mptcp_is_enabled(net)) + return -ENOPROTOOPT; + + if (unlikely(!net->mib.mptcp_statistics) && !mptcp_mib_alloc(net)) + return -ENOMEM; + + ret = __mptcp_init_sock(sk); if (ret) return ret; sk_sockets_allocated_inc(sk); sk->sk_sndbuf = sock_net(sk)->ipv4.sysctl_tcp_wmem[2]; - if (!mptcp_is_enabled(sock_net(sk))) - return -ENOPROTOOPT; - return 0; } @@ -1327,7 +1334,12 @@ static struct sock *mptcp_accept(struct sock *sk, int flags, int *err, list_add(&subflow->node, &msk->conn_list); bh_unlock_sock(new_mptcp_sock); + + __MPTCP_INC_STATS(sock_net(sk), MPTCP_MIB_MPCAPABLEPASSIVEACK); local_bh_enable(); + } else { + MPTCP_INC_STATS(sock_net(sk), + MPTCP_MIB_MPCAPABLEPASSIVEFALLBACK); } return newsk; @@ -1448,13 +1460,15 @@ void mptcp_finish_connect(struct sock *ssk) u64 ack_seq; subflow = mptcp_subflow_ctx(ssk); - - if (!subflow->mp_capable) - return; - sk = subflow->conn; msk = mptcp_sk(sk); + if (!subflow->mp_capable) { + MPTCP_INC_STATS(sock_net(sk), + MPTCP_MIB_MPCAPABLEACTIVEFALLBACK); + return; + } + pr_debug("msk=%p, token=%u", sk, subflow->token); mptcp_crypto_key_sha(subflow->remote_key, NULL, &ack_seq); diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c index c051db074708..edf5b543bdf5 100644 --- a/net/mptcp/subflow.c +++ b/net/mptcp/subflow.c @@ -20,6 +20,13 @@ #endif #include #include "protocol.h" +#include "mib.h" + +static inline void SUBFLOW_REQ_INC_STATS(struct request_sock *req, + enum linux_mptcp_mib_field field) +{ + MPTCP_INC_STATS(sock_net(req_to_sk(req)), field); +} static int subflow_rebuild_header(struct sock *sk) { @@ -88,8 +95,7 @@ static bool subflow_token_join_request(struct request_sock *req, msk = mptcp_token_get_sock(subflow_req->token); if (!msk) { - pr_debug("subflow_req=%p, token=%u - not found\n", - subflow_req, subflow_req->token); + SUBFLOW_REQ_INC_STATS(req, MPTCP_MIB_JOINNOTOKEN); return false; } @@ -137,8 +143,14 @@ static void subflow_init_req(struct request_sock *req, return; #endif - if (rx_opt.mptcp.mp_capable && rx_opt.mptcp.mp_join) - return; + if (rx_opt.mptcp.mp_capable) { + SUBFLOW_REQ_INC_STATS(req, MPTCP_MIB_MPCAPABLEPASSIVE); + + if (rx_opt.mptcp.mp_join) + return; + } else if (rx_opt.mptcp.mp_join) { + SUBFLOW_REQ_INC_STATS(req, MPTCP_MIB_JOINSYNRX); + } if (rx_opt.mptcp.mp_capable && listener->request_mptcp) { int err; @@ -237,6 +249,7 @@ static void subflow_finish_connect(struct sock *sk, const struct sk_buff *skb) subflow, subflow->thmac, subflow->remote_nonce); if (!subflow_thmac_valid(subflow)) { + MPTCP_INC_STATS(sock_net(sk), MPTCP_MIB_JOINACKMAC); subflow->mp_join = 0; goto do_reset; } @@ -253,6 +266,7 @@ static void subflow_finish_connect(struct sock *sk, const struct sk_buff *skb) goto do_reset; subflow->conn_finished = 1; + MPTCP_INC_STATS(sock_net(sk), MPTCP_MIB_JOINSYNACKRX); } else { do_reset: tcp_send_active_reset(sk, GFP_ATOMIC); @@ -382,8 +396,10 @@ static struct sock *subflow_syn_recv_sock(const struct sock *sk, opt_rx.mptcp.mp_join = 0; mptcp_get_options(skb, &opt_rx); if (!opt_rx.mptcp.mp_join || - !subflow_hmac_valid(req, &opt_rx)) + !subflow_hmac_valid(req, &opt_rx)) { + SUBFLOW_REQ_INC_STATS(req, MPTCP_MIB_JOINACKMAC); return NULL; + } } create_child: @@ -420,6 +436,8 @@ static struct sock *subflow_syn_recv_sock(const struct sock *sk, ctx->conn = (struct sock *)owner; if (!mptcp_finish_join(child)) goto close_child; + + SUBFLOW_REQ_INC_STATS(req, MPTCP_MIB_JOINACKRX); } } @@ -535,6 +553,7 @@ static enum mapping_status get_mapping_status(struct sock *ssk) data_len = mpext->data_len; if (data_len == 0) { pr_err("Infinite mapping not handled"); + MPTCP_INC_STATS(sock_net(ssk), MPTCP_MIB_INFINITEMAPRX); return MAPPING_INVALID; } @@ -578,8 +597,10 @@ static enum mapping_status get_mapping_status(struct sock *ssk) /* If this skb data are fully covered by the current mapping, * the new map would need caching, which is not supported */ - if (skb_is_fully_mapped(ssk, skb)) + if (skb_is_fully_mapped(ssk, skb)) { + MPTCP_INC_STATS(sock_net(ssk), MPTCP_MIB_DSSNOMATCH); return MAPPING_INVALID; + } /* will validate the next map after consuming the current one */ return MAPPING_OK; From patchwork Thu Mar 26 20:46:39 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mat Martineau X-Patchwork-Id: 221774 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-9.8 required=3.0 tests=HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH, MAILING_LIST_MULTI, SIGNED_OFF_BY, SPF_HELO_NONE, SPF_PASS, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 8C143C2D0ED for ; Thu, 26 Mar 2020 20:47:07 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 562272070A for ; Thu, 26 Mar 2020 20:47:07 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727826AbgCZUrG (ORCPT ); Thu, 26 Mar 2020 16:47:06 -0400 Received: from mga05.intel.com ([192.55.52.43]:47909 "EHLO mga05.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727800AbgCZUrF (ORCPT ); Thu, 26 Mar 2020 16:47:05 -0400 IronPort-SDR: 7gXKMJce+kSVqXFN24sBCdsH6sFmpaYHdGz+N3aoExHLDxgJGtm22HmVVirJWQfJZ9C4mHL+rJ oMcupfARlxSw== X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga007.fm.intel.com ([10.253.24.52]) by fmsmga105.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 26 Mar 2020 13:47:04 -0700 IronPort-SDR: 6kKerVMdO6B6YdRHChvN1Z0UltPxTJs2GqfDHKisqtCV36uZcCVT7B8G17KKNN9KW1x68oC6As 7D4YC9Oxadig== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.72,309,1580803200"; d="scan'208";a="238911703" Received: from mjmartin-nuc02.mjmartin-nuc02 (HELO mjmartin-nuc02.sea.intel.com) ([10.252.133.119]) by fmsmga007.fm.intel.com with ESMTP; 26 Mar 2020 13:47:04 -0700 From: Mat Martineau To: netdev@vger.kernel.org Cc: Paolo Abeni , eric.dumazet@gmail.com, Mat Martineau Subject: [PATCH net-next v2 16/17] selftests: add PM netlink functional tests Date: Thu, 26 Mar 2020 13:46:39 -0700 Message-Id: <20200326204640.67336-17-mathew.j.martineau@linux.intel.com> X-Mailer: git-send-email 2.26.0 In-Reply-To: <20200326204640.67336-1-mathew.j.martineau@linux.intel.com> References: <20200326204640.67336-1-mathew.j.martineau@linux.intel.com> MIME-Version: 1.0 Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org From: Paolo Abeni This introduces basic self-tests for the PM netlink, checking the basic APIs and possible exceptional values. Signed-off-by: Paolo Abeni Signed-off-by: Mat Martineau --- tools/testing/selftests/net/mptcp/.gitignore | 1 + tools/testing/selftests/net/mptcp/Makefile | 7 +- .../testing/selftests/net/mptcp/pm_netlink.sh | 130 ++++ tools/testing/selftests/net/mptcp/pm_nl_ctl.c | 616 ++++++++++++++++++ 4 files changed, 751 insertions(+), 3 deletions(-) create mode 100755 tools/testing/selftests/net/mptcp/pm_netlink.sh create mode 100644 tools/testing/selftests/net/mptcp/pm_nl_ctl.c diff --git a/tools/testing/selftests/net/mptcp/.gitignore b/tools/testing/selftests/net/mptcp/.gitignore index d72f07642738..ea13b255a99d 100644 --- a/tools/testing/selftests/net/mptcp/.gitignore +++ b/tools/testing/selftests/net/mptcp/.gitignore @@ -1,2 +1,3 @@ mptcp_connect +pm_nl_ctl *.pcap diff --git a/tools/testing/selftests/net/mptcp/Makefile b/tools/testing/selftests/net/mptcp/Makefile index ba450e62dc5b..70c831fcaf70 100644 --- a/tools/testing/selftests/net/mptcp/Makefile +++ b/tools/testing/selftests/net/mptcp/Makefile @@ -1,12 +1,13 @@ # SPDX-License-Identifier: GPL-2.0 top_srcdir = ../../../../.. +KSFT_KHDR_INSTALL := 1 -CFLAGS = -Wall -Wl,--no-as-needed -O2 -g +CFLAGS = -Wall -Wl,--no-as-needed -O2 -g -I$(top_srcdir)/usr/include -TEST_PROGS := mptcp_connect.sh +TEST_PROGS := mptcp_connect.sh pm_netlink.sh -TEST_GEN_FILES = mptcp_connect +TEST_GEN_FILES = mptcp_connect pm_nl_ctl TEST_FILES := settings diff --git a/tools/testing/selftests/net/mptcp/pm_netlink.sh b/tools/testing/selftests/net/mptcp/pm_netlink.sh new file mode 100755 index 000000000000..cfc743c47cb2 --- /dev/null +++ b/tools/testing/selftests/net/mptcp/pm_netlink.sh @@ -0,0 +1,130 @@ +#!/bin/bash +# SPDX-License-Identifier: GPL-2.0 + +ksft_skip=4 +ret=0 + +usage() { + echo "Usage: $0 [ -h ]" +} + + +while getopts "$optstring" option;do + case "$option" in + "h") + usage $0 + exit 0 + ;; + "?") + usage $0 + exit 1 + ;; + esac +done + +sec=$(date +%s) +rndh=$(printf %x $sec)-$(mktemp -u XXXXXX) +ns1="ns1-$rndh" +err=$(mktemp) +ret=0 + +cleanup() +{ + rm -f $out + ip netns del $ns1 +} + +ip -Version > /dev/null 2>&1 +if [ $? -ne 0 ];then + echo "SKIP: Could not run test without ip tool" + exit $ksft_skip +fi + +trap cleanup EXIT + +ip netns add $ns1 || exit $ksft_skip +ip -net $ns1 link set lo up +ip netns exec $ns1 sysctl -q net.mptcp.enabled=1 + +check() +{ + local cmd="$1" + local expected="$2" + local msg="$3" + local out=`$cmd 2>$err` + local cmd_ret=$? + + printf "%-50s %s" "$msg" + if [ $cmd_ret -ne 0 ]; then + echo "[FAIL] command execution '$cmd' stderr " + cat $err + ret=1 + elif [ "$out" = "$expected" ]; then + echo "[ OK ]" + else + echo -n "[FAIL] " + echo "expected '$expected' got '$out'" + ret=1 + fi +} + +check "ip netns exec $ns1 ./pm_nl_ctl dump" "" "defaults addr list" +check "ip netns exec $ns1 ./pm_nl_ctl limits" "accept 0 +subflows 0" "defaults limits" + +ip netns exec $ns1 ./pm_nl_ctl add 10.0.1.1 +ip netns exec $ns1 ./pm_nl_ctl add 10.0.1.2 flags subflow dev lo +ip netns exec $ns1 ./pm_nl_ctl add 10.0.1.3 flags signal,backup +check "ip netns exec $ns1 ./pm_nl_ctl get 1" "id 1 flags 10.0.1.1 " "simple add/get addr" + +check "ip netns exec $ns1 ./pm_nl_ctl dump" \ +"id 1 flags 10.0.1.1 +id 2 flags subflow dev lo 10.0.1.2 +id 3 flags signal,backup 10.0.1.3 " "dump addrs" + +ip netns exec $ns1 ./pm_nl_ctl del 2 +check "ip netns exec $ns1 ./pm_nl_ctl get 2" "" "simple del addr" +check "ip netns exec $ns1 ./pm_nl_ctl dump" \ +"id 1 flags 10.0.1.1 +id 3 flags signal,backup 10.0.1.3 " "dump addrs after del" + +ip netns exec $ns1 ./pm_nl_ctl add 10.0.1.3 +check "ip netns exec $ns1 ./pm_nl_ctl get 4" "" "duplicate addr" + +ip netns exec $ns1 ./pm_nl_ctl add 10.0.1.4 id 10 flags signal +check "ip netns exec $ns1 ./pm_nl_ctl get 4" "id 4 flags signal 10.0.1.4 " "id addr increment" + +for i in `seq 5 9`; do + ip netns exec $ns1 ./pm_nl_ctl add 10.0.1.$i flags signal >/dev/null 2>&1 +done +check "ip netns exec $ns1 ./pm_nl_ctl get 9" "id 9 flags signal 10.0.1.9 " "hard addr limit" +check "ip netns exec $ns1 ./pm_nl_ctl get 10" "" "above hard addr limit" + +for i in `seq 9 256`; do + ip netns exec $ns1 ./pm_nl_ctl del $i + ip netns exec $ns1 ./pm_nl_ctl add 10.0.0.9 +done +check "ip netns exec $ns1 ./pm_nl_ctl dump" "id 1 flags 10.0.1.1 +id 3 flags signal,backup 10.0.1.3 +id 4 flags signal 10.0.1.4 +id 5 flags signal 10.0.1.5 +id 6 flags signal 10.0.1.6 +id 7 flags signal 10.0.1.7 +id 8 flags signal 10.0.1.8 " "id limit" + +ip netns exec $ns1 ./pm_nl_ctl flush +check "ip netns exec $ns1 ./pm_nl_ctl dump" "" "flush addrs" + +ip netns exec $ns1 ./pm_nl_ctl limits 9 1 +check "ip netns exec $ns1 ./pm_nl_ctl limits" "accept 0 +subflows 0" "rcv addrs above hard limit" + +ip netns exec $ns1 ./pm_nl_ctl limits 1 9 +check "ip netns exec $ns1 ./pm_nl_ctl limits" "accept 0 +subflows 0" "subflows above hard limit" + +ip netns exec $ns1 ./pm_nl_ctl limits 8 8 +check "ip netns exec $ns1 ./pm_nl_ctl limits" "accept 8 +subflows 8" "set limits" + +exit $ret diff --git a/tools/testing/selftests/net/mptcp/pm_nl_ctl.c b/tools/testing/selftests/net/mptcp/pm_nl_ctl.c new file mode 100644 index 000000000000..de9209305026 --- /dev/null +++ b/tools/testing/selftests/net/mptcp/pm_nl_ctl.c @@ -0,0 +1,616 @@ +// SPDX-License-Identifier: GPL-2.0 + +#include +#include +#include +#include +#include +#include + +#include +#include + +#include +#include + +#include +#include + +#include "linux/mptcp.h" + +#ifndef MPTCP_PM_NAME +#define MPTCP_PM_NAME "mptcp_pm" +#endif + +static void syntax(char *argv[]) +{ + fprintf(stderr, "%s add|get|del|flush|dump|accept []\n", argv[0]); + fprintf(stderr, "\tadd [flags signal|subflow|backup] [id ] [dev ] \n"); + fprintf(stderr, "\tdel \n"); + fprintf(stderr, "\tget \n"); + fprintf(stderr, "\tflush\n"); + fprintf(stderr, "\tdump\n"); + fprintf(stderr, "\tlimits [ ]\n"); + exit(0); +} + +static int init_genl_req(char *data, int family, int cmd, int version) +{ + struct nlmsghdr *nh = (void *)data; + struct genlmsghdr *gh; + int off = 0; + + nh->nlmsg_type = family; + nh->nlmsg_flags = NLM_F_REQUEST; + nh->nlmsg_len = NLMSG_LENGTH(GENL_HDRLEN); + off += NLMSG_ALIGN(sizeof(*nh)); + + gh = (void *)(data + off); + gh->cmd = cmd; + gh->version = version; + off += NLMSG_ALIGN(sizeof(*gh)); + return off; +} + +static void nl_error(struct nlmsghdr *nh) +{ + struct nlmsgerr *err = (struct nlmsgerr *)NLMSG_DATA(nh); + int len = nh->nlmsg_len - sizeof(*nh); + uint32_t off; + + if (len < sizeof(struct nlmsgerr)) + error(1, 0, "netlink error message truncated %d min %ld", len, + sizeof(struct nlmsgerr)); + + if (!err->error) { + /* check messages from kernel */ + struct rtattr *attrs = (struct rtattr *)NLMSG_DATA(nh); + + while (RTA_OK(attrs, len)) { + if (attrs->rta_type == NLMSGERR_ATTR_MSG) + fprintf(stderr, "netlink ext ack msg: %s\n", + (char *)RTA_DATA(attrs)); + if (attrs->rta_type == NLMSGERR_ATTR_OFFS) { + memcpy(&off, RTA_DATA(attrs), 4); + fprintf(stderr, "netlink err off %d\n", + (int)off); + } + attrs = RTA_NEXT(attrs, len); + } + } else { + fprintf(stderr, "netlink error %d", err->error); + } +} + +/* do a netlink command and, if max > 0, fetch the reply */ +static int do_nl_req(int fd, struct nlmsghdr *nh, int len, int max) +{ + struct sockaddr_nl nladdr = { .nl_family = AF_NETLINK }; + socklen_t addr_len; + void *data = nh; + int rem, ret; + int err = 0; + + nh->nlmsg_len = len; + ret = sendto(fd, data, len, 0, (void *)&nladdr, sizeof(nladdr)); + if (ret != len) + error(1, errno, "send netlink: %uB != %uB\n", ret, len); + if (max == 0) + return 0; + + addr_len = sizeof(nladdr); + rem = ret = recvfrom(fd, data, max, 0, (void *)&nladdr, &addr_len); + if (ret < 0) + error(1, errno, "recv netlink: %uB\n", ret); + + /* Beware: the NLMSG_NEXT macro updates the 'rem' argument */ + for (; NLMSG_OK(nh, rem); nh = NLMSG_NEXT(nh, rem)) { + if (nh->nlmsg_type == NLMSG_ERROR) { + nl_error(nh); + err = 1; + } + } + if (err) + error(1, 0, "bailing out due to netlink error[s]"); + return ret; +} + +static int genl_parse_getfamily(struct nlmsghdr *nlh) +{ + struct genlmsghdr *ghdr = NLMSG_DATA(nlh); + int len = nlh->nlmsg_len; + struct rtattr *attrs; + + if (nlh->nlmsg_type != GENL_ID_CTRL) + error(1, errno, "Not a controller message, len=%d type=0x%x\n", + nlh->nlmsg_len, nlh->nlmsg_type); + + len -= NLMSG_LENGTH(GENL_HDRLEN); + + if (len < 0) + error(1, errno, "wrong controller message len %d\n", len); + + if (ghdr->cmd != CTRL_CMD_NEWFAMILY) + error(1, errno, "Unknown controller command %d\n", ghdr->cmd); + + attrs = (struct rtattr *) ((char *) ghdr + GENL_HDRLEN); + while (RTA_OK(attrs, len)) { + if (attrs->rta_type == CTRL_ATTR_FAMILY_ID) + return *(__u16 *)RTA_DATA(attrs); + attrs = RTA_NEXT(attrs, len); + } + + error(1, errno, "can't find CTRL_ATTR_FAMILY_ID attr"); + return -1; +} + +static int resolve_mptcp_pm_netlink(int fd) +{ + char data[NLMSG_ALIGN(sizeof(struct nlmsghdr)) + + NLMSG_ALIGN(sizeof(struct genlmsghdr)) + + 1024]; + struct nlmsghdr *nh; + struct rtattr *rta; + int namelen; + int off = 0; + + memset(data, 0, sizeof(data)); + nh = (void *)data; + off = init_genl_req(data, GENL_ID_CTRL, CTRL_CMD_GETFAMILY, 0); + + rta = (void *)(data + off); + namelen = strlen(MPTCP_PM_NAME) + 1; + rta->rta_type = CTRL_ATTR_FAMILY_NAME; + rta->rta_len = RTA_LENGTH(namelen); + memcpy(RTA_DATA(rta), MPTCP_PM_NAME, namelen); + off += NLMSG_ALIGN(rta->rta_len); + + do_nl_req(fd, nh, off, sizeof(data)); + return genl_parse_getfamily((void *)data); +} + +int add_addr(int fd, int pm_family, int argc, char *argv[]) +{ + char data[NLMSG_ALIGN(sizeof(struct nlmsghdr)) + + NLMSG_ALIGN(sizeof(struct genlmsghdr)) + + 1024]; + struct rtattr *rta, *nest; + struct nlmsghdr *nh; + u_int16_t family; + u_int32_t flags; + int nest_start; + u_int8_t id; + int off = 0; + int arg; + + memset(data, 0, sizeof(data)); + nh = (void *)data; + off = init_genl_req(data, pm_family, MPTCP_PM_CMD_ADD_ADDR, + MPTCP_PM_VER); + + if (argc < 3) + syntax(argv); + + nest_start = off; + nest = (void *)(data + off); + nest->rta_type = NLA_F_NESTED | MPTCP_PM_ATTR_ADDR; + nest->rta_len = RTA_LENGTH(0); + off += NLMSG_ALIGN(nest->rta_len); + + /* addr data */ + rta = (void *)(data + off); + if (inet_pton(AF_INET, argv[2], RTA_DATA(rta))) { + family = AF_INET; + rta->rta_type = MPTCP_PM_ADDR_ATTR_ADDR4; + rta->rta_len = RTA_LENGTH(4); + } else if (inet_pton(AF_INET6, argv[2], RTA_DATA(rta))) { + family = AF_INET6; + rta->rta_type = MPTCP_PM_ADDR_ATTR_ADDR6; + rta->rta_len = RTA_LENGTH(16); + } else + error(1, errno, "can't parse ip %s", argv[2]); + off += NLMSG_ALIGN(rta->rta_len); + + /* family */ + rta = (void *)(data + off); + rta->rta_type = MPTCP_PM_ADDR_ATTR_FAMILY; + rta->rta_len = RTA_LENGTH(2); + memcpy(RTA_DATA(rta), &family, 2); + off += NLMSG_ALIGN(rta->rta_len); + + for (arg = 3; arg < argc; arg++) { + if (!strcmp(argv[arg], "flags")) { + char *tok, *str; + + /* flags */ + flags = 0; + if (++arg >= argc) + error(1, 0, " missing flags value"); + + /* do not support flag list yet */ + for (str = argv[arg]; (tok = strtok(str, ",")); + str = NULL) { + if (!strcmp(tok, "subflow")) + flags |= MPTCP_PM_ADDR_FLAG_SUBFLOW; + else if (!strcmp(tok, "signal")) + flags |= MPTCP_PM_ADDR_FLAG_SIGNAL; + else if (!strcmp(tok, "backup")) + flags |= MPTCP_PM_ADDR_FLAG_BACKUP; + else + error(1, errno, + "unknown flag %s", argv[arg]); + } + + rta = (void *)(data + off); + rta->rta_type = MPTCP_PM_ADDR_ATTR_FLAGS; + rta->rta_len = RTA_LENGTH(4); + memcpy(RTA_DATA(rta), &flags, 4); + off += NLMSG_ALIGN(rta->rta_len); + } else if (!strcmp(argv[arg], "id")) { + if (++arg >= argc) + error(1, 0, " missing id value"); + + id = atoi(argv[arg]); + rta = (void *)(data + off); + rta->rta_type = MPTCP_PM_ADDR_ATTR_ID; + rta->rta_len = RTA_LENGTH(1); + memcpy(RTA_DATA(rta), &id, 1); + off += NLMSG_ALIGN(rta->rta_len); + } else if (!strcmp(argv[arg], "dev")) { + int32_t ifindex; + + if (++arg >= argc) + error(1, 0, " missing dev name"); + + ifindex = if_nametoindex(argv[arg]); + if (!ifindex) + error(1, errno, "unknown device %s", argv[arg]); + + rta = (void *)(data + off); + rta->rta_type = MPTCP_PM_ADDR_ATTR_IF_IDX; + rta->rta_len = RTA_LENGTH(4); + memcpy(RTA_DATA(rta), &ifindex, 4); + off += NLMSG_ALIGN(rta->rta_len); + } else + error(1, 0, "unknown keyword %s", argv[arg]); + } + nest->rta_len = off - nest_start; + + do_nl_req(fd, nh, off, 0); + return 0; +} + +int del_addr(int fd, int pm_family, int argc, char *argv[]) +{ + char data[NLMSG_ALIGN(sizeof(struct nlmsghdr)) + + NLMSG_ALIGN(sizeof(struct genlmsghdr)) + + 1024]; + struct rtattr *rta, *nest; + struct nlmsghdr *nh; + int nest_start; + u_int8_t id; + int off = 0; + + memset(data, 0, sizeof(data)); + nh = (void *)data; + off = init_genl_req(data, pm_family, MPTCP_PM_CMD_DEL_ADDR, + MPTCP_PM_VER); + + /* the only argument is the address id */ + if (argc != 3) + syntax(argv); + + id = atoi(argv[2]); + + nest_start = off; + nest = (void *)(data + off); + nest->rta_type = NLA_F_NESTED | MPTCP_PM_ATTR_ADDR; + nest->rta_len = RTA_LENGTH(0); + off += NLMSG_ALIGN(nest->rta_len); + + /* build a dummy addr with only the ID set */ + rta = (void *)(data + off); + rta->rta_type = MPTCP_PM_ADDR_ATTR_ID; + rta->rta_len = RTA_LENGTH(1); + memcpy(RTA_DATA(rta), &id, 1); + off += NLMSG_ALIGN(rta->rta_len); + nest->rta_len = off - nest_start; + + do_nl_req(fd, nh, off, 0); + return 0; +} + +static void print_addr(struct rtattr *attrs, int len) +{ + uint16_t family = 0; + char str[1024]; + uint32_t flags; + uint8_t id; + + while (RTA_OK(attrs, len)) { + if (attrs->rta_type == MPTCP_PM_ADDR_ATTR_FAMILY) + memcpy(&family, RTA_DATA(attrs), 2); + if (attrs->rta_type == MPTCP_PM_ADDR_ATTR_ADDR4) { + if (family != AF_INET) + error(1, errno, "wrong IP (v4) for family %d", + family); + inet_ntop(AF_INET, RTA_DATA(attrs), str, sizeof(str)); + printf("%s ", str); + } + if (attrs->rta_type == MPTCP_PM_ADDR_ATTR_ADDR6) { + if (family != AF_INET6) + error(1, errno, "wrong IP (v6) for family %d", + family); + inet_ntop(AF_INET6, RTA_DATA(attrs), str, sizeof(str)); + printf("%s ", str); + } + if (attrs->rta_type == MPTCP_PM_ADDR_ATTR_ID) { + memcpy(&id, RTA_DATA(attrs), 1); + printf("id %d ", id); + } + if (attrs->rta_type == MPTCP_PM_ADDR_ATTR_FLAGS) { + memcpy(&flags, RTA_DATA(attrs), 4); + + printf("flags "); + if (flags & MPTCP_PM_ADDR_FLAG_SIGNAL) { + printf("signal"); + flags &= ~MPTCP_PM_ADDR_FLAG_SIGNAL; + if (flags) + printf(","); + } + + if (flags & MPTCP_PM_ADDR_FLAG_SUBFLOW) { + printf("subflow"); + flags &= ~MPTCP_PM_ADDR_FLAG_SUBFLOW; + if (flags) + printf(","); + } + + if (flags & MPTCP_PM_ADDR_FLAG_BACKUP) { + printf("backup"); + flags &= ~MPTCP_PM_ADDR_FLAG_BACKUP; + if (flags) + printf(","); + } + + /* bump unknown flags, if any */ + if (flags) + printf("0x%x", flags); + printf(" "); + } + if (attrs->rta_type == MPTCP_PM_ADDR_ATTR_IF_IDX) { + char name[IF_NAMESIZE], *ret; + int32_t ifindex; + + memcpy(&ifindex, RTA_DATA(attrs), 4); + ret = if_indextoname(ifindex, name); + if (ret) + printf("dev %s ", ret); + else + printf("dev unknown/%d", ifindex); + } + + attrs = RTA_NEXT(attrs, len); + } + printf("\n"); +} + +static void print_addrs(struct nlmsghdr *nh, int pm_family, int total_len) +{ + struct rtattr *attrs; + + for (; NLMSG_OK(nh, total_len); nh = NLMSG_NEXT(nh, total_len)) { + int len = nh->nlmsg_len; + + if (nh->nlmsg_type == NLMSG_DONE) + break; + if (nh->nlmsg_type == NLMSG_ERROR) + nl_error(nh); + if (nh->nlmsg_type != pm_family) + continue; + + len -= NLMSG_LENGTH(GENL_HDRLEN); + attrs = (struct rtattr *) ((char *) NLMSG_DATA(nh) + + GENL_HDRLEN); + while (RTA_OK(attrs, len)) { + if (attrs->rta_type == + (MPTCP_PM_ATTR_ADDR | NLA_F_NESTED)) + print_addr((void *)RTA_DATA(attrs), + attrs->rta_len); + attrs = RTA_NEXT(attrs, len); + } + } +} + +int get_addr(int fd, int pm_family, int argc, char *argv[]) +{ + char data[NLMSG_ALIGN(sizeof(struct nlmsghdr)) + + NLMSG_ALIGN(sizeof(struct genlmsghdr)) + + 1024]; + struct rtattr *rta, *nest; + struct nlmsghdr *nh; + int nest_start; + u_int8_t id; + int off = 0; + + memset(data, 0, sizeof(data)); + nh = (void *)data; + off = init_genl_req(data, pm_family, MPTCP_PM_CMD_GET_ADDR, + MPTCP_PM_VER); + + /* the only argument is the address id */ + if (argc != 3) + syntax(argv); + + id = atoi(argv[2]); + + nest_start = off; + nest = (void *)(data + off); + nest->rta_type = NLA_F_NESTED | MPTCP_PM_ATTR_ADDR; + nest->rta_len = RTA_LENGTH(0); + off += NLMSG_ALIGN(nest->rta_len); + + /* build a dummy addr with only the ID set */ + rta = (void *)(data + off); + rta->rta_type = MPTCP_PM_ADDR_ATTR_ID; + rta->rta_len = RTA_LENGTH(1); + memcpy(RTA_DATA(rta), &id, 1); + off += NLMSG_ALIGN(rta->rta_len); + nest->rta_len = off - nest_start; + + print_addrs(nh, pm_family, do_nl_req(fd, nh, off, sizeof(data))); + return 0; +} + +int dump_addrs(int fd, int pm_family, int argc, char *argv[]) +{ + char data[NLMSG_ALIGN(sizeof(struct nlmsghdr)) + + NLMSG_ALIGN(sizeof(struct genlmsghdr)) + + 1024]; + pid_t pid = getpid(); + struct nlmsghdr *nh; + int off = 0; + + memset(data, 0, sizeof(data)); + nh = (void *)data; + off = init_genl_req(data, pm_family, MPTCP_PM_CMD_GET_ADDR, + MPTCP_PM_VER); + nh->nlmsg_flags |= NLM_F_DUMP; + nh->nlmsg_seq = 1; + nh->nlmsg_pid = pid; + nh->nlmsg_len = off; + + print_addrs(nh, pm_family, do_nl_req(fd, nh, off, sizeof(data))); + return 0; +} + +int flush_addrs(int fd, int pm_family, int argc, char *argv[]) +{ + char data[NLMSG_ALIGN(sizeof(struct nlmsghdr)) + + NLMSG_ALIGN(sizeof(struct genlmsghdr)) + + 1024]; + struct nlmsghdr *nh; + int off = 0; + + memset(data, 0, sizeof(data)); + nh = (void *)data; + off = init_genl_req(data, pm_family, MPTCP_PM_CMD_FLUSH_ADDRS, + MPTCP_PM_VER); + + do_nl_req(fd, nh, off, 0); + return 0; +} + +static void print_limits(struct nlmsghdr *nh, int pm_family, int total_len) +{ + struct rtattr *attrs; + uint32_t max; + + for (; NLMSG_OK(nh, total_len); nh = NLMSG_NEXT(nh, total_len)) { + int len = nh->nlmsg_len; + + if (nh->nlmsg_type == NLMSG_DONE) + break; + if (nh->nlmsg_type == NLMSG_ERROR) + nl_error(nh); + if (nh->nlmsg_type != pm_family) + continue; + + len -= NLMSG_LENGTH(GENL_HDRLEN); + attrs = (struct rtattr *) ((char *) NLMSG_DATA(nh) + + GENL_HDRLEN); + while (RTA_OK(attrs, len)) { + int type = attrs->rta_type; + + if (type != MPTCP_PM_ATTR_RCV_ADD_ADDRS && + type != MPTCP_PM_ATTR_SUBFLOWS) + goto next; + + memcpy(&max, RTA_DATA(attrs), 4); + printf("%s %u\n", type == MPTCP_PM_ATTR_SUBFLOWS ? + "subflows" : "accept", max); + +next: + attrs = RTA_NEXT(attrs, len); + } + } +} + +int get_set_limits(int fd, int pm_family, int argc, char *argv[]) +{ + char data[NLMSG_ALIGN(sizeof(struct nlmsghdr)) + + NLMSG_ALIGN(sizeof(struct genlmsghdr)) + + 1024]; + uint32_t rcv_addr = 0, subflows = 0; + int cmd, len = sizeof(data); + struct nlmsghdr *nh; + int off = 0; + + /* limit */ + if (argc == 4) { + rcv_addr = atoi(argv[2]); + subflows = atoi(argv[3]); + cmd = MPTCP_PM_CMD_SET_LIMITS; + } else { + cmd = MPTCP_PM_CMD_GET_LIMITS; + } + + memset(data, 0, sizeof(data)); + nh = (void *)data; + off = init_genl_req(data, pm_family, cmd, MPTCP_PM_VER); + + /* limit */ + if (cmd == MPTCP_PM_CMD_SET_LIMITS) { + struct rtattr *rta = (void *)(data + off); + + rta->rta_type = MPTCP_PM_ATTR_RCV_ADD_ADDRS; + rta->rta_len = RTA_LENGTH(4); + memcpy(RTA_DATA(rta), &rcv_addr, 4); + off += NLMSG_ALIGN(rta->rta_len); + + rta = (void *)(data + off); + rta->rta_type = MPTCP_PM_ATTR_SUBFLOWS; + rta->rta_len = RTA_LENGTH(4); + memcpy(RTA_DATA(rta), &subflows, 4); + off += NLMSG_ALIGN(rta->rta_len); + + /* do not expect a reply */ + len = 0; + } + + len = do_nl_req(fd, nh, off, len); + if (cmd == MPTCP_PM_CMD_GET_LIMITS) + print_limits(nh, pm_family, len); + return 0; +} + +int main(int argc, char *argv[]) +{ + int fd, pm_family; + + if (argc < 2) + syntax(argv); + + fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_GENERIC); + if (fd == -1) + error(1, errno, "socket netlink"); + + pm_family = resolve_mptcp_pm_netlink(fd); + + if (!strcmp(argv[1], "add")) + return add_addr(fd, pm_family, argc, argv); + else if (!strcmp(argv[1], "del")) + return del_addr(fd, pm_family, argc, argv); + else if (!strcmp(argv[1], "flush")) + return flush_addrs(fd, pm_family, argc, argv); + else if (!strcmp(argv[1], "get")) + return get_addr(fd, pm_family, argc, argv); + else if (!strcmp(argv[1], "dump")) + return dump_addrs(fd, pm_family, argc, argv); + else if (!strcmp(argv[1], "limits")) + return get_set_limits(fd, pm_family, argc, argv); + + fprintf(stderr, "unknown sub-command: %s", argv[1]); + syntax(argv); + return 0; +} From patchwork Thu Mar 26 20:46:40 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mat Martineau X-Patchwork-Id: 221773 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-9.8 required=3.0 tests=HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH, MAILING_LIST_MULTI, SIGNED_OFF_BY, SPF_HELO_NONE, SPF_PASS, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 6C29CC2D0E5 for ; Thu, 26 Mar 2020 20:47:11 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 3FFF62073E for ; Thu, 26 Mar 2020 20:47:11 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727835AbgCZUrK (ORCPT ); Thu, 26 Mar 2020 16:47:10 -0400 Received: from mga05.intel.com ([192.55.52.43]:47904 "EHLO mga05.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727806AbgCZUrJ (ORCPT ); Thu, 26 Mar 2020 16:47:09 -0400 IronPort-SDR: w4i0LsB1GxEvoXkKY/HtCzSFBEprJCYQwcvD8a2eqlkn+vjxkgZ4rVoXrLlOm72djqv97Linok H67FItlbxk4w== X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga007.fm.intel.com ([10.253.24.52]) by fmsmga105.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 26 Mar 2020 13:47:05 -0700 IronPort-SDR: kmZG6IxPARTMKHZokYM+0vf0nEf3SKVX4hcO/BVIUUZ2SU2v5tPwfYJTgTZYLNtvr5B12tNubB IDZ0fGTbUcIg== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.72,309,1580803200"; d="scan'208";a="238911705" Received: from mjmartin-nuc02.mjmartin-nuc02 (HELO mjmartin-nuc02.sea.intel.com) ([10.252.133.119]) by fmsmga007.fm.intel.com with ESMTP; 26 Mar 2020 13:47:05 -0700 From: Mat Martineau To: netdev@vger.kernel.org Cc: Paolo Abeni , eric.dumazet@gmail.com, Mat Martineau Subject: [PATCH net-next v2 17/17] selftests: add test-cases for MPTCP MP_JOIN Date: Thu, 26 Mar 2020 13:46:40 -0700 Message-Id: <20200326204640.67336-18-mathew.j.martineau@linux.intel.com> X-Mailer: git-send-email 2.26.0 In-Reply-To: <20200326204640.67336-1-mathew.j.martineau@linux.intel.com> References: <20200326204640.67336-1-mathew.j.martineau@linux.intel.com> MIME-Version: 1.0 Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org From: Paolo Abeni Use the pm netlink to configure the creation of several subflows, and verify that via MIB counters. Update the mptcp_connect program to allow reliable MP_JOIN handshake even on small data file Signed-off-by: Paolo Abeni Signed-off-by: Mat Martineau --- tools/testing/selftests/net/mptcp/Makefile | 2 +- .../selftests/net/mptcp/mptcp_connect.c | 28 +- .../testing/selftests/net/mptcp/mptcp_join.sh | 357 ++++++++++++++++++ 3 files changed, 383 insertions(+), 4 deletions(-) create mode 100755 tools/testing/selftests/net/mptcp/mptcp_join.sh diff --git a/tools/testing/selftests/net/mptcp/Makefile b/tools/testing/selftests/net/mptcp/Makefile index 70c831fcaf70..f50976ee7d44 100644 --- a/tools/testing/selftests/net/mptcp/Makefile +++ b/tools/testing/selftests/net/mptcp/Makefile @@ -5,7 +5,7 @@ KSFT_KHDR_INSTALL := 1 CFLAGS = -Wall -Wl,--no-as-needed -O2 -g -I$(top_srcdir)/usr/include -TEST_PROGS := mptcp_connect.sh pm_netlink.sh +TEST_PROGS := mptcp_connect.sh pm_netlink.sh mptcp_join.sh TEST_GEN_FILES = mptcp_connect pm_nl_ctl diff --git a/tools/testing/selftests/net/mptcp/mptcp_connect.c b/tools/testing/selftests/net/mptcp/mptcp_connect.c index 702bab2c12da..cedee5b952ba 100644 --- a/tools/testing/selftests/net/mptcp/mptcp_connect.c +++ b/tools/testing/selftests/net/mptcp/mptcp_connect.c @@ -51,6 +51,7 @@ static bool tcpulp_audit; static int pf = AF_INET; static int cfg_sndbuf; static int cfg_rcvbuf; +static bool cfg_join; static void die_usage(void) { @@ -250,6 +251,7 @@ static int sock_connect_mptcp(const char * const remoteaddr, static size_t do_rnd_write(const int fd, char *buf, const size_t len) { + static bool first = true; unsigned int do_w; ssize_t bw; @@ -257,10 +259,19 @@ static size_t do_rnd_write(const int fd, char *buf, const size_t len) if (do_w == 0 || do_w > len) do_w = len; + if (cfg_join && first && do_w > 100) + do_w = 100; + bw = write(fd, buf, do_w); if (bw < 0) perror("write"); + /* let the join handshake complete, before going on */ + if (cfg_join && first) { + usleep(200000); + first = false; + } + return bw; } @@ -385,8 +396,11 @@ static int copyfd_io_poll(int infd, int peerfd, int outfd) break; /* ... but we still receive. - * Close our write side. + * Close our write side, ev. give some time + * for address notification */ + if (cfg_join) + usleep(400000); shutdown(peerfd, SHUT_WR); } else { if (errno == EINTR) @@ -403,6 +417,10 @@ static int copyfd_io_poll(int infd, int peerfd, int outfd) } } + /* leave some time for late join/announce */ + if (cfg_join) + usleep(400000); + close(peerfd); return 0; } @@ -658,7 +676,7 @@ static void maybe_close(int fd) { unsigned int r = rand(); - if (r & 1) + if (!cfg_join && (r & 1)) close(fd); } @@ -794,8 +812,12 @@ static void parse_opts(int argc, char **argv) { int c; - while ((c = getopt(argc, argv, "6lp:s:hut:m:S:R:")) != -1) { + while ((c = getopt(argc, argv, "6jlp:s:hut:m:S:R:")) != -1) { switch (c) { + case 'j': + cfg_join = true; + cfg_mode = CFG_MODE_POLL; + break; case 'l': listen_mode = true; break; diff --git a/tools/testing/selftests/net/mptcp/mptcp_join.sh b/tools/testing/selftests/net/mptcp/mptcp_join.sh new file mode 100755 index 000000000000..dd42c2f692d0 --- /dev/null +++ b/tools/testing/selftests/net/mptcp/mptcp_join.sh @@ -0,0 +1,357 @@ +#!/bin/bash +# SPDX-License-Identifier: GPL-2.0 + +ret=0 +sin="" +sout="" +cin="" +cout="" +ksft_skip=4 +timeout=30 +capture=0 + +TEST_COUNT=0 + +init() +{ + capout=$(mktemp) + + rndh=$(printf %x $sec)-$(mktemp -u XXXXXX) + + ns1="ns1-$rndh" + ns2="ns2-$rndh" + + for netns in "$ns1" "$ns2";do + ip netns add $netns || exit $ksft_skip + ip -net $netns link set lo up + ip netns exec $netns sysctl -q net.mptcp.enabled=1 + ip netns exec $netns sysctl -q net.ipv4.conf.all.rp_filter=0 + ip netns exec $netns sysctl -q net.ipv4.conf.default.rp_filter=0 + done + + # ns1 ns2 + # ns1eth1 ns2eth1 + # ns1eth2 ns2eth2 + # ns1eth3 ns2eth3 + # ns1eth4 ns2eth4 + + for i in `seq 1 4`; do + ip link add ns1eth$i netns "$ns1" type veth peer name ns2eth$i netns "$ns2" + ip -net "$ns1" addr add 10.0.$i.1/24 dev ns1eth$i + ip -net "$ns1" addr add dead:beef:$i::1/64 dev ns1eth$i nodad + ip -net "$ns1" link set ns1eth$i up + + ip -net "$ns2" addr add 10.0.$i.2/24 dev ns2eth$i + ip -net "$ns2" addr add dead:beef:$i::2/64 dev ns2eth$i nodad + ip -net "$ns2" link set ns2eth$i up + + # let $ns2 reach any $ns1 address from any interface + ip -net "$ns2" route add default via 10.0.$i.1 dev ns2eth$i metric 10$i + done +} + +cleanup_partial() +{ + rm -f "$capout" + + for netns in "$ns1" "$ns2"; do + ip netns del $netns + done +} + +cleanup() +{ + rm -f "$cin" "$cout" + rm -f "$sin" "$sout" + cleanup_partial +} + +reset() +{ + cleanup_partial + init +} + +for arg in "$@"; do + if [ "$arg" = "-c" ]; then + capture=1 + fi +done + +ip -Version > /dev/null 2>&1 +if [ $? -ne 0 ];then + echo "SKIP: Could not run test without ip tool" + exit $ksft_skip +fi + + +check_transfer() +{ + in=$1 + out=$2 + what=$3 + + cmp "$in" "$out" > /dev/null 2>&1 + if [ $? -ne 0 ] ;then + echo "[ FAIL ] $what does not match (in, out):" + print_file_err "$in" + print_file_err "$out" + + return 1 + fi + + return 0 +} + +do_ping() +{ + listener_ns="$1" + connector_ns="$2" + connect_addr="$3" + + ip netns exec ${connector_ns} ping -q -c 1 $connect_addr >/dev/null + if [ $? -ne 0 ] ; then + echo "$listener_ns -> $connect_addr connectivity [ FAIL ]" 1>&2 + ret=1 + fi +} + +do_transfer() +{ + listener_ns="$1" + connector_ns="$2" + cl_proto="$3" + srv_proto="$4" + connect_addr="$5" + + port=$((10000+$TEST_COUNT)) + TEST_COUNT=$((TEST_COUNT+1)) + + :> "$cout" + :> "$sout" + :> "$capout" + + if [ $capture -eq 1 ]; then + if [ -z $SUDO_USER ] ; then + capuser="" + else + capuser="-Z $SUDO_USER" + fi + + capfile="mp_join-${listener_ns}.pcap" + + echo "Capturing traffic for test $TEST_COUNT into $capfile" + ip netns exec ${listener_ns} tcpdump -i any -s 65535 -B 32768 $capuser -w $capfile > "$capout" 2>&1 & + cappid=$! + + sleep 1 + fi + + ip netns exec ${listener_ns} ./mptcp_connect -j -t $timeout -l -p $port -s ${srv_proto} 0.0.0.0 < "$sin" > "$sout" & + spid=$! + + sleep 1 + + ip netns exec ${connector_ns} ./mptcp_connect -j -t $timeout -p $port -s ${cl_proto} $connect_addr < "$cin" > "$cout" & + cpid=$! + + wait $cpid + retc=$? + wait $spid + rets=$? + + if [ $capture -eq 1 ]; then + sleep 1 + kill $cappid + fi + + if [ ${rets} -ne 0 ] || [ ${retc} -ne 0 ]; then + echo " client exit code $retc, server $rets" 1>&2 + echo "\nnetns ${listener_ns} socket stat for $port:" 1>&2 + ip netns exec ${listener_ns} ss -nita 1>&2 -o "sport = :$port" + echo "\nnetns ${connector_ns} socket stat for $port:" 1>&2 + ip netns exec ${connector_ns} ss -nita 1>&2 -o "dport = :$port" + + cat "$capout" + return 1 + fi + + check_transfer $sin $cout "file received by client" + retc=$? + check_transfer $cin $sout "file received by server" + rets=$? + + if [ $retc -eq 0 ] && [ $rets -eq 0 ];then + cat "$capout" + return 0 + fi + + cat "$capout" + return 1 +} + +make_file() +{ + name=$1 + who=$2 + + SIZE=1 + + dd if=/dev/urandom of="$name" bs=1024 count=$SIZE 2> /dev/null + echo -e "\nMPTCP_TEST_FILE_END_MARKER" >> "$name" + + echo "Created $name (size $SIZE KB) containing data sent by $who" +} + +run_tests() +{ + listener_ns="$1" + connector_ns="$2" + connect_addr="$3" + lret=0 + + do_transfer ${listener_ns} ${connector_ns} MPTCP MPTCP ${connect_addr} + lret=$? + if [ $lret -ne 0 ]; then + ret=$lret + return + fi +} + +chk_join_nr() +{ + local msg="$1" + local syn_nr=$2 + local syn_ack_nr=$3 + local ack_nr=$4 + local count + local dump_stats + + printf "%-36s %s" "$msg" "syn" + count=`ip netns exec $ns1 nstat -as | grep MPTcpExtMPJoinSynRx | awk '{print $2}'` + [ -z "$count" ] && count=0 + if [ "$count" != "$syn_nr" ]; then + echo "[fail] got $count JOIN[s] syn expected $syn_nr" + ret=1 + dump_stats=1 + else + echo -n "[ ok ]" + fi + + echo -n " - synack" + count=`ip netns exec $ns2 nstat -as | grep MPTcpExtMPJoinSynAckRx | awk '{print $2}'` + [ -z "$count" ] && count=0 + if [ "$count" != "$syn_ack_nr" ]; then + echo "[fail] got $count JOIN[s] synack expected $syn_ack_nr" + ret=1 + dump_stats=1 + else + echo -n "[ ok ]" + fi + + echo -n " - ack" + count=`ip netns exec $ns1 nstat -as | grep MPTcpExtMPJoinAckRx | awk '{print $2}'` + [ -z "$count" ] && count=0 + if [ "$count" != "$ack_nr" ]; then + echo "[fail] got $count JOIN[s] ack expected $ack_nr" + ret=1 + dump_stats=1 + else + echo "[ ok ]" + fi + if [ "${dump_stats}" = 1 ]; then + echo Server ns stats + ip netns exec $ns1 nstat -as | grep MPTcp + echo Client ns stats + ip netns exec $ns2 nstat -as | grep MPTcp + fi +} + +sin=$(mktemp) +sout=$(mktemp) +cin=$(mktemp) +cout=$(mktemp) +init +make_file "$cin" "client" +make_file "$sin" "server" +trap cleanup EXIT + +run_tests $ns1 $ns2 10.0.1.1 +chk_join_nr "no JOIN" "0" "0" "0" + +# subflow limted by client +reset +ip netns exec $ns2 ./pm_nl_ctl add 10.0.3.2 flags subflow +run_tests $ns1 $ns2 10.0.1.1 +chk_join_nr "single subflow, limited by client" 0 0 0 + +# subflow limted by server +reset +ip netns exec $ns2 ./pm_nl_ctl limits 0 1 +ip netns exec $ns2 ./pm_nl_ctl add 10.0.3.2 flags subflow +run_tests $ns1 $ns2 10.0.1.1 +chk_join_nr "single subflow, limited by server" 1 1 0 + +# subflow +reset +ip netns exec $ns1 ./pm_nl_ctl limits 0 1 +ip netns exec $ns2 ./pm_nl_ctl limits 0 1 +ip netns exec $ns2 ./pm_nl_ctl add 10.0.3.2 flags subflow +run_tests $ns1 $ns2 10.0.1.1 +chk_join_nr "single subflow" 1 1 1 + +# multiple subflows +reset +ip netns exec $ns1 ./pm_nl_ctl limits 0 2 +ip netns exec $ns2 ./pm_nl_ctl limits 0 2 +ip netns exec $ns2 ./pm_nl_ctl add 10.0.3.2 flags subflow +ip netns exec $ns2 ./pm_nl_ctl add 10.0.2.2 flags subflow +run_tests $ns1 $ns2 10.0.1.1 +chk_join_nr "multiple subflows" 2 2 2 + +# multiple subflows limited by serverf +reset +ip netns exec $ns1 ./pm_nl_ctl limits 0 1 +ip netns exec $ns2 ./pm_nl_ctl limits 0 2 +ip netns exec $ns2 ./pm_nl_ctl add 10.0.3.2 flags subflow +ip netns exec $ns2 ./pm_nl_ctl add 10.0.2.2 flags subflow +run_tests $ns1 $ns2 10.0.1.1 +chk_join_nr "multiple subflows, limited by server" 2 2 1 + +# add_address, unused +reset +ip netns exec $ns1 ./pm_nl_ctl add 10.0.2.1 flags signal +run_tests $ns1 $ns2 10.0.1.1 +chk_join_nr "unused signal address" 0 0 0 + +# accept and use add_addr +reset +ip netns exec $ns1 ./pm_nl_ctl limits 0 1 +ip netns exec $ns2 ./pm_nl_ctl limits 1 1 +ip netns exec $ns1 ./pm_nl_ctl add 10.0.2.1 flags signal +run_tests $ns1 $ns2 10.0.1.1 +chk_join_nr "signal address" 1 1 1 + +# accept and use add_addr with an additional subflow +# note: signal address in server ns and local addresses in client ns must +# belong to different subnets or one of the listed local address could be +# used for 'add_addr' subflow +reset +ip netns exec $ns1 ./pm_nl_ctl add 10.0.2.1 flags signal +ip netns exec $ns1 ./pm_nl_ctl limits 0 2 +ip netns exec $ns2 ./pm_nl_ctl limits 1 2 +ip netns exec $ns2 ./pm_nl_ctl add 10.0.3.2 flags subflow +run_tests $ns1 $ns2 10.0.1.1 +chk_join_nr "subflow and signal" 2 2 2 + +# accept and use add_addr with additional subflows +reset +ip netns exec $ns1 ./pm_nl_ctl limits 0 3 +ip netns exec $ns1 ./pm_nl_ctl add 10.0.2.1 flags signal +ip netns exec $ns2 ./pm_nl_ctl limits 1 3 +ip netns exec $ns2 ./pm_nl_ctl add 10.0.3.2 flags subflow +ip netns exec $ns2 ./pm_nl_ctl add 10.0.4.2 flags subflow +run_tests $ns1 $ns2 10.0.1.1 +chk_join_nr "multiple subflows and signal" 3 3 3 + +exit $ret