From patchwork Mon Apr 14 12:59:32 2025
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: "Chia-Yu Chang (Nokia)"
X-Patchwork-Id: 881137
From: chia-yu.chang@nokia-bell-labs.com
To: netdev@vger.kernel.org, dave.taht@gmail.com, pabeni@redhat.com,
 jhs@mojatatu.com, kuba@kernel.org, stephen@networkplumber.org,
 xiyou.wangcong@gmail.com, jiri@resnulli.us, davem@davemloft.net,
 edumazet@google.com, horms@kernel.org, andrew+netdev@lunn.ch,
 donald.hunter@gmail.com, ast@fiberby.net, liuhangbin@gmail.com,
 shuah@kernel.org, linux-kselftest@vger.kernel.org, ij@kernel.org,
 ncardwell@google.com, koen.de_schepper@nokia-bell-labs.com,
 g.white@cablelabs.com, ingemar.s.johansson@ericsson.com,
 mirja.kuehlewind@ericsson.com, cheshire@apple.com, rs.ietf@gmx.at,
 Jason_Livingood@comcast.com, vidhi_goel@apple.com
Cc: Olivier Tilmans, Chia-Yu Chang
Subject: [PATCH v3 net-next 03/15] tcp: AccECN core
Date: Mon, 14 Apr 2025 14:59:32 +0200
Message-Id: <20250414125944.97350-4-chia-yu.chang@nokia-bell-labs.com>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20250414125944.97350-1-chia-yu.chang@nokia-bell-labs.com>
References: <20250414125944.97350-1-chia-yu.chang@nokia-bell-labs.com>
Precedence: bulk
X-Mailing-List: linux-kselftest@vger.kernel.org
From: Ilpo Järvinen

This change implements Accurate ECN without negotiation and without the
AccECN option (both will be added by later changes). It is based on the
AccECN specification:
https://tools.ietf.org/id/draft-ietf-tcpm-accurate-ecn-28.txt

Accurate ECN allows feeding back the number of CE (congestion
experienced) marks accurately to the sender, in contrast to RFC 3168
ECN, which can signal only one marks-seen yes/no indication per RTT.
Congestion control algorithms can take advantage of the accurate ECN
information to fine-tune their congestion response and avoid a drastic
rate reduction when only mild congestion is encountered.

With Accurate ECN, tp->received_ce (r.cep in the AccECN spec) keeps
track of how many segments have arrived with a CE mark. Accurate ECN
uses the ACE field (ECE, CWR, AE) to communicate the value back to the
sender, which updates tp->delivered_ce (s.cep) based on the feedback.
This signalling channel is lossy when the ACE field overflows. A
conservative strategy is selected here to deal with ACE overflow; some
strategies using the AccECN option, added later in the overall
patchset, mitigate falsely detected overflows.

The ACE field values on the wire are offset by
TCP_ACCECN_CEP_INIT_OFFSET; tp->delivered_ce and tp->received_ce count
the real CE marks rather than forcing all downstream users to adapt to
the wire offset.
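For illustration only (not part of the patch): a minimal user-space
sketch of the ACE arithmetic described above. The receiver places
(received_ce + TCP_ACCECN_CEP_INIT_OFFSET) mod 8 in the ACE field, as
tcp_accecn_set_ace() below does; the sender recovers the CE delta
modulo 8, as __tcp_accecn_process() below does. The ace_encode() and
ace_decode_delta() helper names are hypothetical, made up for this
sketch only.

  #include <stdint.h>
  #include <stdio.h>

  #define TCP_ACCECN_CEP_ACE_MASK    0x7
  #define TCP_ACCECN_CEP_INIT_OFFSET 5

  /* Receiver side: 3-bit ACE value placed on the wire
   * (hypothetical helper; cf. tcp_accecn_set_ace() in the patch)
   */
  static unsigned int ace_encode(uint32_t received_ce)
  {
          return (received_ce + TCP_ACCECN_CEP_INIT_OFFSET) &
                 TCP_ACCECN_CEP_ACE_MASK;
  }

  /* Sender side: CE delta implied by the ACE field; exact as long as
   * fewer than 8 CE marks arrived since the last feedback
   * (hypothetical helper; cf. __tcp_accecn_process() in the patch)
   */
  static uint32_t ace_decode_delta(unsigned int wire_ace,
                                   uint32_t delivered_ce)
  {
          uint32_t corrected_ace = wire_ace - TCP_ACCECN_CEP_INIT_OFFSET;

          return (corrected_ace - delivered_ce) & TCP_ACCECN_CEP_ACE_MASK;
  }

  int main(void)
  {
          uint32_t received_ce = 11;  /* receiver has seen 11 CE marks */
          uint32_t delivered_ce = 9;  /* sender has accounted for 9 so far */
          unsigned int wire_ace = ace_encode(received_ce); /* (11+5) & 7 = 0 */

          /* (0 - 5 - 9) & 7 = 2: the sender learns of 2 new CE marks */
          printf("ACE=%u delta=%u\n", wire_ace,
                 ace_decode_delta(wire_ace, delivered_ce));
          return 0;
  }

The non-zero initial offset exists so that middlebox interference with
the header bits can be detected, per the comment in the patch.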
Co-developed-by: Olivier Tilmans
Signed-off-by: Olivier Tilmans
Signed-off-by: Ilpo Järvinen
Co-developed-by: Chia-Yu Chang
Signed-off-by: Chia-Yu Chang
---
 include/linux/tcp.h   |   3 ++
 include/net/tcp.h     |  26 +++++++++
 net/ipv4/tcp.c        |   4 +-
 net/ipv4/tcp_input.c  | 121 +++++++++++++++++++++++++++++++++++++-----
 net/ipv4/tcp_output.c |  21 +++++++-
 5 files changed, 160 insertions(+), 15 deletions(-)

diff --git a/include/linux/tcp.h b/include/linux/tcp.h
index 1669d95bb0f9..e36018203bd0 100644
--- a/include/linux/tcp.h
+++ b/include/linux/tcp.h
@@ -298,6 +298,9 @@ struct tcp_sock {
 	u32	snd_up;		/* Urgent pointer */
 	u32	delivered;	/* Total data packets delivered incl. rexmits */
 	u32	delivered_ce;	/* Like the above but only ECE marked packets */
+	u32	received_ce;	/* Like the above but for rcvd CE marked pkts */
+	u8	received_ce_pending:4, /* Not yet transmit cnt of received_ce */
+		unused2:4;
 	u32	app_limited;	/* limited until "delivered" reaches this val */
 	u32	rcv_wnd;	/* Current receiver window */
 	/*
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 4dacd4a11669..cc28255deef7 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -415,6 +415,11 @@ static inline void tcp_ecn_mode_set(struct tcp_sock *tp, u8 mode)
 	tp->ecn_flags |= mode;
 }
 
+static inline u8 tcp_accecn_ace(const struct tcphdr *th)
+{
+	return (th->ae << 2) | (th->cwr << 1) | th->ece;
+}
+
 enum tcp_tw_status {
 	TCP_TW_SUCCESS = 0,
 	TCP_TW_RST = 1,
@@ -964,6 +969,20 @@ static inline u32 tcp_rsk_tsval(const struct tcp_request_sock *treq)
 #define TCPHDR_ACE (TCPHDR_ECE | TCPHDR_CWR | TCPHDR_AE)
 #define TCPHDR_SYN_ECN (TCPHDR_SYN | TCPHDR_ECE | TCPHDR_CWR)
 
+#define TCP_ACCECN_CEP_ACE_MASK 0x7
+#define TCP_ACCECN_ACE_MAX_DELTA 6
+
+/* To avoid/detect middlebox interference, not all counters start at 0.
+ * See draft-ietf-tcpm-accurate-ecn for the latest values.
+ */
+#define TCP_ACCECN_CEP_INIT_OFFSET 5
+
+static inline void tcp_accecn_init_counters(struct tcp_sock *tp)
+{
+	tp->received_ce = 0;
+	tp->received_ce_pending = 0;
+}
+
 /* State flags for sacked in struct tcp_skb_cb */
 enum tcp_skb_cb_sacked_flags {
 	TCPCB_SACKED_ACKED	= (1 << 0),	/* SKB ACK'd by a SACK block */
@@ -1773,11 +1792,18 @@ static inline bool tcp_paws_reject(const struct tcp_options_received *rx_opt,
 
 static inline void __tcp_fast_path_on(struct tcp_sock *tp, u32 snd_wnd)
 {
+	u32 ace;
+
 	/* mptcp hooks are only on the slow path */
 	if (sk_is_mptcp((struct sock *)tp))
 		return;
 
+	ace = tcp_ecn_mode_accecn(tp) ?
+		((tp->delivered_ce + TCP_ACCECN_CEP_INIT_OFFSET) &
+		 TCP_ACCECN_CEP_ACE_MASK) : 0;
+
 	tp->pred_flags = htonl((tp->tcp_header_len << 26) |
+			       (ace << 22) |
 			       ntohl(TCP_FLAG_ACK) | snd_wnd);
 }
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index e0e96f8fd47c..372c58170f4c 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -3364,6 +3364,7 @@ int tcp_disconnect(struct sock *sk, int flags)
 	tp->window_clamp = 0;
 	tp->delivered = 0;
 	tp->delivered_ce = 0;
+	tcp_accecn_init_counters(tp);
 	if (icsk->icsk_ca_initialized && icsk->icsk_ca_ops->release)
 		icsk->icsk_ca_ops->release(sk);
 	memset(icsk->icsk_ca_priv, 0, sizeof(icsk->icsk_ca_priv));
@@ -5088,6 +5089,7 @@ static void __init tcp_struct_check(void)
 	CACHELINE_ASSERT_GROUP_MEMBER(struct tcp_sock, tcp_sock_write_txrx, snd_up);
 	CACHELINE_ASSERT_GROUP_MEMBER(struct tcp_sock, tcp_sock_write_txrx, delivered);
 	CACHELINE_ASSERT_GROUP_MEMBER(struct tcp_sock, tcp_sock_write_txrx, delivered_ce);
+	CACHELINE_ASSERT_GROUP_MEMBER(struct tcp_sock, tcp_sock_write_txrx, received_ce);
 	CACHELINE_ASSERT_GROUP_MEMBER(struct tcp_sock, tcp_sock_write_txrx, app_limited);
 	CACHELINE_ASSERT_GROUP_MEMBER(struct tcp_sock, tcp_sock_write_txrx, rcv_wnd);
 	CACHELINE_ASSERT_GROUP_MEMBER(struct tcp_sock, tcp_sock_write_txrx, rx_opt);
@@ -5095,7 +5097,7 @@ static void __init tcp_struct_check(void)
 	/* 32bit arches with 8byte alignment on u64 fields might need padding
 	 * before tcp_clock_cache.
 	 */
-	CACHELINE_ASSERT_GROUP_SIZE(struct tcp_sock, tcp_sock_write_txrx, 92 + 4);
+	CACHELINE_ASSERT_GROUP_SIZE(struct tcp_sock, tcp_sock_write_txrx, 97 + 7);
 
 	/* RX read-write hotpath cache lines */
 	CACHELINE_ASSERT_GROUP_MEMBER(struct tcp_sock, tcp_sock_write_rx, bytes_received);
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index a35018e2d0ba..8dbb625f5e8a 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -341,14 +341,17 @@ static bool tcp_in_quickack_mode(struct sock *sk)
 
 static void tcp_ecn_queue_cwr(struct tcp_sock *tp)
 {
+	/* Do not set CWR if in AccECN mode! */
 	if (tcp_ecn_mode_rfc3168(tp))
 		tp->ecn_flags |= TCP_ECN_QUEUE_CWR;
 }
 
 static void tcp_ecn_accept_cwr(struct sock *sk, const struct sk_buff *skb)
 {
-	if (tcp_hdr(skb)->cwr) {
-		tcp_sk(sk)->ecn_flags &= ~TCP_ECN_DEMAND_CWR;
+	struct tcp_sock *tp = tcp_sk(sk);
+
+	if (tcp_ecn_mode_rfc3168(tp) && tcp_hdr(skb)->cwr) {
+		tp->ecn_flags &= ~TCP_ECN_DEMAND_CWR;
 
 		/* If the sender is telling us it has entered CWR, then its
 		 * cwnd may be very low (even just 1 packet), so we should ACK
@@ -384,17 +387,16 @@ static void tcp_data_ecn_check(struct sock *sk, const struct sk_buff *skb)
 		if (tcp_ca_needs_ecn(sk))
 			tcp_ca_event(sk, CA_EVENT_ECN_IS_CE);
 
-		if (!(tp->ecn_flags & TCP_ECN_DEMAND_CWR)) {
+		if (!(tp->ecn_flags & TCP_ECN_DEMAND_CWR) &&
+		    tcp_ecn_mode_rfc3168(tp)) {
 			/* Better not delay acks, sender can have a very low cwnd */
 			tcp_enter_quickack_mode(sk, 2);
 			tp->ecn_flags |= TCP_ECN_DEMAND_CWR;
 		}
-		tp->ecn_flags |= TCP_ECN_SEEN;
 		break;
 	default:
 		if (tcp_ca_needs_ecn(sk))
 			tcp_ca_event(sk, CA_EVENT_ECN_NO_CE);
-		tp->ecn_flags |= TCP_ECN_SEEN;
 		break;
 	}
 }
@@ -428,10 +430,64 @@ static void tcp_count_delivered(struct tcp_sock *tp, u32 delivered,
 				bool ece_ack)
 {
 	tp->delivered += delivered;
-	if (ece_ack)
+	if (tcp_ecn_mode_rfc3168(tp) && ece_ack)
 		tcp_count_delivered_ce(tp, delivered);
 }
 
+/* Returns the ECN CE delta */
+static u32 __tcp_accecn_process(struct sock *sk, const struct sk_buff *skb,
+				u32 delivered_pkts, int flag)
+{
+	const struct tcphdr *th = tcp_hdr(skb);
+	struct tcp_sock *tp = tcp_sk(sk);
+	u32 delta, safe_delta;
+	u32 corrected_ace;
+
+	/* Reordered ACK or uncertain due to lack of data to send and ts */
+	if (!(flag & (FLAG_FORWARD_PROGRESS | FLAG_TS_PROGRESS)))
+		return 0;
+
+	if (!(flag & FLAG_SLOWPATH)) {
+		/* AccECN counter might overflow on large ACKs */
+		if (delivered_pkts <= TCP_ACCECN_CEP_ACE_MASK)
+			return 0;
+	}
+
+	/* ACE field is not available during handshake */
+	if (flag & FLAG_SYN_ACKED)
+		return 0;
+
+	if (tp->received_ce_pending >= TCP_ACCECN_ACE_MAX_DELTA)
+		inet_csk(sk)->icsk_ack.pending |= ICSK_ACK_NOW;
+
+	corrected_ace = tcp_accecn_ace(th) - TCP_ACCECN_CEP_INIT_OFFSET;
+	delta = (corrected_ace - tp->delivered_ce) & TCP_ACCECN_CEP_ACE_MASK;
+	if (delivered_pkts <= TCP_ACCECN_CEP_ACE_MASK)
+		return delta;
+
+	safe_delta = delivered_pkts -
+		     ((delivered_pkts - delta) & TCP_ACCECN_CEP_ACE_MASK);
+
+	return safe_delta;
+}
+
+static u32 tcp_accecn_process(struct sock *sk, const struct sk_buff *skb,
+			      u32 delivered_pkts, int *flag)
+{
+	struct tcp_sock *tp = tcp_sk(sk);
+	u32 delta;
+
+	delta = __tcp_accecn_process(sk, skb, delivered_pkts, *flag);
+	if (delta > 0) {
+		tcp_count_delivered_ce(tp, delta);
+		*flag |= FLAG_ECE;
+		/* Recalculate header predictor */
+		if (tp->pred_flags)
+			tcp_fast_path_on(tp);
+	}
+	return delta;
+}
+
 /* Buffer size and advertised window tuning.
  *
  * 1. Tuning sk->sk_sndbuf, when connection enters established state.
@@ -3919,7 +3975,8 @@ static void tcp_xmit_recovery(struct sock *sk, int rexmit)
 }
 
 /* Returns the number of packets newly acked or sacked by the current ACK */
-static u32 tcp_newly_delivered(struct sock *sk, u32 prior_delivered, int flag)
+static u32 tcp_newly_delivered(struct sock *sk, u32 prior_delivered,
+			       u32 ecn_count, int flag)
 {
 	const struct net *net = sock_net(sk);
 	struct tcp_sock *tp = tcp_sk(sk);
@@ -3927,8 +3984,12 @@ static u32 tcp_newly_delivered(struct sock *sk, u32 prior_delivered, int flag)
 
 	delivered = tp->delivered - prior_delivered;
 	NET_ADD_STATS(net, LINUX_MIB_TCPDELIVERED, delivered);
-	if (flag & FLAG_ECE)
-		NET_ADD_STATS(net, LINUX_MIB_TCPDELIVEREDCE, delivered);
+
+	if (flag & FLAG_ECE) {
+		if (tcp_ecn_mode_rfc3168(tp))
+			ecn_count = delivered;
+		NET_ADD_STATS(net, LINUX_MIB_TCPDELIVEREDCE, ecn_count);
+	}
 
 	return delivered;
 }
@@ -3949,6 +4010,7 @@ static int tcp_ack(struct sock *sk, const struct sk_buff *skb, int flag)
 	u32 delivered = tp->delivered;
 	u32 lost = tp->lost;
 	int rexmit = REXMIT_NONE; /* Flag to (re)transmit to recover losses */
+	u32 ecn_count = 0; /* Did we receive ECE/an AccECN ACE update? */
 	u32 prior_fack;
 
 	sack_state.first_sackt = 0;
@@ -4056,6 +4118,11 @@ static int tcp_ack(struct sock *sk, const struct sk_buff *skb, int flag)
 
 	tcp_rack_update_reo_wnd(sk, &rs);
 
+	if (tcp_ecn_mode_accecn(tp))
+		ecn_count = tcp_accecn_process(sk, skb,
+					       tp->delivered - delivered,
+					       &flag);
+
 	tcp_in_ack_event(sk, flag);
 
 	if (tp->tlp_high_seq)
@@ -4080,7 +4147,8 @@ static int tcp_ack(struct sock *sk, const struct sk_buff *skb, int flag)
 	if ((flag & FLAG_FORWARD_PROGRESS) || !(flag & FLAG_NOT_DUP))
 		sk_dst_confirm(sk);
 
-	delivered = tcp_newly_delivered(sk, delivered, flag);
+	delivered = tcp_newly_delivered(sk, delivered, ecn_count, flag);
+
 	lost = tp->lost - lost;			/* freshly marked lost */
 	rs.is_ack_delayed = !!(flag & FLAG_ACK_MAYBE_DELAYED);
 	tcp_rate_gen(sk, delivered, lost, is_sack_reneg, sack_state.rate);
@@ -4089,12 +4157,16 @@ static int tcp_ack(struct sock *sk, const struct sk_buff *skb, int flag)
 	return 1;
 
 no_queue:
+	if (tcp_ecn_mode_accecn(tp))
+		ecn_count = tcp_accecn_process(sk, skb,
+					       tp->delivered - delivered,
+					       &flag);
 	tcp_in_ack_event(sk, flag);
 	/* If data was DSACKed, see if we can undo a cwnd reduction. */
 	if (flag & FLAG_DSACKING_ACK) {
 		tcp_fastretrans_alert(sk, prior_snd_una, num_dupack, &flag,
 				      &rexmit);
-		tcp_newly_delivered(sk, delivered, flag);
+		tcp_newly_delivered(sk, delivered, ecn_count, flag);
 	}
 	/* If this ack opens up a zero window, clear backoff.  It was
 	 * being used to time the probes, and is probably far higher than
@@ -4115,7 +4187,7 @@ static int tcp_ack(struct sock *sk, const struct sk_buff *skb, int flag)
 					 &sack_state);
 		tcp_fastretrans_alert(sk, prior_snd_una, num_dupack, &flag,
 				      &rexmit);
-		tcp_newly_delivered(sk, delivered, flag);
+		tcp_newly_delivered(sk, delivered, ecn_count, flag);
 		tcp_xmit_recovery(sk, rexmit);
 	}
 
@@ -5952,6 +6024,26 @@ static void tcp_urg(struct sock *sk, struct sk_buff *skb, const struct tcphdr *th)
 	}
 }
 
+/* Updates Accurate ECN received counters from the received IP ECN field */
+static void tcp_ecn_received_counters(struct sock *sk,
+				      const struct sk_buff *skb)
+{
+	u8 ecnfield = TCP_SKB_CB(skb)->ip_dsfield & INET_ECN_MASK;
+	u8 is_ce = INET_ECN_is_ce(ecnfield);
+	struct tcp_sock *tp = tcp_sk(sk);
+
+	if (!INET_ECN_is_not_ect(ecnfield)) {
+		u32 pcount = is_ce * max_t(u16, 1, skb_shinfo(skb)->gso_segs);
+
+		tp->ecn_flags |= TCP_ECN_SEEN;
+
+		/* ACE counter tracks *all* segments including pure ACKs */
+		tp->received_ce += pcount;
+		tp->received_ce_pending = min(tp->received_ce_pending + pcount,
+					      0xfU);
+	}
+}
+
 /* Accept RST for rcv_nxt - 1 after a FIN.
  * When tcp connections are abruptly terminated from Mac OSX (via ^C), a
  * FIN is sent followed by a RST packet. The RST is sent with the same
@@ -6214,6 +6306,8 @@ void tcp_rcv_established(struct sock *sk, struct sk_buff *skb)
 				flag |= __tcp_replace_ts_recent(tp,
 								delta);
 
+				tcp_ecn_received_counters(sk, skb);
+
 				/* We know that such packets are checksummed
 				 * on entry.
 				 */
@@ -6258,6 +6352,7 @@ void tcp_rcv_established(struct sock *sk, struct sk_buff *skb)
 			/* Bulk data transfer: receiver */
 			tcp_cleanup_skb(skb);
 			__skb_pull(skb, tcp_header_len);
+			tcp_ecn_received_counters(sk, skb);
 			eaten = tcp_queue_rcv(sk, skb, &fragstolen);
 
 			tcp_event_data_recv(sk, skb);
@@ -6298,6 +6393,8 @@ void tcp_rcv_established(struct sock *sk, struct sk_buff *skb)
 		return;
 
 step5:
+	tcp_ecn_received_counters(sk, skb);
+
 	reason = tcp_ack(sk, skb, FLAG_SLOWPATH | FLAG_UPDATE_TS_RECENT);
 	if ((int)reason < 0) {
 		reason = -reason;
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 9a1ab946ff62..9c978d12c7cf 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -374,6 +374,17 @@ tcp_ecn_make_synack(const struct request_sock *req, struct tcphdr *th)
 		th->ece = 1;
 }
 
+static void tcp_accecn_set_ace(struct tcphdr *th, struct tcp_sock *tp)
+{
+	u32 wire_ace;
+
+	wire_ace = tp->received_ce + TCP_ACCECN_CEP_INIT_OFFSET;
+	th->ece = !!(wire_ace & 0x1);
+	th->cwr = !!(wire_ace & 0x2);
+	th->ae = !!(wire_ace & 0x4);
+	tp->received_ce_pending = 0;
+}
+
 /* Set up ECN state for a packet on a ESTABLISHED socket that is about to
  * be sent.
  */
@@ -382,11 +393,17 @@ static void tcp_ecn_send(struct sock *sk, struct sk_buff *skb,
 {
 	struct tcp_sock *tp = tcp_sk(sk);
 
-	if (tcp_ecn_mode_rfc3168(tp)) {
+	if (!tcp_ecn_mode_any(tp))
+		return;
+
+	INET_ECN_xmit(sk);
+	if (tcp_ecn_mode_accecn(tp)) {
+		tcp_accecn_set_ace(th, tp);
+		skb_shinfo(skb)->gso_type |= SKB_GSO_TCP_ACCECN;
+	} else {
 		/* Not-retransmitted data segment: set ECT and inject CWR. */
 		if (skb->len != tcp_header_len &&
 		    !before(TCP_SKB_CB(skb)->seq, tp->snd_nxt)) {
-			INET_ECN_xmit(sk);
 			if (tp->ecn_flags & TCP_ECN_QUEUE_CWR) {
 				tp->ecn_flags &= ~TCP_ECN_QUEUE_CWR;
 				th->cwr = 1;
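
For reference only (not part of the patch): a user-space sketch of the
overflow-safe accounting in __tcp_accecn_process() above, with a worked
example. The accecn_safe_delta() helper name is hypothetical, made up
for this illustration.

  #include <stdint.h>
  #include <stdio.h>

  #define TCP_ACCECN_CEP_ACE_MASK 0x7

  /* When an ACK newly acknowledges more than 7 segments, the 3-bit ACE
   * field may have wrapped. Round the raw modulo-8 delta up to the
   * largest value consistent with both the ACE field and the number of
   * newly delivered packets, so CE marks are never under-counted
   * (hypothetical helper; cf. safe_delta in __tcp_accecn_process()).
   */
  static uint32_t accecn_safe_delta(uint32_t delivered_pkts, uint32_t delta)
  {
          return delivered_pkts -
                 ((delivered_pkts - delta) & TCP_ACCECN_CEP_ACE_MASK);
  }

  int main(void)
  {
          /* 20 newly delivered packets with a raw ACE delta of 3: the CE
           * counts consistent with the ACE field are 3, 11, and 19; the
           * safe estimate picks 19: 20 - ((20 - 3) & 7) = 19.
           */
          printf("safe_delta=%u\n", accecn_safe_delta(20, 3));
          return 0;
  }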