From patchwork Thu Mar 11 18:03:12 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Petr Machata X-Patchwork-Id: 398184 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-19.0 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_CR_TRAILER, INCLUDES_PATCH, MAILING_LIST_MULTI, SPF_HELO_NONE, SPF_PASS, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 584EDC433E0 for ; Thu, 11 Mar 2021 18:04:36 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 16FAF64F94 for ; Thu, 11 Mar 2021 18:04:36 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229883AbhCKSEH (ORCPT ); Thu, 11 Mar 2021 13:04:07 -0500 Received: from mail-eopbgr750074.outbound.protection.outlook.com ([40.107.75.74]:41157 "EHLO NAM02-BL2-obe.outbound.protection.outlook.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S229942AbhCKSEB (ORCPT ); Thu, 11 Mar 2021 13:04:01 -0500 ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=Q+Sd7VzoCUejWAyb0VnmAvfKFeNZXHKmB5U7a6xCapJGFJR+9Q5jTIfFYTMkLeynHLl2ZYJ2XhxHwE7l2+OXH9qfNILAWcWtb69qdp6eeTsgDlX3d1N+VBAn+vDdV7B1FOvTXrDJVldlKxCJDqpZn3M5GuAHIfepyW270lfi2TFbEOj91cjOv0+KJTxDGDQmsZKwUAt0Q9/fs1gxOdKNE7LbtHI68/QLEJLpnQfTIZrmXkIY3SOhIfQPSE6i/nRSKe833WjGFZXqgtg40cdm+2hQ8gwOtIIQ84i5J/IwuzN83v9sg/rLPCdUjCsh/MfvV59uqMG5ID0cbnULbm7ywQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=VFgq2Ky8ZlHp9x9luqt/0eIqnWbVAIvZjS+sKwDDeqI=; 
b=Noy/gqUg2IbI5rHPosNcAhW8GInAe8bWhpSd31Q5lMxrX3oL1byllFbi5g7i7QR9WVX87bGxViQhVwJSVjQjPEZjq91xexU/eYcPlVl8gksnkIMN0r9rYrRBDtrYzx51d8LK+iW/QpqaFmNX5YWv6uaS8sKIs1BSacxOKQKoIswm75E09q+YBKkdwd/8YFCVPh4LODasVvYaEZNhvc+zSL7jQtBMmuFiAIVkabCRmIz4hVjFEP7RSnGEKEGHxSl5nTSz9KdAS2rUURci7M6ebmeWoG/FylnLZWh5/2kvPkFtL2QgkpGvGGToKMBPIuM8tXsVPyoVkfybagudXzotSA== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass (sender ip is 216.228.112.34) smtp.rcpttodomain=davemloft.net smtp.mailfrom=nvidia.com; dmarc=pass (p=none sp=none pct=100) action=none header.from=nvidia.com; dkim=none (message not signed); arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=Nvidia.com; s=selector2; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=VFgq2Ky8ZlHp9x9luqt/0eIqnWbVAIvZjS+sKwDDeqI=; b=EdMIWbbakMWx7g0OuzKhuU5bc9xIy31jxsCDKiHzyM06PlfjRHbySuzr8XtgC+IgnbCXM6hwKL2NiBEKu20IBS2ZntoiFo6C9odOSbTw6HzUwNSH6aIqLFCMadybaQOpz7cQbmEsPsBgc+ee2Se4OIFn4aFB+BqjPVfkKcDhvU5WtIIeTh7jNjK+7aPdMksP2sZv/bke0+mwvpgvQFySJ9hUhCp/RvMTCwzFksLQepRIwmCrfQXC9++xgjSDKjlEUVuRiPY/FTxZL2UPOdbT6ugAMaL5MEPgERy8kgFhgpNHiNH2RhYFkC1M4U0qlP5j6w+In+T6waAidi9vQzcLSg== Received: from BN1PR10CA0019.namprd10.prod.outlook.com (2603:10b6:408:e0::24) by BL0PR12MB2340.namprd12.prod.outlook.com (2603:10b6:207:4d::30) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.3912.30; Thu, 11 Mar 2021 18:03:58 +0000 Received: from BN8NAM11FT045.eop-nam11.prod.protection.outlook.com (2603:10b6:408:e0:cafe::a5) by BN1PR10CA0019.outlook.office365.com (2603:10b6:408:e0::24) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.3912.17 via Frontend Transport; Thu, 11 Mar 2021 18:03:58 +0000 X-MS-Exchange-Authentication-Results: spf=pass (sender IP is 216.228.112.34) smtp.mailfrom=nvidia.com; davemloft.net; dkim=none (message not signed) header.d=none; davemloft.net; dmarc=pass action=none 
header.from=nvidia.com; Received-SPF: Pass (protection.outlook.com: domain of nvidia.com designates 216.228.112.34 as permitted sender) receiver=protection.outlook.com; client-ip=216.228.112.34; helo=mail.nvidia.com; Received: from mail.nvidia.com (216.228.112.34) by BN8NAM11FT045.mail.protection.outlook.com (10.13.177.47) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA384) id 15.20.3933.31 via Frontend Transport; Thu, 11 Mar 2021 18:03:57 +0000 Received: from localhost.localdomain (172.20.145.6) by HQMAIL107.nvidia.com (172.20.187.13) with Microsoft SMTP Server (TLS) id 15.0.1497.2; Thu, 11 Mar 2021 18:03:54 +0000 From: Petr Machata To: CC: Ido Schimmel , David Ahern , "David S . Miller" , Jakub Kicinski , "Petr Machata" Subject: [PATCH net-next v2 01/14] nexthop: Pass nh_config to replace_nexthop() Date: Thu, 11 Mar 2021 19:03:12 +0100 Message-ID: X-Mailer: git-send-email 2.26.2 In-Reply-To: References: MIME-Version: 1.0 X-Originating-IP: [172.20.145.6] X-ClientProxiedBy: HQMAIL107.nvidia.com (172.20.187.13) To HQMAIL107.nvidia.com (172.20.187.13) X-EOPAttributedMessage: 0 X-MS-PublicTrafficType: Email X-MS-Office365-Filtering-Correlation-Id: 8a385d4f-d856-44eb-40d7-08d8e4b80aee X-MS-TrafficTypeDiagnostic: BL0PR12MB2340: X-Microsoft-Antispam-PRVS: X-MS-Oob-TLC-OOBClassifiers: OLM:4125; X-MS-Exchange-SenderADCheck: 1 X-Microsoft-Antispam: BCL:0; X-Microsoft-Antispam-Message-Info: 
oFCVHcBwEukJ5qrcnoqTrWhr9x2DKsXWFnHpRxHAlBjq6SPrnJwDuzMPDCdcXgq4CavuLBnlBt6gjX/nNQrE+IEt7c49MAZV71UgmCWCfTEqsp+xHyjHUUGjqaxB4l5WnKvUTA1FHukkmRY8h8+Ia6szCBN4ZtB3l5lzLGHG+XJdFGL/Juw5lm2n0/OvE47eiV22vTKs1p7qkO5ZMgdmfUiKGhy3CsKNb83TPOIH+lqFNYfboqcbiOoW4sAYVpQua5/EJWQIpC/wAzoFm5eExrEM6K5ejvxj1HPgA7swx+lQQRoQPhH+KqokNUw8BguXju+3FbAk+O2++dm967iCxQfuis8AWUnfVOjIpm6EtiMCEFoB++Mj0II9urnGZ4nP/I/n+B2NfWbujbSDgjnsHaRlaXB4O6T8vCQBSV6LUCdNolEUDGU6MDPV7i7r8vwg16MWwxPaEQnOUXohS6itroUiZhJLSNOks11lhHfabFMI1S6agEK9BaHvzZ6lh6qqdOtXg9CsWDz02LKs9T5FIYu4Ry5mtjuOk7FPna8XZsRfT2j8i2q0cWZxfyCPfVVQ7nV33qij5eqqJWsclPXoKCM3LCyV2PMIJ5a1Mua+WVH5MwKLOmLvOvafODqJCrAB3aQy7XoKV2tkTHXiIXjxmEjIimlhnZloOlzDavtZD35SAmd5riLlr065uVRhileT X-Forefront-Antispam-Report: CIP:216.228.112.34; CTRY:US; LANG:en; SCL:1; SRV:; IPV:NLI; SFV:NSPM; H:mail.nvidia.com; PTR:schybrid03.nvidia.com; CAT:NONE; SFS:(4636009)(346002)(396003)(136003)(376002)(39860400002)(46966006)(36840700001)(16526019)(186003)(36906005)(82310400003)(107886003)(7636003)(86362001)(26005)(316002)(54906003)(83380400001)(70586007)(8676002)(2906002)(47076005)(82740400003)(6666004)(70206006)(5660300002)(36860700001)(8936002)(356005)(4326008)(34020700004)(336012)(36756003)(426003)(2616005)(478600001)(6916009); DIR:OUT; SFP:1101; X-OriginatorOrg: Nvidia.com X-MS-Exchange-CrossTenant-OriginalArrivalTime: 11 Mar 2021 18:03:57.9867 (UTC) X-MS-Exchange-CrossTenant-Network-Message-Id: 8a385d4f-d856-44eb-40d7-08d8e4b80aee X-MS-Exchange-CrossTenant-Id: 43083d15-7273-40c1-b7db-39efd9ccc17a X-MS-Exchange-CrossTenant-OriginalAttributedTenantConnectingIp: TenantId=43083d15-7273-40c1-b7db-39efd9ccc17a; Ip=[216.228.112.34]; Helo=[mail.nvidia.com] X-MS-Exchange-CrossTenant-AuthSource: BN8NAM11FT045.eop-nam11.prod.protection.outlook.com X-MS-Exchange-CrossTenant-AuthAs: Anonymous X-MS-Exchange-CrossTenant-FromEntityHeader: HybridOnPrem X-MS-Exchange-Transport-CrossTenantHeadersStamped: BL0PR12MB2340 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org 
Currently, replace assumes that the new group that is given is a
fully-formed object. But mpath groups really only have one attribute, and
that is the constituent next hop configuration. This may not be universally
true. From the usability perspective, it is desirable to allow the replace
operation to adjust just the constituent next hop configuration and leave
the group attributes as such intact.

But the object that keeps track of whether an attribute was or was not
given is the nh_config object, not the next hop or next-hop group. To allow
(selective) attribute updates during NH group replacement, propagate `cfg'
to replace_nexthop() and further to replace_nexthop_grp().

Signed-off-by: Petr Machata
Reviewed-by: Ido Schimmel
Reviewed-by: David Ahern
---
 net/ipv4/nexthop.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/net/ipv4/nexthop.c b/net/ipv4/nexthop.c
index 743777bce179..f723dc97dcd3 100644
--- a/net/ipv4/nexthop.c
+++ b/net/ipv4/nexthop.c
@@ -1107,7 +1107,7 @@ static void nh_rt_cache_flush(struct net *net, struct nexthop *nh)
 }
 
 static int replace_nexthop_grp(struct net *net, struct nexthop *old,
-			       struct nexthop *new,
+			       struct nexthop *new, const struct nh_config *cfg,
 			       struct netlink_ext_ack *extack)
 {
 	struct nh_group *oldg, *newg;
@@ -1276,7 +1276,8 @@ static void nexthop_replace_notify(struct net *net, struct nexthop *nh,
 }
 
 static int replace_nexthop(struct net *net, struct nexthop *old,
-			   struct nexthop *new, struct netlink_ext_ack *extack)
+			   struct nexthop *new, const struct nh_config *cfg,
+			   struct netlink_ext_ack *extack)
 {
 	bool new_is_reject = false;
 	struct nh_grp_entry *nhge;
@@ -1319,7 +1320,7 @@ static int replace_nexthop(struct net *net, struct nexthop *old,
 	}
 
 	if (old->is_group)
-		err = replace_nexthop_grp(net, old, new, extack);
+		err = replace_nexthop_grp(net, old, new, cfg, extack);
 	else
 		err = replace_nexthop_single(net, old, new, extack);
 
@@ -1361,7 +1362,7 @@ static int insert_nexthop(struct net *net, struct nexthop *new_nh,
 		} else if (new_id > nh->id) {
 			pp = &next->rb_right;
 		} else if (replace) {
-			rc = replace_nexthop(net, nh, new_nh, extack);
+			rc = replace_nexthop(net, nh, new_nh, cfg, extack);
 			if (!rc) {
 				new_nh = nh; /* send notification with old nh */
 				replace_notify = 1;

From patchwork Thu Mar 11 18:03:13 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Petr Machata X-Patchwork-Id: 398183 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-19.0 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_CR_TRAILER, INCLUDES_PATCH, MAILING_LIST_MULTI, SPF_HELO_NONE, SPF_PASS, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 1FFC8C433DB for ; Thu, 11 Mar 2021 18:05:08 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id CB79E64FEB for ; Thu, 11 Mar 2021 18:05:07 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230056AbhCKSEf (ORCPT ); Thu, 11 Mar 2021 13:04:35 -0500 Received: from mail-dm6nam11on2075.outbound.protection.outlook.com ([40.107.223.75]:20417 "EHLO NAM11-DM6-obe.outbound.protection.outlook.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S229490AbhCKSED (ORCPT ); Thu, 11 Mar 2021 13:04:03 -0500 ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; 
b=NVAFnrSqKuK7ir8TKTq4JKDgCJmTs9nNuIMbJnTGRHVRl5MbhN0DEjfx+ZuZeB0K7EstUg7iu5hlkd59kvty3OwdzwJYJPNGQJwUGVxVKV9KgOEDGBCC5hC4BWWi10n/8t0+cKWeG/bd6qzSawTqPkb9IGqfN9Q1VWuK+1667whiZoR7n5gV5uPANrzZxQ93j3Gw5yt29VvnpBT4+83+Ch4QXe7SeT7thgTi3lE5viTq8Wx6C3PvgnjMZvUDneKgXd+nCCv5Vv9dOk+4D5mUqLkeek8pvDdSGcuH2/BOXZ5YYmUDtE6ToIFL3FRnFl94IDaE5s1bKpjcGu6IiBjRZg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=c4xfQ9fGVDQJILVqnGTi6RvuGSY4xfXoO8djyMwRuWY=; b=iOJstMfQ7WbmF7IU4x16gua+/WL1dJx1MRsXTqltevp0sjYRSlhbVbbYDIdc9aAuvD1MY/cYwJw/hnLrfltpCWpmE6zuVOdfmsl1hJ5D/zH/+Gl2JLW17r1txn9T0xoDbc67MPa9Nq5Jpk2jIOvEYZubyzkEpoBCK56dXBmFIGayuvlCnaW3iwJc0kLky1PiSS/H9cWmNFO4BNv47fDWKCE09vqcECb6Vkco7JJ6zsunpVac2ms2HrzJ85LDSM6bsjSWMnfHntaUpxkVXpwTDSET+RiYrqXwfz0K9rHKQFsZjPSZ82lbgBxcWL/1DJVteFY0ZKJR33G+eT6yKkYqHg== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass (sender ip is 216.228.112.34) smtp.rcpttodomain=davemloft.net smtp.mailfrom=nvidia.com; dmarc=pass (p=none sp=none pct=100) action=none header.from=nvidia.com; dkim=none (message not signed); arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=Nvidia.com; s=selector2; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=c4xfQ9fGVDQJILVqnGTi6RvuGSY4xfXoO8djyMwRuWY=; b=o1v8xcuAcmPunGFsqsLnI/ez2TxYnwms543qvPxUsxKPAoF6pTCHoPBgP8JVM4/JIWgp0cvPsN9iXyc0Zqr1O3ZjI/IToVxLFP92Pw5WzbL7KEzwpCNXiTwmK8tkzcY/tDNThfoLhr+gcc6GKkv0sUxE2DKofACXDRtTqoNtF2GjigMv6fzqIiBkRT7ms3fNvZwJYaTPdE4pMMHv1jzr+l+UsTL6exrr+jqso5M9IjxWX5Bpnr0zdPbnm+Je+ylDDq7zunexjarvMs9TqcwcUscSfsf+pWBfeNkvC2NtVcMFTtyakWJln7glK93pIErUtKQfXTeXZNCT5YUmPXtiyA== Received: from BN6PR11CA0042.namprd11.prod.outlook.com (2603:10b6:404:4b::28) by BY5PR12MB4835.namprd12.prod.outlook.com (2603:10b6:a03:1fd::14) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 
15.20.3912.17; Thu, 11 Mar 2021 18:04:01 +0000 Received: from BN8NAM11FT065.eop-nam11.prod.protection.outlook.com (2603:10b6:404:4b:cafe::e3) by BN6PR11CA0042.outlook.office365.com (2603:10b6:404:4b::28) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.3912.17 via Frontend Transport; Thu, 11 Mar 2021 18:04:01 +0000 X-MS-Exchange-Authentication-Results: spf=pass (sender IP is 216.228.112.34) smtp.mailfrom=nvidia.com; davemloft.net; dkim=none (message not signed) header.d=none; davemloft.net; dmarc=pass action=none header.from=nvidia.com; Received-SPF: Pass (protection.outlook.com: domain of nvidia.com designates 216.228.112.34 as permitted sender) receiver=protection.outlook.com; client-ip=216.228.112.34; helo=mail.nvidia.com; Received: from mail.nvidia.com (216.228.112.34) by BN8NAM11FT065.mail.protection.outlook.com (10.13.177.63) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA384) id 15.20.3933.31 via Frontend Transport; Thu, 11 Mar 2021 18:04:00 +0000 Received: from localhost.localdomain (172.20.145.6) by HQMAIL107.nvidia.com (172.20.187.13) with Microsoft SMTP Server (TLS) id 15.0.1497.2; Thu, 11 Mar 2021 18:03:57 +0000 From: Petr Machata To: CC: Ido Schimmel , David Ahern , "David S . 
Miller" , Jakub Kicinski , "Petr Machata" Subject: [PATCH net-next v2 02/14] nexthop: __nh_notifier_single_info_init(): Make nh_info an argument Date: Thu, 11 Mar 2021 19:03:13 +0100 Message-ID: X-Mailer: git-send-email 2.26.2 In-Reply-To: References: MIME-Version: 1.0 X-Originating-IP: [172.20.145.6] X-ClientProxiedBy: HQMAIL107.nvidia.com (172.20.187.13) To HQMAIL107.nvidia.com (172.20.187.13) X-EOPAttributedMessage: 0 X-MS-PublicTrafficType: Email X-MS-Office365-Filtering-Correlation-Id: e3cc282b-84f1-44db-936f-08d8e4b80c9d X-MS-TrafficTypeDiagnostic: BY5PR12MB4835: X-Microsoft-Antispam-PRVS: X-MS-Oob-TLC-OOBClassifiers: OLM:9508; X-MS-Exchange-SenderADCheck: 1 X-Microsoft-Antispam: BCL:0; X-Microsoft-Antispam-Message-Info: Risg9F40Dv2dFPlEQeLaQ6uDaM/P4BAcZYNeQseLEzbHtSJcMlICCZT+hrZ2P+Bi1+Jv2vRJ645sO/zAMYeA0H7G91TjJ76LPNjlLPIuDHNS9a0mGAuIvvKljbtAH17//EaBLViBCFL2fFWO4fq0methK2oeqMfuk1FZidqcHyR1NGC4xVq4xsIqLoZ4QTk1cEAQzb6P5ovnpEz4bNmDDszG92oemAawih49gJVU24SbF1jyp9Rbuc4Jf1c8uyIr3GHjgCvdLgeMbe4lmx5eL6nHlqONLvkI8iZOgwmdXMXomv9+Ff6vFskG9Bzw2wJqjyPvw+eFvTl7DAMNqfh2w1rP1XqznlsgVT6hDBkO8NMBFBP53XikxeQtCDJJKfFBcjE3zibJifJrev5fqEeBkPQbXsGtaQ58FEb9AMNppUuYv9TmyqUw7dt4VYk4HV9b4Wc7sFn4s7Hw4VLbvThv/BA6SETMFeGUVUSOD+AiaTJI6MnTPq9Q1C3jn1ipZIS5uPCcwbsP4myGkO2EGO1pJ79rDwcODe4w3XyBOZ4IVUhllT5BGtIeAHna50z6gSkmP6DzUwC6u5gpJ8p0hKYef52+rzDT3b53WXmm7FCBJlDM48JL2zhC535hsWOQEkqtyCkZsSbnaU3zHjHqj6XHxcdbHz/bzDSBST90Nc465IzrnbB6vs1ypr9jz0l7yU3A X-Forefront-Antispam-Report: CIP:216.228.112.34; CTRY:US; LANG:en; SCL:1; SRV:; IPV:NLI; SFV:NSPM; H:mail.nvidia.com; PTR:schybrid03.nvidia.com; CAT:NONE; SFS:(4636009)(346002)(376002)(136003)(396003)(39860400002)(46966006)(36840700001)(36906005)(70586007)(7636003)(186003)(356005)(54906003)(47076005)(5660300002)(36756003)(36860700001)(86362001)(6916009)(16526019)(316002)(82740400003)(4326008)(2906002)(6666004)(82310400003)(8936002)(336012)(70206006)(83380400001)(426003)(26005)(8676002)(34020700004)(107886003)(478600001)(2616005); DIR:OUT; SFP:1101; 
X-OriginatorOrg: Nvidia.com X-MS-Exchange-CrossTenant-OriginalArrivalTime: 11 Mar 2021 18:04:00.8144 (UTC) X-MS-Exchange-CrossTenant-Network-Message-Id: e3cc282b-84f1-44db-936f-08d8e4b80c9d X-MS-Exchange-CrossTenant-Id: 43083d15-7273-40c1-b7db-39efd9ccc17a X-MS-Exchange-CrossTenant-OriginalAttributedTenantConnectingIp: TenantId=43083d15-7273-40c1-b7db-39efd9ccc17a; Ip=[216.228.112.34]; Helo=[mail.nvidia.com] X-MS-Exchange-CrossTenant-AuthSource: BN8NAM11FT065.eop-nam11.prod.protection.outlook.com X-MS-Exchange-CrossTenant-AuthAs: Anonymous X-MS-Exchange-CrossTenant-FromEntityHeader: HybridOnPrem X-MS-Exchange-Transport-CrossTenantHeadersStamped: BY5PR12MB4835 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org

The cited function currently uses rtnl_dereference() to get nh_info from a
handed-in nexthop. However, under the resilient hashing scheme, this
function will not always be called under RTNL; sometimes the mutual
exclusion will be achieved differently. Therefore move the nh_info
extraction from the function to its callers to make it possible to use a
different synchronization guarantee.
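
As a rough illustration of the refactoring described above, here is a minimal
userspace sketch. It is not kernel code: every demo_* name is hypothetical, the
structures are simplified stand-ins, and a plain pointer load takes the place
of rtnl_dereference(). It only shows that once the helper accepts the
already-resolved info, each caller is free to obtain that pointer under
whatever guarantee it has.

```c
/* Hypothetical, simplified stand-ins for the kernel structures. */
struct demo_nh_info {
	int ifindex;	/* stands in for fib_nhc.nhc_dev */
	int gw_family;	/* stands in for fib_nhc.nhc_gw_family */
};

struct demo_nexthop {
	struct demo_nh_info *nh_info;	/* a protected pointer in the kernel */
};

struct demo_single_info {
	int ifindex;
	int gw_family;
};

/* The helper takes the already-resolved info, so it does not need to
 * know which lock or guarantee made the pointer safe to read.
 */
static void demo_single_info_init(struct demo_single_info *out,
				  const struct demo_nh_info *nhi)
{
	out->ifindex = nhi->ifindex;
	out->gw_family = nhi->gw_family;
}

/* Caller that would run under RTNL; the plain load below stands in for
 * rtnl_dereference(nh->nh_info).
 */
static void demo_init_rtnl_path(struct demo_single_info *out,
				const struct demo_nexthop *nh)
{
	const struct demo_nh_info *nhi = nh->nh_info;

	demo_single_info_init(out, nhi);
}

/* Caller that would rely on some other guarantee (assumption: it was
 * handed a pointer that is already known to be stable).
 */
static void demo_init_other_path(struct demo_single_info *out,
				 const struct demo_nh_info *stable_nhi)
{
	demo_single_info_init(out, stable_nhi);
}
```

Both callers end up filling the output identically; the only difference is
where the responsibility for safe access to the pointer lives.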
Signed-off-by: Petr Machata
Reviewed-by: Ido Schimmel
Reviewed-by: David Ahern
---
 net/ipv4/nexthop.c | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/net/ipv4/nexthop.c b/net/ipv4/nexthop.c
index f723dc97dcd3..69c8b50a936e 100644
--- a/net/ipv4/nexthop.c
+++ b/net/ipv4/nexthop.c
@@ -52,10 +52,8 @@ static bool nexthop_notifiers_is_empty(struct net *net)
 
 static void
 __nh_notifier_single_info_init(struct nh_notifier_single_info *nh_info,
-			       const struct nexthop *nh)
+			       const struct nh_info *nhi)
 {
-	struct nh_info *nhi = rtnl_dereference(nh->nh_info);
-
 	nh_info->dev = nhi->fib_nhc.nhc_dev;
 	nh_info->gw_family = nhi->fib_nhc.nhc_gw_family;
 	if (nh_info->gw_family == AF_INET)
@@ -71,12 +69,14 @@ __nh_notifier_single_info_init(struct nh_notifier_single_info *nh_info,
 static int nh_notifier_single_info_init(struct nh_notifier_info *info,
 					const struct nexthop *nh)
 {
+	struct nh_info *nhi = rtnl_dereference(nh->nh_info);
+
 	info->type = NH_NOTIFIER_INFO_TYPE_SINGLE;
 	info->nh = kzalloc(sizeof(*info->nh), GFP_KERNEL);
 	if (!info->nh)
 		return -ENOMEM;
 
-	__nh_notifier_single_info_init(info->nh, nh);
+	__nh_notifier_single_info_init(info->nh, nhi);
 
 	return 0;
 }
@@ -103,11 +103,13 @@ static int nh_notifier_mp_info_init(struct nh_notifier_info *info,
 
 	for (i = 0; i < num_nh; i++) {
 		struct nh_grp_entry *nhge = &nhg->nh_entries[i];
+		struct nh_info *nhi;
 
+		nhi = rtnl_dereference(nhge->nh->nh_info);
 		info->nh_grp->nh_entries[i].id = nhge->nh->id;
 		info->nh_grp->nh_entries[i].weight = nhge->weight;
 		__nh_notifier_single_info_init(&info->nh_grp->nh_entries[i].nh,
-					       nhge->nh);
+					       nhi);
 	}
 
 	return 0;

From patchwork Thu Mar 11 18:03:14 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Petr Machata X-Patchwork-Id: 399480 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-19.0 required=3.0 
tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_CR_TRAILER, INCLUDES_PATCH, MAILING_LIST_MULTI, SPF_HELO_NONE, SPF_PASS, URIBL_BLOCKED, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 5E134C433E6 for ; Thu, 11 Mar 2021 18:05:08 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 26ADC64FE9 for ; Thu, 11 Mar 2021 18:05:08 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230162AbhCKSEj (ORCPT ); Thu, 11 Mar 2021 13:04:39 -0500 Received: from mail-dm6nam11on2084.outbound.protection.outlook.com ([40.107.223.84]:18400 "EHLO NAM11-DM6-obe.outbound.protection.outlook.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S229674AbhCKSEF (ORCPT ); Thu, 11 Mar 2021 13:04:05 -0500 ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=KlMR1unKlrbqD3/R45NgRN8vqfqpKE//y0OIBt9zdyGyDqsX7QkXiEvKR5wXJeHlboj4gPM1svKJLLBSqZYfpdDkIuBtlS6cd9WA1xNoXGapyzFuMaQ1iKOap9vcRWDXE4kI5bzgB6CgwuxacoWIKFqxjBY1dXod5fzZATSkcWQJj1du760aGQYsn1d1KyuY17ZJwmPL0zFDpwsr3FRRiI9WDhfubgiF/7caRiVcGEsyxMk6WA4Ilf+eIzqtppsVXSopz0gKu0T6fWMOWmRhWBMwj7blhdabCdeGlza+5yweFYq0UqrGUw85MeHQmrnAkEq/QHuUbDWPPFBLYGEQMA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=oBjImWLXDDIRY8y3EUszXdNwrl8TSJkH7Tu+9IKud+E=; b=fJNzjXK1fHXyZrwkQCRajMZK4qSa4xvVexRLoiy/WSe+wf/0igEYR3mDdBshNY+5j/fV7x7yF+YCoswZKuB/7bRwtZhH8UgheegqhBvBcRpHugPQaH2vE7eeRm3PYdGqpgAyFkTlOTLVrbUt/gZEI/Skc2rlVKeMp624P0WaDFen/OwGwLmK4RZhuwF/NY4HWtnlJaC/irMkY/+QayMNhlCKPU0SYcC3cKv7o0NKQfBTwpj4PfmhFTwIZtLjWn380i2x+/hk5j5bYfTh8kK0mBL09/hQ0BQ5EUukMTljSwZrwrjr1RtFQ1EmNf6gEiaySCIc1Lob0kRKrmgOsTu/MQ== 
ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass (sender ip is 216.228.112.34) smtp.rcpttodomain=davemloft.net smtp.mailfrom=nvidia.com; dmarc=pass (p=none sp=none pct=100) action=none header.from=nvidia.com; dkim=none (message not signed); arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=Nvidia.com; s=selector2; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=oBjImWLXDDIRY8y3EUszXdNwrl8TSJkH7Tu+9IKud+E=; b=EBj3DoPcgrkqUVbh60FskKv1bNaMQMy+QVemF48jdxO5c4iJpqmTH+iqbfSrvl0KXvTHumCDa6UBPMZs0n7mBc0gglvghmgLLsg8IH/ntnqsBZNFRM+ZJUkCy+z4bcQYl9HbkfBvJiIc1+NeDZvRz2UpKIoytx5dAOtTN82g8un6OgUwhcVTRsFtQwGcYD7AU0hJY4cMkY6GhXA6CTStJKJ8rWSohl44QkbQq6XnC+QijkSpt7SA1QxTwcVf8Iy8yoPeBej9dT9yISlq4RmVFRbbrrxYdWnUiEe6iMo+sFsa/gtwSqpwV5o+p8/iSTQvMw0CLYOj5oR5pphsFguFlQ== Received: from BN8PR03CA0019.namprd03.prod.outlook.com (2603:10b6:408:94::32) by BN6PR1201MB0131.namprd12.prod.outlook.com (2603:10b6:405:5b::10) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.3912.28; Thu, 11 Mar 2021 18:04:04 +0000 Received: from BN8NAM11FT034.eop-nam11.prod.protection.outlook.com (2603:10b6:408:94:cafe::7b) by BN8PR03CA0019.outlook.office365.com (2603:10b6:408:94::32) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.3912.17 via Frontend Transport; Thu, 11 Mar 2021 18:04:04 +0000 X-MS-Exchange-Authentication-Results: spf=pass (sender IP is 216.228.112.34) smtp.mailfrom=nvidia.com; davemloft.net; dkim=none (message not signed) header.d=none; davemloft.net; dmarc=pass action=none header.from=nvidia.com; Received-SPF: Pass (protection.outlook.com: domain of nvidia.com designates 216.228.112.34 as permitted sender) receiver=protection.outlook.com; client-ip=216.228.112.34; helo=mail.nvidia.com; Received: from mail.nvidia.com (216.228.112.34) by BN8NAM11FT034.mail.protection.outlook.com (10.13.176.139) with Microsoft SMTP Server 
(version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA384) id 15.20.3933.31 via Frontend Transport; Thu, 11 Mar 2021 18:04:03 +0000 Received: from localhost.localdomain (172.20.145.6) by HQMAIL107.nvidia.com (172.20.187.13) with Microsoft SMTP Server (TLS) id 15.0.1497.2; Thu, 11 Mar 2021 18:04:00 +0000 From: Petr Machata To: CC: Ido Schimmel , David Ahern , "David S . Miller" , Jakub Kicinski , "Petr Machata" Subject: [PATCH net-next v2 03/14] nexthop: Add a dedicated flag for multipath next-hop groups Date: Thu, 11 Mar 2021 19:03:14 +0100 Message-ID: X-Mailer: git-send-email 2.26.2 In-Reply-To: References: MIME-Version: 1.0 X-Originating-IP: [172.20.145.6] X-ClientProxiedBy: HQMAIL107.nvidia.com (172.20.187.13) To HQMAIL107.nvidia.com (172.20.187.13) X-EOPAttributedMessage: 0 X-MS-PublicTrafficType: Email X-MS-Office365-Filtering-Correlation-Id: 09495602-ba87-4c32-ac9a-08d8e4b80e65 X-MS-TrafficTypeDiagnostic: BN6PR1201MB0131: X-Microsoft-Antispam-PRVS: X-MS-Oob-TLC-OOBClassifiers: OLM:7219; X-MS-Exchange-SenderADCheck: 1 X-Microsoft-Antispam: BCL:0; X-Microsoft-Antispam-Message-Info: JQl6teavCiGSVVi8eRIXUFiJcNkvESHQX0P9vwTewPm5Vwo9xwSaWGtBhtV787554UdXxU6nkPtPiRGZlQVWGVLQKyNdf3d3tEHeTMc9pAGkc/vUZ5OCB65pJxXm/LrJT7XLOVtClbeuC8hUqKEyt/d1kLoPwTo0fmu1yx3xyjj5tbalp87sm4xilapPAiF0y7IzlCY6wLMieXWC5GTmUKVBK7cty87qu5Sj+bMDa2YahGoUchqRfXroC4TENdfaE/fM1n6atqYwanelrahCulk/WUgLKw3/HGpBH5tGH7cr9kTPXwWhSC7dc7C2vU5GR2VkZwbb0nzBsmLarVGVBXPMueiXg0AxmJ+oKKoIBSKwxDgPPAr8ui5if2Dhtyji5SwrHqI/2Rnf0ND2OSGkTrC5s6lVWmg7lvNHAaUw/6vJhf8PFRewfrsBLIeKV7AZVRWHbh8qKr7iG4piBYSibgfeyXQhTlKeu+z644vXmNrUkVt6A4KG/id/C1MZrC7YjipDMNCKpDdPNMWZCcZwAXEJtPg040VO62EZsPqE4j8vwiiXFQ/yWeX1Atg89HJyy6j/5tL42oYBiLqtyhviYMNLxmI5kKo5cImctXsMNc8mExaxbZGV6h9+yn/nQi6xoWEOqh+aG0ItpwBAGwsWDxWwLOk6A6zV9tZw5aN9XamskDIAXICuarQdiSP0ixvE X-Forefront-Antispam-Report: CIP:216.228.112.34; CTRY:US; LANG:en; SCL:1; SRV:; IPV:NLI; SFV:NSPM; H:mail.nvidia.com; PTR:schybrid03.nvidia.com; CAT:NONE; 
SFS:(4636009)(136003)(39860400002)(346002)(376002)(396003)(46966006)(36840700001)(47076005)(478600001)(36756003)(36860700001)(26005)(107886003)(8936002)(316002)(8676002)(6666004)(5660300002)(54906003)(36906005)(4326008)(16526019)(186003)(82310400003)(6916009)(7636003)(2616005)(86362001)(356005)(34020700004)(2906002)(426003)(70586007)(70206006)(336012)(83380400001)(82740400003); DIR:OUT; SFP:1101; X-OriginatorOrg: Nvidia.com X-MS-Exchange-CrossTenant-OriginalArrivalTime: 11 Mar 2021 18:04:03.8075 (UTC) X-MS-Exchange-CrossTenant-Network-Message-Id: 09495602-ba87-4c32-ac9a-08d8e4b80e65 X-MS-Exchange-CrossTenant-Id: 43083d15-7273-40c1-b7db-39efd9ccc17a X-MS-Exchange-CrossTenant-OriginalAttributedTenantConnectingIp: TenantId=43083d15-7273-40c1-b7db-39efd9ccc17a; Ip=[216.228.112.34]; Helo=[mail.nvidia.com] X-MS-Exchange-CrossTenant-AuthSource: BN8NAM11FT034.eop-nam11.prod.protection.outlook.com X-MS-Exchange-CrossTenant-AuthAs: Anonymous X-MS-Exchange-CrossTenant-FromEntityHeader: HybridOnPrem X-MS-Exchange-Transport-CrossTenantHeadersStamped: BN6PR1201MB0131 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org

With the introduction of resilient nexthop groups, there will be two types
of multipath groups: the current hash-threshold "mpath" ones, and resilient
groups. Both are multipath, but to determine that, the system would need to
consider two flags. This might prove costly in the datapath. Therefore,
introduce a new flag that should be set for next-hop groups that have more
than one nexthop and should be considered multipath.
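
To make the reasoning above concrete, here is a minimal userspace sketch of the
"one common flag" idea. It is not kernel code: the demo_* names, the
DEMO_GRP_TYPE_RES type and the `resilient' field are assumptions made up for
illustration (this patch itself only adds is_multipath next to mpath). The
control path sets the common flag together with the type-specific one, so that
readers test a single field.

```c
#include <stdbool.h>

/* Hypothetical group types; the resilient one is assumed here. */
enum demo_grp_type {
	DEMO_GRP_TYPE_MPATH,
	DEMO_GRP_TYPE_RES,
};

struct demo_nh_group {
	unsigned short num_nh;
	bool is_multipath;	/* the single flag that readers check */
	bool mpath;		/* hash-threshold group */
	bool resilient;		/* assumed flag for resilient groups */
};

/* Control path: set the type-specific flag and the common flag together. */
static void demo_group_set_type(struct demo_nh_group *g,
				enum demo_grp_type type)
{
	if (type == DEMO_GRP_TYPE_MPATH) {
		g->mpath = true;
		g->is_multipath = true;
	} else if (type == DEMO_GRP_TYPE_RES) {
		g->resilient = true;
		g->is_multipath = true;
	}
}

/* Hot path: one flag test instead of (g->mpath || g->resilient). */
static unsigned int demo_num_path(const struct demo_nh_group *g)
{
	return g->is_multipath ? g->num_nh : 1;
}
```

The cost of keeping the flags consistent is paid once, when the group is
created or copied, rather than on every lookup.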
Signed-off-by: Petr Machata
Reviewed-by: Ido Schimmel
Reviewed-by: David Ahern
---

Notes:
    v1 (changes since RFC):
    - This patch is new

 include/net/nexthop.h | 7 ++++---
 net/ipv4/nexthop.c    | 5 ++++-
 2 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/include/net/nexthop.h b/include/net/nexthop.h
index 7bc057aee40b..5062c2c08e2b 100644
--- a/include/net/nexthop.h
+++ b/include/net/nexthop.h
@@ -80,6 +80,7 @@ struct nh_grp_entry {
 struct nh_group {
 	struct nh_group		*spare; /* spare group for removals */
 	u16			num_nh;
+	bool			is_multipath;
 	bool			mpath;
 	bool			fdb_nh;
 	bool			has_v4;
@@ -212,7 +213,7 @@ static inline bool nexthop_is_multipath(const struct nexthop *nh)
 		struct nh_group *nh_grp;
 
 		nh_grp = rcu_dereference_rtnl(nh->nh_grp);
-		return nh_grp->mpath;
+		return nh_grp->is_multipath;
 	}
 	return false;
 }
@@ -227,7 +228,7 @@ static inline unsigned int nexthop_num_path(const struct nexthop *nh)
 		struct nh_group *nh_grp;
 
 		nh_grp = rcu_dereference_rtnl(nh->nh_grp);
-		if (nh_grp->mpath)
+		if (nh_grp->is_multipath)
 			rc = nh_grp->num_nh;
 	}
 
@@ -308,7 +309,7 @@ struct fib_nh_common *nexthop_fib_nhc(struct nexthop *nh, int nhsel)
 		struct nh_group *nh_grp;
 
 		nh_grp = rcu_dereference_rtnl(nh->nh_grp);
-		if (nh_grp->mpath) {
+		if (nh_grp->is_multipath) {
 			nh = nexthop_mpath_select(nh_grp, nhsel);
 			if (!nh)
 				return NULL;
diff --git a/net/ipv4/nexthop.c b/net/ipv4/nexthop.c
index 69c8b50a936e..56c54d0fbacc 100644
--- a/net/ipv4/nexthop.c
+++ b/net/ipv4/nexthop.c
@@ -967,6 +967,7 @@ static void remove_nh_grp_entry(struct net *net, struct nh_grp_entry *nhge,
 	}
 
 	newg->has_v4 = false;
+	newg->is_multipath = nhg->is_multipath;
 	newg->mpath = nhg->mpath;
 	newg->fdb_nh = nhg->fdb_nh;
 	newg->num_nh = nhg->num_nh;
@@ -1488,8 +1489,10 @@ static struct nexthop *nexthop_create_group(struct net *net,
 		nhg->nh_entries[i].nh_parent = nh;
 	}
 
-	if (cfg->nh_grp_type == NEXTHOP_GRP_TYPE_MPATH)
+	if (cfg->nh_grp_type == NEXTHOP_GRP_TYPE_MPATH) {
 		nhg->mpath = 1;
+		nhg->is_multipath = true;
+	}
 
 	WARN_ON_ONCE(nhg->mpath 
!= 1);

From patchwork Thu Mar 11 18:03:15 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Petr Machata
X-Patchwork-Id: 398182
From: Petr Machata
CC: Ido Schimmel, David Ahern, "David S . Miller", Jakub Kicinski, "Petr Machata"
Subject: [PATCH net-next v2 04/14] nexthop: Add netlink defines and enumerators for resilient NH groups
Date: Thu, 11 Mar 2021 19:03:15 +0100
X-Mailer: git-send-email 2.26.2
Precedence: bulk
X-Mailing-List: netdev@vger.kernel.org

From: Ido Schimmel

- RTM_NEWNEXTHOP et al. that handle resilient groups will have a new nested
  attribute, NHA_RES_GROUP, whose elements are attributes NHA_RES_GROUP_*.

- RTM_NEWNEXTHOPBUCKET et al. are a suite of new messages that will
  currently serve only for dumping of individual buckets of resilient next
  hop groups. For nexthop group buckets, these messages will carry a nested
  attribute NHA_RES_BUCKET, whose elements are attributes NHA_RES_BUCKET_*.

  There are several reasons why a new suite of messages is created for
  nexthop buckets instead of overloading the information on the existing
  RTM_{NEW,DEL,GET}NEXTHOP messages.

  First, a nexthop group can contain a large number of nexthop buckets (4k
  is not unheard of). This imposes limits on the amount of information that
  can be encoded for each nexthop bucket, given that a netlink message is
  limited to 64k bytes.

  Second, while RTM_NEWNEXTHOPBUCKET is only used for notifications at this
  point, in the future it can be extended to provide user space with
  control over nexthop bucket configuration.

- The new group type is NEXTHOP_GRP_TYPE_RES. Note that nexthop code is
  adjusted to bounce groups with that type for now.

Signed-off-by: Ido Schimmel
Reviewed-by: Petr Machata
Reviewed-by: David Ahern
Signed-off-by: Petr Machata
---

Notes:
    v2:
    - Comment at NEXTHOP_GRP_TYPE_MPATH that it's for the hash-threshold
      groups.
v1 (changes since RFC): - u32 -> u16 for bucket counts / indices include/uapi/linux/nexthop.h | 47 +++++++++++++++++++++++++++++++++- include/uapi/linux/rtnetlink.h | 7 +++++ net/ipv4/nexthop.c | 2 ++ security/selinux/nlmsgtab.c | 5 +++- 4 files changed, 59 insertions(+), 2 deletions(-) diff --git a/include/uapi/linux/nexthop.h b/include/uapi/linux/nexthop.h index 2d4a1e784cf0..d8ffa8c9ca78 100644 --- a/include/uapi/linux/nexthop.h +++ b/include/uapi/linux/nexthop.h @@ -21,7 +21,10 @@ struct nexthop_grp { }; enum { - NEXTHOP_GRP_TYPE_MPATH, /* default type if not specified */ + NEXTHOP_GRP_TYPE_MPATH, /* hash-threshold nexthop group + * default type if not specified + */ + NEXTHOP_GRP_TYPE_RES, /* resilient nexthop group */ __NEXTHOP_GRP_TYPE_MAX, }; @@ -52,8 +55,50 @@ enum { NHA_FDB, /* flag; nexthop belongs to a bridge fdb */ /* if NHA_FDB is added, OIF, BLACKHOLE, ENCAP cannot be set */ + /* nested; resilient nexthop group attributes */ + NHA_RES_GROUP, + /* nested; nexthop bucket attributes */ + NHA_RES_BUCKET, + __NHA_MAX, }; #define NHA_MAX (__NHA_MAX - 1) + +enum { + NHA_RES_GROUP_UNSPEC, + /* Pad attribute for 64-bit alignment. */ + NHA_RES_GROUP_PAD = NHA_RES_GROUP_UNSPEC, + + /* u16; number of nexthop buckets in a resilient nexthop group */ + NHA_RES_GROUP_BUCKETS, + /* clock_t as u32; nexthop bucket idle timer (per-group) */ + NHA_RES_GROUP_IDLE_TIMER, + /* clock_t as u32; nexthop unbalanced timer */ + NHA_RES_GROUP_UNBALANCED_TIMER, + /* clock_t as u64; nexthop unbalanced time */ + NHA_RES_GROUP_UNBALANCED_TIME, + + __NHA_RES_GROUP_MAX, +}; + +#define NHA_RES_GROUP_MAX (__NHA_RES_GROUP_MAX - 1) + +enum { + NHA_RES_BUCKET_UNSPEC, + /* Pad attribute for 64-bit alignment. 
*/ + NHA_RES_BUCKET_PAD = NHA_RES_BUCKET_UNSPEC, + + /* u16; nexthop bucket index */ + NHA_RES_BUCKET_INDEX, + /* clock_t as u64; nexthop bucket idle time */ + NHA_RES_BUCKET_IDLE_TIME, + /* u32; nexthop id assigned to the nexthop bucket */ + NHA_RES_BUCKET_NH_ID, + + __NHA_RES_BUCKET_MAX, +}; + +#define NHA_RES_BUCKET_MAX (__NHA_RES_BUCKET_MAX - 1) + #endif diff --git a/include/uapi/linux/rtnetlink.h b/include/uapi/linux/rtnetlink.h index 91e4ca064d61..d35953bc7d53 100644 --- a/include/uapi/linux/rtnetlink.h +++ b/include/uapi/linux/rtnetlink.h @@ -178,6 +178,13 @@ enum { RTM_GETVLAN, #define RTM_GETVLAN RTM_GETVLAN + RTM_NEWNEXTHOPBUCKET = 116, +#define RTM_NEWNEXTHOPBUCKET RTM_NEWNEXTHOPBUCKET + RTM_DELNEXTHOPBUCKET, +#define RTM_DELNEXTHOPBUCKET RTM_DELNEXTHOPBUCKET + RTM_GETNEXTHOPBUCKET, +#define RTM_GETNEXTHOPBUCKET RTM_GETNEXTHOPBUCKET + __RTM_MAX, #define RTM_MAX (((__RTM_MAX + 3) & ~3) - 1) }; diff --git a/net/ipv4/nexthop.c b/net/ipv4/nexthop.c index 56c54d0fbacc..7a94591da856 100644 --- a/net/ipv4/nexthop.c +++ b/net/ipv4/nexthop.c @@ -1492,6 +1492,8 @@ static struct nexthop *nexthop_create_group(struct net *net, if (cfg->nh_grp_type == NEXTHOP_GRP_TYPE_MPATH) { nhg->mpath = 1; nhg->is_multipath = true; + } else if (cfg->nh_grp_type == NEXTHOP_GRP_TYPE_RES) { + goto out_no_nh; } WARN_ON_ONCE(nhg->mpath != 1); diff --git a/security/selinux/nlmsgtab.c b/security/selinux/nlmsgtab.c index b69231918686..d59276f48d4f 100644 --- a/security/selinux/nlmsgtab.c +++ b/security/selinux/nlmsgtab.c @@ -88,6 +88,9 @@ static const struct nlmsg_perm nlmsg_route_perms[] = { RTM_NEWVLAN, NETLINK_ROUTE_SOCKET__NLMSG_WRITE }, { RTM_DELVLAN, NETLINK_ROUTE_SOCKET__NLMSG_WRITE }, { RTM_GETVLAN, NETLINK_ROUTE_SOCKET__NLMSG_READ }, + { RTM_NEWNEXTHOPBUCKET, NETLINK_ROUTE_SOCKET__NLMSG_WRITE }, + { RTM_DELNEXTHOPBUCKET, NETLINK_ROUTE_SOCKET__NLMSG_WRITE }, + { RTM_GETNEXTHOPBUCKET, NETLINK_ROUTE_SOCKET__NLMSG_READ }, }; static const struct nlmsg_perm nlmsg_tcpdiag_perms[] = @@ 
-171,7 +174,7 @@ int selinux_nlmsg_lookup(u16 sclass, u16 nlmsg_type, u32 *perm)
 		 * structures at the top of this file with the new mappings
 		 * before updating the BUILD_BUG_ON() macro!
 		 */
-		BUILD_BUG_ON(RTM_MAX != (RTM_NEWVLAN + 3));
+		BUILD_BUG_ON(RTM_MAX != (RTM_NEWNEXTHOPBUCKET + 3));
 		err = nlmsg_perm(nlmsg_type, perm, nlmsg_route_perms,
 				 sizeof(nlmsg_route_perms));
 		break;

From patchwork Thu Mar 11 18:03:16 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Petr Machata
X-Patchwork-Id: 398181
From: Petr Machata
CC: Ido Schimmel, David Ahern, "David S . Miller", Jakub Kicinski, "Petr Machata"
Subject: [PATCH net-next v2 05/14] nexthop: Add implementation of resilient next-hop groups
Date: Thu, 11 Mar 2021 19:03:16 +0100
Message-ID: <5c825c1efed976ad3cd036256d8ca3232de9a294.1615485052.git.petrm@nvidia.com>
X-Mailer: git-send-email 2.26.2
Precedence: bulk
X-Mailing-List: netdev@vger.kernel.org

At this moment, there is only one type of next-hop group: an mpath group,
which implements the hash-threshold algorithm. To select a next hop, the
hash-threshold algorithm first assigns a range of hashes to each next hop
in the group, and then selects the next hop by comparing the SKB hash with
the individual ranges.

When a next hop is removed from the group, the ranges are recomputed, which
leads to reassignment of parts of hash space from one next hop to another.
While there will usually be some overlap between the previous and the new
distribution, some traffic flows change the next hop that they resolve to.
That causes problems: for example, established TCP connections are reset,
because the traffic is forwarded to a server that is not familiar with the
connection.

Resilient hashing is a technique to address the above problem.
A resilient next-hop group has another layer of indirection between the
group itself and its constituent next hops: a hash table. The selection
algorithm uses a straightforward modulo operation to choose a hash bucket,
and then reads the next hop that this bucket contains, and forwards traffic
there.

This indirection brings an important feature. In the hash-threshold
algorithm, the range of hashes associated with a next hop must be
contiguous. With a hash table, mapping between the hash table buckets and
the individual next hops is arbitrary. Therefore, when a next hop is
deleted, the buckets that held it are simply reassigned to other next hops.

When weights of next hops in a group are altered, it may be possible to
choose a subset of buckets that are currently not used for forwarding
traffic, and use those to satisfy the new next-hop distribution demands,
keeping the "busy" buckets intact. This way, established flows ideally keep
being forwarded to the same endpoints through the same paths as before the
next-hop group change.

In a nutshell, the algorithm works as follows. Each next hop has a number
of buckets that it wants to have, according to its weight and the number of
buckets in the hash table. In case of an event that might cause bucket
allocation change, the numbers for individual next hops are updated,
similarly to how ranges are updated for mpath group next hops. Following
that, a new "upkeep" algorithm runs, and for idle buckets that belong to a
next hop that is currently occupying more buckets than it wants (it is
"overweight"), it migrates the buckets to one of the next hops that has
fewer buckets than it wants (it is "underweight"). If, after this, there
are still underweight next hops, another upkeep run is scheduled for a
future time.

Chances are there are not enough "idle" buckets to satisfy the new demands.
The algorithm has knobs to select both what it means for a bucket to be
idle, and whether and when to forcefully migrate buckets if the number of
idle buckets stays insufficient.

There are three users of the resilient data structures.

- The forwarding code accesses them under RCU, and does not modify them
  except for updating the time a selected bucket was last used.

- Netlink code, running under RTNL, which may modify the data.

- The delayed upkeep code, which may modify the data. This runs unlocked,
  and mutual exclusion between the RTNL code and the delayed upkeep is
  maintained by canceling the delayed work synchronously before the RTNL
  code touches anything. Later it restarts the delayed work if necessary.

The RTNL code has to implement next-hop group replacement, next hop
removal, etc. For removal, the mpath code uses a neat trick of having a
backup next hop group structure, doing the necessary changes offline, and
then RCU-swapping them in. However, the hash tables for resilient hashing
are about an order of magnitude larger than the groups themselves (the size
might be e.g. 4K entries), and it was felt that keeping two of them would
be overkill. Both the primary next-hop group and the spare therefore use
the same resilient table, and writers are careful to keep all references
valid for the forwarding code. The hash table references next-hop group
entries from the next-hop group that is currently in the primary role (i.e.
not spare). During the transition from primary to spare, the table
references a mix of both the primary group and the spare. When a next hop
is deleted, the corresponding buckets are not set to NULL, but instead
marked as empty, so that the pointer is valid and can be used by the
forwarding code. The buckets are then migrated to a new next-hop group
entry during upkeep. The only times that the hash table is invalid are the
very beginning and very end of its lifetime. Between those points, it is
always kept valid.

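For illustration only, the scheme described above can be modelled in user
space roughly as follows. This is a minimal sketch, not the code in this
patch: all demo_* names are hypothetical, there is no RCU, locking or
delayed work, and the unbalanced timer (forced migration) is left out. It
shows the three steps: modulo-based bucket selection, computing how many
buckets each next hop wants from its weight, and one upkeep pass that moves
idle buckets from overweight to underweight next hops.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical user-space model; these are not the kernel structures. */
struct demo_nh {
	uint8_t weight;
	uint16_t count_buckets;	/* buckets currently held */
	uint16_t wants_buckets;	/* buckets it should hold */
};

struct demo_bucket {
	size_t nh;			/* index of the owning next hop */
	unsigned long used_time;	/* last time traffic hit the bucket */
};

struct demo_table {
	struct demo_nh *nhs;
	size_t num_nh;
	struct demo_bucket *buckets;
	uint16_t num_buckets;
	unsigned long idle_timer;
};

/* Forwarding path: a plain modulo picks the bucket; mark it as used. */
static size_t demo_select(struct demo_table *t, uint32_t hash,
			  unsigned long now)
{
	struct demo_bucket *b = &t->buckets[hash % t->num_buckets];

	b->used_time = now;
	return b->nh;
}

/* Split the buckets among next hops in proportion to their weights.
 * Cumulative bounds are rounded, so the shares sum to num_buckets.
 */
static void demo_rebalance(struct demo_table *t)
{
	unsigned int total = 0, w = 0, prev = 0;
	size_t i;

	for (i = 0; i < t->num_nh; i++)
		total += t->nhs[i].weight;
	assert(total > 0);

	for (i = 0; i < t->num_nh; i++) {
		unsigned int upper;

		w += t->nhs[i].weight;
		upper = (t->num_buckets * w + total / 2) / total;
		t->nhs[i].wants_buckets = (uint16_t)(upper - prev);
		prev = upper;
	}
}

/* Find the first underweight next hop at index >= from. */
static size_t demo_first_uw(const struct demo_table *t, size_t from)
{
	while (from < t->num_nh &&
	       t->nhs[from].count_buckets >= t->nhs[from].wants_buckets)
		from++;
	return from;
}

/* One upkeep pass: move idle buckets from overweight next hops to
 * underweight ones. Returns true if the table is balanced afterwards.
 */
static bool demo_upkeep(struct demo_table *t, unsigned long now)
{
	size_t uw = 0;
	uint16_t i;

	for (i = 0; i < t->num_buckets; i++) {
		struct demo_bucket *b = &t->buckets[i];
		struct demo_nh *old = &t->nhs[b->nh];

		uw = demo_first_uw(t, uw);
		if (uw == t->num_nh)
			break;		/* nobody is underweight */
		if (old->count_buckets <= old->wants_buckets)
			continue;	/* owner is not overweight */
		if (now - b->used_time < t->idle_timer)
			continue;	/* bucket is busy, keep the flow */

		old->count_buckets--;
		t->nhs[uw].count_buckets++;
		b->nh = uw;
		b->used_time = now;
	}
	return demo_first_uw(t, uw) == t->num_nh;
}
```

In this model a caller would run demo_rebalance() after any weight or
membership change and then call demo_upkeep() repeatedly until it returns
true. The patch itself schedules that repetition as delayed work and
additionally honours the unbalanced timer.
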
This patch introduces the core support code itself. It does not handle
notifications towards drivers, which are kept as if the group were an mpath
one. It does not handle netlink either. The only bit currently exposed to
user space is the new next-hop group type, and that is currently bounced.
There is therefore no way to actually access this code.

Signed-off-by: Petr Machata
Reviewed-by: Ido Schimmel
Reviewed-by: David Ahern
---

Notes:
    v1 (changes since RFC):
    - u32 -> u16 for bucket counts / indices
    - set the new flag is_multipath for resilient groups

 include/net/nexthop.h |  42 ++++
 net/ipv4/nexthop.c    | 517 ++++++++++++++++++++++++++++++++++++++++--
 2 files changed, 546 insertions(+), 13 deletions(-)

diff --git a/include/net/nexthop.h b/include/net/nexthop.h index 5062c2c08e2b..b78505c9031e 100644 --- a/include/net/nexthop.h +++ b/include/net/nexthop.h @@ -40,6 +40,12 @@ struct nh_config { struct nlattr *nh_grp; u16 nh_grp_type; + u16 nh_grp_res_num_buckets; + unsigned long nh_grp_res_idle_timer; + unsigned long nh_grp_res_unbalanced_timer; + bool nh_grp_res_has_num_buckets; + bool nh_grp_res_has_idle_timer; + bool nh_grp_res_has_unbalanced_timer; struct nlattr *nh_encap; u16 nh_encap_type; @@ -63,6 +69,32 @@ struct nh_info { }; }; +struct nh_res_bucket { + struct nh_grp_entry __rcu *nh_entry; + atomic_long_t used_time; + unsigned long migrated_time; + bool occupied; + u8 nh_flags; +}; + +struct nh_res_table { + struct net *net; + u32 nhg_id; + struct delayed_work upkeep_dw; + + /* List of NHGEs that have too few buckets ("uw" for underweight). + * Reclaimed buckets will be given to entries in this list. + */ + struct list_head uw_nh_entries; + unsigned long unbalanced_since; + + u32 idle_timer; + u32 unbalanced_timer; + + u16 num_nh_buckets; + struct nh_res_bucket nh_buckets[]; +}; + struct nh_grp_entry { struct nexthop *nh; u8 weight; @@ -71,6 +103,13 @@ struct nh_grp_entry { struct { atomic_t upper_bound; } mpath; + struct { + /* Member on uw_nh_entries. 
*/ + struct list_head uw_nh_entry; + + u16 count_buckets; + u16 wants_buckets; + } res; }; struct list_head nh_list; @@ -82,8 +121,11 @@ struct nh_group { u16 num_nh; bool is_multipath; bool mpath; + bool resilient; bool fdb_nh; bool has_v4; + + struct nh_res_table __rcu *res_table; struct nh_grp_entry nh_entries[]; }; diff --git a/net/ipv4/nexthop.c b/net/ipv4/nexthop.c index 7a94591da856..0e2ff72e10c0 100644 --- a/net/ipv4/nexthop.c +++ b/net/ipv4/nexthop.c @@ -183,6 +183,30 @@ static int call_nexthop_notifiers(struct net *net, return notifier_to_errno(err); } +/* There are three users of RES_TABLE, and NHs etc. referenced from there: + * + * 1) a collection of callbacks for NH maintenance. This operates under + * RTNL, + * 2) the delayed work that gradually balances the resilient table, + * 3) and nexthop_select_path(), operating under RCU. + * + * Both the delayed work and the RTNL block are writers, and need to + * maintain mutual exclusion. Since there are only two and well-known + * writers for each table, the RTNL code can make sure it has exclusive + * access thus: + * + * - Have the DW operate without locking; + * - synchronously cancel the DW; + * - do the writing; + * - if the write was not actually a delete, call upkeep, which schedules + * DW again if necessary. + * + * The functions that are always called from the RTNL context use + * rtnl_dereference(). The functions that can also be called from the DW do + * a raw dereference and rely on the above mutual exclusion scheme. 
+ */ +#define nh_res_dereference(p) (rcu_dereference_raw(p)) + static int call_nexthop_notifier(struct notifier_block *nb, struct net *net, enum nexthop_event_type event_type, struct nexthop *nh, @@ -241,6 +265,9 @@ static void nexthop_free_group(struct nexthop *nh) WARN_ON(nhg->spare == nhg); + if (nhg->resilient) + vfree(rcu_dereference_raw(nhg->res_table)); + kfree(nhg->spare); kfree(nhg); } @@ -299,6 +326,30 @@ static struct nh_group *nexthop_grp_alloc(u16 num_nh) return nhg; } +static void nh_res_table_upkeep_dw(struct work_struct *work); + +static struct nh_res_table * +nexthop_res_table_alloc(struct net *net, u32 nhg_id, struct nh_config *cfg) +{ + const u16 num_nh_buckets = cfg->nh_grp_res_num_buckets; + struct nh_res_table *res_table; + unsigned long size; + + size = struct_size(res_table, nh_buckets, num_nh_buckets); + res_table = __vmalloc(size, GFP_KERNEL | __GFP_ZERO | __GFP_NOWARN); + if (!res_table) + return NULL; + + res_table->net = net; + res_table->nhg_id = nhg_id; + INIT_DELAYED_WORK(&res_table->upkeep_dw, &nh_res_table_upkeep_dw); + INIT_LIST_HEAD(&res_table->uw_nh_entries); + res_table->idle_timer = cfg->nh_grp_res_idle_timer; + res_table->unbalanced_timer = cfg->nh_grp_res_unbalanced_timer; + res_table->num_nh_buckets = num_nh_buckets; + return res_table; +} + static void nh_base_seq_inc(struct net *net) { while (++net->nexthop.seq == 0) @@ -347,6 +398,13 @@ static u32 nh_find_unused_id(struct net *net) return 0; } +static void nh_res_time_set_deadline(unsigned long next_time, + unsigned long *deadline) +{ + if (time_before(next_time, *deadline)) + *deadline = next_time; +} + static int nla_put_nh_group(struct sk_buff *skb, struct nh_group *nhg) { struct nexthop_grp *p; @@ -540,20 +598,62 @@ static void nexthop_notify(int event, struct nexthop *nh, struct nl_info *info) rtnl_set_sk_err(info->nl_net, RTNLGRP_NEXTHOP, err); } +static unsigned long nh_res_bucket_used_time(const struct nh_res_bucket *bucket) +{ + return (unsigned 
long)atomic_long_read(&bucket->used_time); +} + +static unsigned long +nh_res_bucket_idle_point(const struct nh_res_table *res_table, + const struct nh_res_bucket *bucket, + unsigned long now) +{ + unsigned long time = nh_res_bucket_used_time(bucket); + + /* Bucket was not used since it was migrated. The idle time is now. */ + if (time == bucket->migrated_time) + return now; + + return time + res_table->idle_timer; +} + +static unsigned long +nh_res_table_unb_point(const struct nh_res_table *res_table) +{ + return res_table->unbalanced_since + res_table->unbalanced_timer; +} + +static void nh_res_bucket_set_idle(const struct nh_res_table *res_table, + struct nh_res_bucket *bucket) +{ + unsigned long now = jiffies; + + atomic_long_set(&bucket->used_time, (long)now); + bucket->migrated_time = now; +} + +static void nh_res_bucket_set_busy(struct nh_res_bucket *bucket) +{ + atomic_long_set(&bucket->used_time, (long)jiffies); +} + static bool valid_group_nh(struct nexthop *nh, unsigned int npaths, bool *is_fdb, struct netlink_ext_ack *extack) { if (nh->is_group) { struct nh_group *nhg = rtnl_dereference(nh->nh_grp); - /* nested multipath (group within a group) is not - * supported - */ + /* Nesting groups within groups is not supported. 
*/ if (nhg->mpath) { NL_SET_ERR_MSG(extack, "Multipath group can not be a nexthop within a group"); return false; } + if (nhg->resilient) { + NL_SET_ERR_MSG(extack, + "Resilient group can not be a nexthop within a group"); + return false; + } *is_fdb = nhg->fdb_nh; } else { struct nh_info *nhi = rtnl_dereference(nh->nh_info); @@ -734,6 +834,22 @@ static struct nexthop *nexthop_select_path_mp(struct nh_group *nhg, int hash) return rc; } +static struct nexthop *nexthop_select_path_res(struct nh_group *nhg, int hash) +{ + struct nh_res_table *res_table = rcu_dereference(nhg->res_table); + u16 bucket_index = hash % res_table->num_nh_buckets; + struct nh_res_bucket *bucket; + struct nh_grp_entry *nhge; + + /* nexthop_select_path() is expected to return a non-NULL value, so + * skip protocol validation and just hand out whatever there is. + */ + bucket = &res_table->nh_buckets[bucket_index]; + nh_res_bucket_set_busy(bucket); + nhge = rcu_dereference(bucket->nh_entry); + return nhge->nh; +} + struct nexthop *nexthop_select_path(struct nexthop *nh, int hash) { struct nh_group *nhg; @@ -744,6 +860,8 @@ struct nexthop *nexthop_select_path(struct nexthop *nh, int hash) nhg = rcu_dereference(nh->nh_grp); if (nhg->mpath) return nexthop_select_path_mp(nhg, hash); + else if (nhg->resilient) + return nexthop_select_path_res(nhg, hash); /* Unreachable. 
*/ return NULL; @@ -926,7 +1044,289 @@ static int fib_check_nh_list(struct nexthop *old, struct nexthop *new, return 0; } -static void nh_group_rebalance(struct nh_group *nhg) +static bool nh_res_nhge_is_balanced(const struct nh_grp_entry *nhge) +{ + return nhge->res.count_buckets == nhge->res.wants_buckets; +} + +static bool nh_res_nhge_is_ow(const struct nh_grp_entry *nhge) +{ + return nhge->res.count_buckets > nhge->res.wants_buckets; +} + +static bool nh_res_nhge_is_uw(const struct nh_grp_entry *nhge) +{ + return nhge->res.count_buckets < nhge->res.wants_buckets; +} + +static bool nh_res_table_is_balanced(const struct nh_res_table *res_table) +{ + return list_empty(&res_table->uw_nh_entries); +} + +static void nh_res_bucket_unset_nh(struct nh_res_bucket *bucket) +{ + struct nh_grp_entry *nhge; + + if (bucket->occupied) { + nhge = nh_res_dereference(bucket->nh_entry); + nhge->res.count_buckets--; + bucket->occupied = false; + } +} + +static void nh_res_bucket_set_nh(struct nh_res_bucket *bucket, + struct nh_grp_entry *nhge) +{ + nh_res_bucket_unset_nh(bucket); + + bucket->occupied = true; + rcu_assign_pointer(bucket->nh_entry, nhge); + nhge->res.count_buckets++; +} + +static bool nh_res_bucket_should_migrate(struct nh_res_table *res_table, + struct nh_res_bucket *bucket, + unsigned long *deadline, bool *force) +{ + unsigned long now = jiffies; + struct nh_grp_entry *nhge; + unsigned long idle_point; + + if (!bucket->occupied) { + /* The bucket is not occupied, its NHGE pointer is either + * NULL or obsolete. We _have to_ migrate: set force. + */ + *force = true; + return true; + } + + nhge = nh_res_dereference(bucket->nh_entry); + + /* If the bucket is populated by an underweight or balanced + * nexthop, do not migrate. + */ + if (!nh_res_nhge_is_ow(nhge)) + return false; + + /* At this point we know that the bucket is populated with an + * overweight nexthop. It needs to be migrated to a new nexthop if + * the idle timer of unbalanced timer expired. 
+ */ + + idle_point = nh_res_bucket_idle_point(res_table, bucket, now); + if (time_after_eq(now, idle_point)) { + /* The bucket is idle. We _can_ migrate: unset force. */ + *force = false; + return true; + } + + /* Unbalanced timer of 0 means "never force". */ + if (res_table->unbalanced_timer) { + unsigned long unb_point; + + unb_point = nh_res_table_unb_point(res_table); + if (time_after(now, unb_point)) { + /* The bucket is not idle, but the unbalanced timer + * expired. We _can_ migrate, but set force anyway, + * so that drivers know to ignore activity reports + * from the HW. + */ + *force = true; + return true; + } + + nh_res_time_set_deadline(unb_point, deadline); + } + + nh_res_time_set_deadline(idle_point, deadline); + return false; +} + +static bool nh_res_bucket_migrate(struct nh_res_table *res_table, + u16 bucket_index, bool force) +{ + struct nh_res_bucket *bucket = &res_table->nh_buckets[bucket_index]; + struct nh_grp_entry *new_nhge; + + new_nhge = list_first_entry_or_null(&res_table->uw_nh_entries, + struct nh_grp_entry, + res.uw_nh_entry); + if (WARN_ON_ONCE(!new_nhge)) + /* If this function is called, "bucket" is either not + * occupied, or it belongs to a next hop that is + * overweight. In either case, there ought to be a + * corresponding underweight next hop. + */ + return false; + + nh_res_bucket_set_nh(bucket, new_nhge); + nh_res_bucket_set_idle(res_table, bucket); + + if (nh_res_nhge_is_balanced(new_nhge)) + list_del(&new_nhge->res.uw_nh_entry); + return true; +} + +#define NH_RES_UPKEEP_DW_MINIMUM_INTERVAL (HZ / 2) + +static void nh_res_table_upkeep(struct nh_res_table *res_table) +{ + unsigned long now = jiffies; + unsigned long deadline; + u16 i; + + /* Deadline is the next time that upkeep should be run. It is the + * earliest time at which one of the buckets might be migrated. + * Start at the most pessimistic estimate: either unbalanced_timer + * from now, or if there is none, idle_timer from now. 
For each + * encountered time point, call nh_res_time_set_deadline() to + * refine the estimate. + */ + if (res_table->unbalanced_timer) + deadline = now + res_table->unbalanced_timer; + else + deadline = now + res_table->idle_timer; + + for (i = 0; i < res_table->num_nh_buckets; i++) { + struct nh_res_bucket *bucket = &res_table->nh_buckets[i]; + bool force; + + if (nh_res_bucket_should_migrate(res_table, bucket, + &deadline, &force)) { + if (!nh_res_bucket_migrate(res_table, i, force)) { + unsigned long idle_point; + + /* A driver can override the migration + * decision if the HW reports that the + * bucket is actually not idle. Therefore + * remark the bucket as busy again and + * update the deadline. + */ + nh_res_bucket_set_busy(bucket); + idle_point = nh_res_bucket_idle_point(res_table, + bucket, + now); + nh_res_time_set_deadline(idle_point, &deadline); + } + } + } + + /* If the group is still unbalanced, schedule the next upkeep to + * either the deadline computed above, or the minimum deadline, + * whichever comes later. 
+ */ + if (!nh_res_table_is_balanced(res_table)) { + unsigned long now = jiffies; + unsigned long min_deadline; + + min_deadline = now + NH_RES_UPKEEP_DW_MINIMUM_INTERVAL; + if (time_before(deadline, min_deadline)) + deadline = min_deadline; + + queue_delayed_work(system_power_efficient_wq, + &res_table->upkeep_dw, deadline - now); + } +} + +static void nh_res_table_upkeep_dw(struct work_struct *work) +{ + struct delayed_work *dw = to_delayed_work(work); + struct nh_res_table *res_table; + + res_table = container_of(dw, struct nh_res_table, upkeep_dw); + nh_res_table_upkeep(res_table); +} + +static void nh_res_table_cancel_upkeep(struct nh_res_table *res_table) +{ + cancel_delayed_work_sync(&res_table->upkeep_dw); +} + +static void nh_res_group_rebalance(struct nh_group *nhg, + struct nh_res_table *res_table) +{ + int prev_upper_bound = 0; + int total = 0; + int w = 0; + int i; + + INIT_LIST_HEAD(&res_table->uw_nh_entries); + + for (i = 0; i < nhg->num_nh; ++i) + total += nhg->nh_entries[i].weight; + + for (i = 0; i < nhg->num_nh; ++i) { + struct nh_grp_entry *nhge = &nhg->nh_entries[i]; + int upper_bound; + + w += nhge->weight; + upper_bound = DIV_ROUND_CLOSEST(res_table->num_nh_buckets * w, + total); + nhge->res.wants_buckets = upper_bound - prev_upper_bound; + prev_upper_bound = upper_bound; + + if (nh_res_nhge_is_uw(nhge)) { + if (list_empty(&res_table->uw_nh_entries)) + res_table->unbalanced_since = jiffies; + list_add(&nhge->res.uw_nh_entry, + &res_table->uw_nh_entries); + } + } +} + +/* Migrate buckets in res_table so that they reference NHGE's from NHG with + * the right NH ID. Set those buckets that do not have a corresponding NHGE + * entry in NHG as not occupied. 
+ */ +static void nh_res_table_migrate_buckets(struct nh_res_table *res_table, + struct nh_group *nhg) +{ + u16 i; + + for (i = 0; i < res_table->num_nh_buckets; i++) { + struct nh_res_bucket *bucket = &res_table->nh_buckets[i]; + u32 id = rtnl_dereference(bucket->nh_entry)->nh->id; + bool found = false; + int j; + + for (j = 0; j < nhg->num_nh; j++) { + struct nh_grp_entry *nhge = &nhg->nh_entries[j]; + + if (nhge->nh->id == id) { + nh_res_bucket_set_nh(bucket, nhge); + found = true; + break; + } + } + + if (!found) + nh_res_bucket_unset_nh(bucket); + } +} + +static void replace_nexthop_grp_res(struct nh_group *oldg, + struct nh_group *newg) +{ + /* For NH group replacement, the new NHG might only have a stub + * hash table with 0 buckets, because the number of buckets was not + * specified. For NH removal, oldg and newg both reference the same + * res_table. So in any case, in the following, we want to work + * with oldg->res_table. + */ + struct nh_res_table *old_res_table = rtnl_dereference(oldg->res_table); + unsigned long prev_unbalanced_since = old_res_table->unbalanced_since; + bool prev_has_uw = !list_empty(&old_res_table->uw_nh_entries); + + nh_res_table_cancel_upkeep(old_res_table); + nh_res_table_migrate_buckets(old_res_table, newg); + nh_res_group_rebalance(newg, old_res_table); + if (prev_has_uw && !list_empty(&old_res_table->uw_nh_entries)) + old_res_table->unbalanced_since = prev_unbalanced_since; + nh_res_table_upkeep(old_res_table); +} + +static void nh_mp_group_rebalance(struct nh_group *nhg) { int total = 0; int w = 0; @@ -969,6 +1369,7 @@ static void remove_nh_grp_entry(struct net *net, struct nh_grp_entry *nhge, newg->has_v4 = false; newg->is_multipath = nhg->is_multipath; newg->mpath = nhg->mpath; + newg->resilient = nhg->resilient; newg->fdb_nh = nhg->fdb_nh; newg->num_nh = nhg->num_nh; @@ -996,7 +1397,11 @@ static void remove_nh_grp_entry(struct net *net, struct nh_grp_entry *nhge, j++; } - nh_group_rebalance(newg); + if (newg->mpath) + 
nh_mp_group_rebalance(newg); + else if (newg->resilient) + replace_nexthop_grp_res(nhg, newg); + rcu_assign_pointer(nhp->nh_grp, newg); list_del(&nhge->nh_list); @@ -1025,6 +1430,7 @@ static void remove_nexthop_from_groups(struct net *net, struct nexthop *nh, static void remove_nexthop_group(struct nexthop *nh, struct nl_info *nlinfo) { struct nh_group *nhg = rcu_dereference_rtnl(nh->nh_grp); + struct nh_res_table *res_table; int i, num_nh = nhg->num_nh; for (i = 0; i < num_nh; ++i) { @@ -1035,6 +1441,11 @@ static void remove_nexthop_group(struct nexthop *nh, struct nl_info *nlinfo) list_del_init(&nhge->nh_list); } + + if (nhg->resilient) { + res_table = rtnl_dereference(nhg->res_table); + nh_res_table_cancel_upkeep(res_table); + } } /* not called for nexthop replace */ @@ -1113,6 +1524,9 @@ static int replace_nexthop_grp(struct net *net, struct nexthop *old, struct nexthop *new, const struct nh_config *cfg, struct netlink_ext_ack *extack) { + struct nh_res_table *tmp_table = NULL; + struct nh_res_table *new_res_table; + struct nh_res_table *old_res_table; struct nh_group *oldg, *newg; int i, err; @@ -1121,19 +1535,57 @@ static int replace_nexthop_grp(struct net *net, struct nexthop *old, return -EINVAL; } - err = call_nexthop_notifiers(net, NEXTHOP_EVENT_REPLACE, new, extack); - if (err) - return err; - oldg = rtnl_dereference(old->nh_grp); newg = rtnl_dereference(new->nh_grp); + if (newg->mpath != oldg->mpath) { + NL_SET_ERR_MSG(extack, "Can not replace a nexthop group with one of a different type."); + return -EINVAL; + } + + if (newg->mpath) { + err = call_nexthop_notifiers(net, NEXTHOP_EVENT_REPLACE, new, + extack); + if (err) + return err; + } else if (newg->resilient) { + new_res_table = rtnl_dereference(newg->res_table); + old_res_table = rtnl_dereference(oldg->res_table); + + /* Accept if num_nh_buckets was not given, but if it was + * given, demand that the value be correct. 
+ */ + if (cfg->nh_grp_res_has_num_buckets && + cfg->nh_grp_res_num_buckets != + old_res_table->num_nh_buckets) { + NL_SET_ERR_MSG(extack, "Can not change number of buckets of a resilient nexthop group."); + return -EINVAL; + } + + if (cfg->nh_grp_res_has_idle_timer) + old_res_table->idle_timer = cfg->nh_grp_res_idle_timer; + if (cfg->nh_grp_res_has_unbalanced_timer) + old_res_table->unbalanced_timer = + cfg->nh_grp_res_unbalanced_timer; + + replace_nexthop_grp_res(oldg, newg); + + tmp_table = new_res_table; + rcu_assign_pointer(newg->res_table, old_res_table); + rcu_assign_pointer(newg->spare->res_table, old_res_table); + } + /* update parents - used by nexthop code for cleanup */ for (i = 0; i < newg->num_nh; i++) newg->nh_entries[i].nh_parent = old; rcu_assign_pointer(old->nh_grp, newg); + if (newg->resilient) { + rcu_assign_pointer(oldg->res_table, tmp_table); + rcu_assign_pointer(oldg->spare->res_table, tmp_table); + } + for (i = 0; i < oldg->num_nh; i++) oldg->nh_entries[i].nh_parent = new; @@ -1383,6 +1835,27 @@ static int insert_nexthop(struct net *net, struct nexthop *new_nh, goto out; } + if (new_nh->is_group) { + struct nh_group *nhg = rtnl_dereference(new_nh->nh_grp); + struct nh_res_table *res_table; + + if (nhg->resilient) { + res_table = rtnl_dereference(nhg->res_table); + + /* Not passing the number of buckets is OK when + * replacing, but not when creating a new group. 
+ */ + if (!cfg->nh_grp_res_has_num_buckets) { + NL_SET_ERR_MSG(extack, "Number of buckets not specified for nexthop group insertion"); + rc = -EINVAL; + goto out; + } + + nh_res_group_rebalance(nhg, res_table); + nh_res_table_upkeep(res_table); + } + } + rb_link_node_rcu(&new_nh->rb_node, parent, pp); rb_insert_color(&new_nh->rb_node, root); @@ -1445,6 +1918,7 @@ static struct nexthop *nexthop_create_group(struct net *net, u16 num_nh = nla_len(grps_attr) / sizeof(*entry); struct nh_group *nhg; struct nexthop *nh; + int err; int i; if (WARN_ON(!num_nh)) @@ -1476,8 +1950,10 @@ static struct nexthop *nexthop_create_group(struct net *net, struct nh_info *nhi; nhe = nexthop_find_by_id(net, entry[i].id); - if (!nexthop_get(nhe)) + if (!nexthop_get(nhe)) { + err = -ENOENT; goto out_no_nh; + } nhi = rtnl_dereference(nhe->nh_info); if (nhi->family == AF_INET) @@ -1493,13 +1969,28 @@ static struct nexthop *nexthop_create_group(struct net *net, nhg->mpath = 1; nhg->is_multipath = true; } else if (cfg->nh_grp_type == NEXTHOP_GRP_TYPE_RES) { + struct nh_res_table *res_table; + + /* Bounce resilient groups for now. 
*/ + err = -EINVAL; goto out_no_nh; + + res_table = nexthop_res_table_alloc(net, cfg->nh_id, cfg); + if (!res_table) { + err = -ENOMEM; + goto out_no_nh; + } + + rcu_assign_pointer(nhg->spare->res_table, res_table); + rcu_assign_pointer(nhg->res_table, res_table); + nhg->resilient = true; + nhg->is_multipath = true; } - WARN_ON_ONCE(nhg->mpath != 1); + WARN_ON_ONCE(nhg->mpath + nhg->resilient != 1); if (nhg->mpath) - nh_group_rebalance(nhg); + nh_mp_group_rebalance(nhg); if (cfg->nh_fdb) nhg->fdb_nh = 1; @@ -1518,7 +2009,7 @@ static struct nexthop *nexthop_create_group(struct net *net, kfree(nhg); kfree(nh); - return ERR_PTR(-ENOENT); + return ERR_PTR(err); } static int nh_create_ipv4(struct net *net, struct nexthop *nh, From patchwork Thu Mar 11 18:03:17 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Petr Machata X-Patchwork-Id: 399478
From: Petr Machata To: CC: Ido Schimmel , David Ahern , "David S . Miller" , Jakub Kicinski , "Petr Machata" Subject: [PATCH net-next v2 06/14] nexthop: Add data structures for resilient group notifications Date: Thu, 11 Mar 2021 19:03:17 +0100 Message-ID: <30f98ce4f780d167a45454f09fb4b0111b449f7e.1615485052.git.petrm@nvidia.com> X-Mailer: git-send-email 2.26.2 X-Mailing-List: netdev@vger.kernel.org From: Ido Schimmel Add data structures that will be used for in-kernel notifications about addition / deletion of a resilient nexthop group and about changes to a hash bucket within a resilient group.
Signed-off-by: Ido Schimmel Reviewed-by: Petr Machata Reviewed-by: David Ahern Signed-off-by: Petr Machata --- Notes: v1 (changes since RFC): - u32 -> u16 for bucket counts / indices include/net/nexthop.h | 19 +++++++++++++++++++ 1 file changed, 19 insertions(+) diff --git a/include/net/nexthop.h b/include/net/nexthop.h index b78505c9031e..fd3c0debe8bf 100644 --- a/include/net/nexthop.h +++ b/include/net/nexthop.h @@ -155,11 +155,15 @@ struct nexthop { enum nexthop_event_type { NEXTHOP_EVENT_DEL, NEXTHOP_EVENT_REPLACE, + NEXTHOP_EVENT_RES_TABLE_PRE_REPLACE, + NEXTHOP_EVENT_BUCKET_REPLACE, }; enum nh_notifier_info_type { NH_NOTIFIER_INFO_TYPE_SINGLE, NH_NOTIFIER_INFO_TYPE_GRP, + NH_NOTIFIER_INFO_TYPE_RES_TABLE, + NH_NOTIFIER_INFO_TYPE_RES_BUCKET, }; struct nh_notifier_single_info { @@ -186,6 +190,19 @@ struct nh_notifier_grp_info { struct nh_notifier_grp_entry_info nh_entries[]; }; +struct nh_notifier_res_bucket_info { + u16 bucket_index; + unsigned int idle_timer_ms; + bool force; + struct nh_notifier_single_info old_nh; + struct nh_notifier_single_info new_nh; +}; + +struct nh_notifier_res_table_info { + u16 num_nh_buckets; + struct nh_notifier_single_info nhs[]; +}; + struct nh_notifier_info { struct net *net; struct netlink_ext_ack *extack; @@ -194,6 +211,8 @@ struct nh_notifier_info { union { struct nh_notifier_single_info *nh; struct nh_notifier_grp_info *nh_grp; + struct nh_notifier_res_table_info *nh_res_table; + struct nh_notifier_res_bucket_info *nh_res_bucket; }; }; From patchwork Thu Mar 11 18:03:18 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Petr Machata X-Patchwork-Id: 398179
From: Petr Machata To: CC: Ido Schimmel , David Ahern , "David S . Miller" , Jakub Kicinski , "Petr Machata" Subject: [PATCH net-next v2 07/14] nexthop: Implement notifiers for resilient nexthop groups Date: Thu, 11 Mar 2021 19:03:18 +0100 Message-ID: <7908616da5260522f7baf34f373a669a3bcd0025.1615485052.git.petrm@nvidia.com> X-Mailer: git-send-email 2.26.2 X-Mailing-List: netdev@vger.kernel.org Implement the following notifications towards drivers: - NEXTHOP_EVENT_REPLACE, when a resilient nexthop group is created. - NEXTHOP_EVENT_BUCKET_REPLACE any time there is a change in assignment of next hops to hash table buckets. That includes replacements, deletions, and delayed upkeep cycles. Some bucket notifications can be vetoed by the driver, to make it possible to propagate bucket busy-ness flags from the HW back to the algorithm. Some are however forced, e.g. if a next hop is deleted, all buckets that use this next hop simply must be migrated, whether the HW wishes so or not. - NEXTHOP_EVENT_RES_TABLE_PRE_REPLACE, before a resilient nexthop group is replaced. Usually the driver will get the bucket notifications as well, and could veto those.
But in some cases, a bucket may not be migrated immediately, but during delayed upkeep, and that is too late to roll the transaction back. This notification allows the driver to take a look and veto the new proposed group up front, before anything is committed. Signed-off-by: Petr Machata Reviewed-by: Ido Schimmel Reviewed-by: David Ahern --- Notes: v1 (changes since RFC): - u32 -> u16 for bucket counts / indices net/ipv4/nexthop.c | 320 +++++++++++++++++++++++++++++++++++++++++++-- 1 file changed, 308 insertions(+), 12 deletions(-) diff --git a/net/ipv4/nexthop.c b/net/ipv4/nexthop.c index 0e2ff72e10c0..8b06aafc2e9e 100644 --- a/net/ipv4/nexthop.c +++ b/net/ipv4/nexthop.c @@ -115,6 +115,37 @@ static int nh_notifier_mp_info_init(struct nh_notifier_info *info, return 0; } +static int nh_notifier_res_table_info_init(struct nh_notifier_info *info, + struct nh_group *nhg) +{ + struct nh_res_table *res_table = rtnl_dereference(nhg->res_table); + u16 num_nh_buckets = res_table->num_nh_buckets; + unsigned long size; + u16 i; + + info->type = NH_NOTIFIER_INFO_TYPE_RES_TABLE; + size = struct_size(info->nh_res_table, nhs, num_nh_buckets); + info->nh_res_table = __vmalloc(size, GFP_KERNEL | __GFP_ZERO | + __GFP_NOWARN); + if (!info->nh_res_table) + return -ENOMEM; + + info->nh_res_table->num_nh_buckets = num_nh_buckets; + + for (i = 0; i < num_nh_buckets; i++) { + struct nh_res_bucket *bucket = &res_table->nh_buckets[i]; + struct nh_grp_entry *nhge; + struct nh_info *nhi; + + nhge = rtnl_dereference(bucket->nh_entry); + nhi = rtnl_dereference(nhge->nh->nh_info); + __nh_notifier_single_info_init(&info->nh_res_table->nhs[i], + nhi); + } + + return 0; +} + static int nh_notifier_grp_info_init(struct nh_notifier_info *info, const struct nexthop *nh) { @@ -122,6 +153,8 @@ static int nh_notifier_grp_info_init(struct nh_notifier_info *info, if (nhg->mpath) return nh_notifier_mp_info_init(info, nhg); + else if (nhg->resilient) + return nh_notifier_res_table_info_init(info, nhg); 
return -EINVAL; } @@ -132,6 +165,8 @@ static void nh_notifier_grp_info_fini(struct nh_notifier_info *info, if (nhg->mpath) kfree(info->nh_grp); + else if (nhg->resilient) + vfree(info->nh_res_table); } static int nh_notifier_info_init(struct nh_notifier_info *info, @@ -183,6 +218,107 @@ static int call_nexthop_notifiers(struct net *net, return notifier_to_errno(err); } +static int +nh_notifier_res_bucket_idle_timer_get(const struct nh_notifier_info *info, + bool force, unsigned int *p_idle_timer_ms) +{ + struct nh_res_table *res_table; + struct nh_group *nhg; + struct nexthop *nh; + int err = 0; + + /* When 'force' is false, nexthop bucket replacement is performed + * because the bucket was deemed to be idle. In this case, capable + * listeners can choose to perform an atomic replacement: The bucket is + * only replaced if it is inactive. However, if the idle timer interval + * is smaller than the interval in which a listener is querying + * buckets' activity from the device, then atomic replacement should + * not be tried. Pass the idle timer value to listeners, so that they + * could determine which type of replacement to perform. 
+	 */
+	if (force) {
+		*p_idle_timer_ms = 0;
+		return 0;
+	}
+
+	rcu_read_lock();
+
+	nh = nexthop_find_by_id(info->net, info->id);
+	if (!nh) {
+		err = -EINVAL;
+		goto out;
+	}
+
+	nhg = rcu_dereference(nh->nh_grp);
+	res_table = rcu_dereference(nhg->res_table);
+	*p_idle_timer_ms = jiffies_to_msecs(res_table->idle_timer);
+
+out:
+	rcu_read_unlock();
+
+	return err;
+}
+
+static int nh_notifier_res_bucket_info_init(struct nh_notifier_info *info,
+					    u16 bucket_index, bool force,
+					    struct nh_info *oldi,
+					    struct nh_info *newi)
+{
+	unsigned int idle_timer_ms;
+	int err;
+
+	err = nh_notifier_res_bucket_idle_timer_get(info, force,
+						    &idle_timer_ms);
+	if (err)
+		return err;
+
+	info->type = NH_NOTIFIER_INFO_TYPE_RES_BUCKET;
+	info->nh_res_bucket = kzalloc(sizeof(*info->nh_res_bucket),
+				      GFP_KERNEL);
+	if (!info->nh_res_bucket)
+		return -ENOMEM;
+
+	info->nh_res_bucket->bucket_index = bucket_index;
+	info->nh_res_bucket->idle_timer_ms = idle_timer_ms;
+	info->nh_res_bucket->force = force;
+	__nh_notifier_single_info_init(&info->nh_res_bucket->old_nh, oldi);
+	__nh_notifier_single_info_init(&info->nh_res_bucket->new_nh, newi);
+	return 0;
+}
+
+static void nh_notifier_res_bucket_info_fini(struct nh_notifier_info *info)
+{
+	kfree(info->nh_res_bucket);
+}
+
+static int __call_nexthop_res_bucket_notifiers(struct net *net, u32 nhg_id,
+					       u16 bucket_index, bool force,
+					       struct nh_info *oldi,
+					       struct nh_info *newi,
+					       struct netlink_ext_ack *extack)
+{
+	struct nh_notifier_info info = {
+		.net = net,
+		.extack = extack,
+		.id = nhg_id,
+	};
+	int err;
+
+	if (nexthop_notifiers_is_empty(net))
+		return 0;
+
+	err = nh_notifier_res_bucket_info_init(&info, bucket_index, force,
+					       oldi, newi);
+	if (err)
+		return err;
+
+	err = blocking_notifier_call_chain(&net->nexthop.notifier_chain,
+					   NEXTHOP_EVENT_BUCKET_REPLACE, &info);
+	nh_notifier_res_bucket_info_fini(&info);
+
+	return notifier_to_errno(err);
+}
+
 /* There are three users of RES_TABLE, and NHs etc. referenced from there:
  *
  * 1) a collection of callbacks for NH maintenance. This operates under
@@ -207,6 +343,53 @@ static int call_nexthop_notifiers(struct net *net,
  */
 #define nh_res_dereference(p) (rcu_dereference_raw(p))
 
+static int call_nexthop_res_bucket_notifiers(struct net *net, u32 nhg_id,
+					     u16 bucket_index, bool force,
+					     struct nexthop *old_nh,
+					     struct nexthop *new_nh,
+					     struct netlink_ext_ack *extack)
+{
+	struct nh_info *oldi = nh_res_dereference(old_nh->nh_info);
+	struct nh_info *newi = nh_res_dereference(new_nh->nh_info);
+
+	return __call_nexthop_res_bucket_notifiers(net, nhg_id, bucket_index,
+						   force, oldi, newi, extack);
+}
+
+static int call_nexthop_res_table_notifiers(struct net *net, struct nexthop *nh,
+					    struct netlink_ext_ack *extack)
+{
+	struct nh_notifier_info info = {
+		.net = net,
+		.extack = extack,
+	};
+	struct nh_group *nhg;
+	int err;
+
+	ASSERT_RTNL();
+
+	if (nexthop_notifiers_is_empty(net))
+		return 0;
+
+	/* At this point, the nexthop buckets are still not populated. Only
+	 * emit a notification with the logical nexthops, so that a listener
+	 * could potentially veto it in case of unsupported configuration.
+	 */
+	nhg = rtnl_dereference(nh->nh_grp);
+	err = nh_notifier_mp_info_init(&info, nhg);
+	if (err) {
+		NL_SET_ERR_MSG(extack, "Failed to initialize nexthop notifier info");
+		return err;
+	}
+
+	err = blocking_notifier_call_chain(&net->nexthop.notifier_chain,
+					   NEXTHOP_EVENT_RES_TABLE_PRE_REPLACE,
+					   &info);
+	kfree(info.nh_grp);
+
+	return notifier_to_errno(err);
+}
+
 static int call_nexthop_notifier(struct notifier_block *nb, struct net *net,
 				 enum nexthop_event_type event_type,
 				 struct nexthop *nh,
@@ -1144,10 +1327,12 @@ static bool nh_res_bucket_should_migrate(struct nh_res_table *res_table,
 }
 
 static bool nh_res_bucket_migrate(struct nh_res_table *res_table,
-				  u16 bucket_index, bool force)
+				  u16 bucket_index, bool notify, bool force)
 {
 	struct nh_res_bucket *bucket = &res_table->nh_buckets[bucket_index];
 	struct nh_grp_entry *new_nhge;
+	struct netlink_ext_ack extack;
+	int err;
 
 	new_nhge = list_first_entry_or_null(&res_table->uw_nh_entries,
 					    struct nh_grp_entry,
@@ -1160,6 +1345,28 @@ static bool nh_res_bucket_migrate(struct nh_res_table *res_table,
 		 */
 		return false;
 
+	if (notify) {
+		struct nh_grp_entry *old_nhge;
+
+		old_nhge = nh_res_dereference(bucket->nh_entry);
+		err = call_nexthop_res_bucket_notifiers(res_table->net,
+							res_table->nhg_id,
+							bucket_index, force,
+							old_nhge->nh,
+							new_nhge->nh, &extack);
+		if (err) {
+			pr_err_ratelimited("%s\n", extack._msg);
+			if (!force)
+				return false;
+			/* It is not possible to veto a forced replacement, so
+			 * just clear the hardware flags from the nexthop
+			 * bucket to indicate to user space that this bucket is
+			 * not correctly populated in hardware.
+			 */
+			bucket->nh_flags &= ~(RTNH_F_OFFLOAD | RTNH_F_TRAP);
+		}
+	}
+
 	nh_res_bucket_set_nh(bucket, new_nhge);
 	nh_res_bucket_set_idle(res_table, bucket);
 
@@ -1170,7 +1377,7 @@ static bool nh_res_bucket_migrate(struct nh_res_table *res_table,
 
 #define NH_RES_UPKEEP_DW_MINIMUM_INTERVAL (HZ / 2)
 
-static void nh_res_table_upkeep(struct nh_res_table *res_table)
+static void nh_res_table_upkeep(struct nh_res_table *res_table, bool notify)
 {
 	unsigned long now = jiffies;
 	unsigned long deadline;
@@ -1194,7 +1401,8 @@ static void nh_res_table_upkeep(struct nh_res_table *res_table)
 
 		if (nh_res_bucket_should_migrate(res_table, bucket,
 						 &deadline, &force)) {
-			if (!nh_res_bucket_migrate(res_table, i, force)) {
+			if (!nh_res_bucket_migrate(res_table, i, notify,
+						   force)) {
 				unsigned long idle_point;
 
 				/* A driver can override the migration
@@ -1235,7 +1443,7 @@ static void nh_res_table_upkeep_dw(struct work_struct *work)
 	struct nh_res_table *res_table;
 
 	res_table = container_of(dw, struct nh_res_table, upkeep_dw);
-	nh_res_table_upkeep(res_table);
+	nh_res_table_upkeep(res_table, true);
 }
 
 static void nh_res_table_cancel_upkeep(struct nh_res_table *res_table)
@@ -1323,7 +1531,7 @@ static void replace_nexthop_grp_res(struct nh_group *oldg,
 	nh_res_group_rebalance(newg, old_res_table);
 	if (prev_has_uw && !list_empty(&old_res_table->uw_nh_entries))
 		old_res_table->unbalanced_since = prev_unbalanced_since;
-	nh_res_table_upkeep(old_res_table);
+	nh_res_table_upkeep(old_res_table, true);
 }
 
 static void nh_mp_group_rebalance(struct nh_group *nhg)
@@ -1407,9 +1615,15 @@ static void remove_nh_grp_entry(struct net *net, struct nh_grp_entry *nhge,
 	list_del(&nhge->nh_list);
 	nexthop_put(nhge->nh);
 
-	err = call_nexthop_notifiers(net, NEXTHOP_EVENT_REPLACE, nhp, &extack);
-	if (err)
-		pr_err("%s\n", extack._msg);
+	/* Removal of a NH from a resilient group is notified through
+	 * bucket notifications.
+	 */
+	if (newg->mpath) {
+		err = call_nexthop_notifiers(net, NEXTHOP_EVENT_REPLACE, nhp,
+					     &extack);
+		if (err)
+			pr_err("%s\n", extack._msg);
+	}
 
 	if (nlinfo)
 		nexthop_notify(RTM_NEWNEXTHOP, nhp, nlinfo);
@@ -1562,6 +1776,16 @@ static int replace_nexthop_grp(struct net *net, struct nexthop *old,
 			return -EINVAL;
 		}
 
+		/* Emit a pre-replace notification so that listeners could veto
+		 * a potentially unsupported configuration. Otherwise,
+		 * individual bucket replacement notifications would need to be
+		 * vetoed, which is something that should only happen if the
+		 * bucket is currently active.
+		 */
+		err = call_nexthop_res_table_notifiers(net, new, extack);
+		if (err)
+			return err;
+
 		if (cfg->nh_grp_res_has_idle_timer)
 			old_res_table->idle_timer = cfg->nh_grp_res_idle_timer;
 		if (cfg->nh_grp_res_has_unbalanced_timer)
@@ -1611,6 +1835,71 @@ static void nh_group_v4_update(struct nh_group *nhg)
 	nhg->has_v4 = has_v4;
 }
 
+static int replace_nexthop_single_notify_res(struct net *net,
+					     struct nh_res_table *res_table,
+					     struct nexthop *old,
+					     struct nh_info *oldi,
+					     struct nh_info *newi,
+					     struct netlink_ext_ack *extack)
+{
+	u32 nhg_id = res_table->nhg_id;
+	int err;
+	u16 i;
+
+	for (i = 0; i < res_table->num_nh_buckets; i++) {
+		struct nh_res_bucket *bucket = &res_table->nh_buckets[i];
+		struct nh_grp_entry *nhge;
+
+		nhge = rtnl_dereference(bucket->nh_entry);
+		if (nhge->nh == old) {
+			err = __call_nexthop_res_bucket_notifiers(net, nhg_id,
+								  i, true,
+								  oldi, newi,
+								  extack);
+			if (err)
+				goto err_notify;
+		}
+	}
+
+	return 0;
+
+err_notify:
+	while (i-- > 0) {
+		struct nh_res_bucket *bucket = &res_table->nh_buckets[i];
+		struct nh_grp_entry *nhge;
+
+		nhge = rtnl_dereference(bucket->nh_entry);
+		if (nhge->nh == old)
+			__call_nexthop_res_bucket_notifiers(net, nhg_id, i,
+							    true, newi, oldi,
+							    extack);
+	}
+	return err;
+}
+
+static int replace_nexthop_single_notify(struct net *net,
+					 struct nexthop *group_nh,
+					 struct nexthop *old,
+					 struct nh_info *oldi,
+					 struct nh_info *newi,
+					 struct netlink_ext_ack *extack)
+{
+	struct nh_group *nhg = rtnl_dereference(group_nh->nh_grp);
+	struct nh_res_table *res_table;
+
+	if (nhg->mpath) {
+		return call_nexthop_notifiers(net, NEXTHOP_EVENT_REPLACE,
+					      group_nh, extack);
+	} else if (nhg->resilient) {
+		res_table = rtnl_dereference(nhg->res_table);
+		return replace_nexthop_single_notify_res(net, res_table,
+							 old, oldi, newi,
+							 extack);
+	}
+
+	return -EINVAL;
+}
+
 static int replace_nexthop_single(struct net *net, struct nexthop *old,
 				  struct nexthop *new,
 				  struct netlink_ext_ack *extack)
@@ -1653,8 +1942,8 @@ static int replace_nexthop_single(struct net *net, struct nexthop *old,
 	list_for_each_entry(nhge, &old->grp_list, nh_list) {
 		struct nexthop *nhp = nhge->nh_parent;
 
-		err = call_nexthop_notifiers(net, NEXTHOP_EVENT_REPLACE, nhp,
-					     extack);
+		err = replace_nexthop_single_notify(net, nhp, old, oldi, newi,
+						    extack);
 		if (err)
 			goto err_notify;
 	}
@@ -1684,7 +1973,7 @@ static int replace_nexthop_single(struct net *net, struct nexthop *old,
 	list_for_each_entry_continue_reverse(nhge, &old->grp_list, nh_list) {
 		struct nexthop *nhp = nhge->nh_parent;
 
-		call_nexthop_notifiers(net, NEXTHOP_EVENT_REPLACE, nhp, extack);
+		replace_nexthop_single_notify(net, nhp, old, newi, oldi, NULL);
 	}
 	call_nexthop_notifiers(net, NEXTHOP_EVENT_REPLACE, old, extack);
 	return err;
@@ -1852,13 +2141,20 @@ static int insert_nexthop(struct net *net, struct nexthop *new_nh,
 			}
 
 			nh_res_group_rebalance(nhg, res_table);
-			nh_res_table_upkeep(res_table);
+
+			/* Do not send bucket notifications, we do full
+			 * notification below.
+			 */
+			nh_res_table_upkeep(res_table, false);
 		}
 	}
 
 	rb_link_node_rcu(&new_nh->rb_node, parent, pp);
 	rb_insert_color(&new_nh->rb_node, root);
 
+	/* The initial insertion is a full notification for mpath as well
+	 * as resilient groups.
+	 */
 	rc = call_nexthop_notifiers(net, NEXTHOP_EVENT_REPLACE, new_nh, extack);
 	if (rc)
 		rb_erase(&new_nh->rb_node, &net->nexthop.rb_root);

From patchwork Thu Mar 11 18:03:19 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Petr Machata
X-Patchwork-Id: 399479
From: Petr Machata
CC: Ido Schimmel, David Ahern, "David S. Miller", Jakub Kicinski, "Petr Machata"
Subject: [PATCH net-next v2 08/14] nexthop: Allow setting "offload" and "trap" indication of nexthop buckets
Date: Thu, 11 Mar 2021 19:03:19 +0100
X-Mailer: git-send-email 2.26.2
X-Mailing-List: netdev@vger.kernel.org

From: Ido Schimmel

Add a function that can be called by device drivers to set "offload" or
"trap" indication on nexthop buckets following nexthop notifications and
other changes such as a neighbour becoming invalid.

Signed-off-by: Ido Schimmel
Reviewed-by: Petr Machata
Reviewed-by: David Ahern
Signed-off-by: Petr Machata
---

Notes:
    v1 (changes since RFC):
    - u32 -> u16 for bucket counts / indices

 include/net/nexthop.h |  2 ++
 net/ipv4/nexthop.c    | 34 ++++++++++++++++++++++++++++++++++
 2 files changed, 36 insertions(+)

diff --git a/include/net/nexthop.h b/include/net/nexthop.h
index fd3c0debe8bf..685f208d26b5 100644
--- a/include/net/nexthop.h
+++ b/include/net/nexthop.h
@@ -220,6 +220,8 @@ int register_nexthop_notifier(struct net *net, struct notifier_block *nb,
 			      struct netlink_ext_ack *extack);
 int unregister_nexthop_notifier(struct net *net, struct notifier_block *nb);
 void nexthop_set_hw_flags(struct net *net, u32 id, bool offload, bool trap);
+void nexthop_bucket_set_hw_flags(struct net *net, u32 id, u16 bucket_index,
+				 bool offload, bool trap);
 
 /* caller is holding rcu or rtnl; no reference taken to nexthop */
 struct nexthop *nexthop_find_by_id(struct net *net, u32 id);
diff --git a/net/ipv4/nexthop.c b/net/ipv4/nexthop.c
index 8b06aafc2e9e..1fce4ff39390 100644
--- a/net/ipv4/nexthop.c
+++ b/net/ipv4/nexthop.c
@@ -3072,6 +3072,40 @@ void nexthop_set_hw_flags(struct net *net, u32 id, bool offload, bool trap)
 }
 EXPORT_SYMBOL(nexthop_set_hw_flags);
 
+void nexthop_bucket_set_hw_flags(struct net *net, u32 id, u16 bucket_index,
+				 bool offload, bool trap)
+{
+	struct nh_res_table *res_table;
+	struct nh_res_bucket *bucket;
+	struct nexthop *nexthop;
+	struct nh_group *nhg;
+
+	rcu_read_lock();
+
+	nexthop = nexthop_find_by_id(net, id);
+	if (!nexthop || !nexthop->is_group)
+		goto out;
+
+	nhg = rcu_dereference(nexthop->nh_grp);
+	if (!nhg->resilient)
+		goto out;
+
+	if (bucket_index >= nhg->res_table->num_nh_buckets)
+		goto out;
+
+	res_table = rcu_dereference(nhg->res_table);
+	bucket = &res_table->nh_buckets[bucket_index];
+	bucket->nh_flags &= ~(RTNH_F_OFFLOAD | RTNH_F_TRAP);
+	if (offload)
+		bucket->nh_flags |= RTNH_F_OFFLOAD;
+	if (trap)
+		bucket->nh_flags |= RTNH_F_TRAP;
+
+out:
+	rcu_read_unlock();
+}
+EXPORT_SYMBOL(nexthop_bucket_set_hw_flags);
+
 static void __net_exit nexthop_net_exit(struct net *net)
 {
 	rtnl_lock();

From patchwork Thu Mar 11 18:03:20 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Petr Machata
X-Patchwork-Id: 399477
From: Petr Machata
CC: Ido Schimmel, David Ahern, "David S. Miller", Jakub Kicinski, "Petr Machata"
Subject: [PATCH net-next v2 09/14] nexthop: Allow reporting activity of nexthop buckets
Date: Thu, 11 Mar 2021 19:03:20 +0100
X-Mailer: git-send-email 2.26.2
X-Mailing-List: netdev@vger.kernel.org

From: Ido Schimmel

The kernel periodically checks the idle time of nexthop buckets to
determine if they are idle and can be re-populated with a new nexthop.

When the resilient nexthop group is offloaded to hardware, the kernel
will not see activity on nexthop buckets unless it is reported from
hardware.

Add a function that can be periodically called by device drivers to
report activity on nexthop buckets after querying it from the underlying
device.
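For illustration only, the reporting flow described above can be modelled
in plain userspace C. This is a sketch under stated assumptions, not the
kernel implementation: every model_* and MODEL_* name is hypothetical, the
"busy" flag stands in for refreshing a bucket's last-used time, and the
model returns a count (or -1 on a size mismatch) purely so the behaviour
is observable, whereas the function added by this patch returns void.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace stand-ins; none of these are kernel definitions. */
#define MODEL_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define MODEL_BITS_TO_LONGS(n) \
	(((n) + MODEL_BITS_PER_LONG - 1) / MODEL_BITS_PER_LONG)

struct model_bucket {
	bool busy;	/* stands in for refreshing the bucket's last-used time */
};

struct model_res_table {
	uint16_t num_buckets;
	struct model_bucket *buckets;
};

/* Set bit 'nr' in a bitmap stored as an array of unsigned long. */
static void model_set_bit(size_t nr, unsigned long *map)
{
	map[nr / MODEL_BITS_PER_LONG] |= 1UL << (nr % MODEL_BITS_PER_LONG);
}

/* Test bit 'nr' in a bitmap stored as an array of unsigned long. */
static bool model_test_bit(size_t nr, const unsigned long *map)
{
	return (map[nr / MODEL_BITS_PER_LONG] >> (nr % MODEL_BITS_PER_LONG)) & 1UL;
}

/*
 * Mark every bucket whose bit is set in 'activity' as busy.
 * Returns the number of buckets marked, or -1 if the caller's bucket
 * count does not match the table. The mismatch is rejected outright
 * rather than silently ignoring some buckets.
 */
static int model_activity_update(struct model_res_table *table,
				 uint16_t num_buckets,
				 const unsigned long *activity)
{
	int marked = 0;
	uint16_t i;

	assert(table && activity);

	if (num_buckets != table->num_buckets)
		return -1;

	for (i = 0; i < num_buckets; i++) {
		if (model_test_bit(i, activity)) {
			table->buckets[i].busy = true;
			marked++;
		}
	}

	return marked;
}
```

In a driver, the corresponding step would presumably be to fill an
unsigned long bitmap from the per-bucket activity queried from the device
and pass it to nexthop_res_grp_activity_update(net, id, num_buckets,
activity), with num_buckets equal to the bucket count of the group.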
Signed-off-by: Ido Schimmel
Reviewed-by: Petr Machata
Reviewed-by: David Ahern
Signed-off-by: Petr Machata
---

Notes:
    v1 (changes since RFC):
    - u32 -> u16 for bucket counts / indices

 include/net/nexthop.h |  2 ++
 net/ipv4/nexthop.c    | 35 +++++++++++++++++++++++++++++++++++
 2 files changed, 37 insertions(+)

diff --git a/include/net/nexthop.h b/include/net/nexthop.h
index 685f208d26b5..ba94868a21d5 100644
--- a/include/net/nexthop.h
+++ b/include/net/nexthop.h
@@ -222,6 +222,8 @@ int unregister_nexthop_notifier(struct net *net, struct notifier_block *nb);
 void nexthop_set_hw_flags(struct net *net, u32 id, bool offload, bool trap);
 void nexthop_bucket_set_hw_flags(struct net *net, u32 id, u16 bucket_index,
 				 bool offload, bool trap);
+void nexthop_res_grp_activity_update(struct net *net, u32 id, u16 num_buckets,
+				     unsigned long *activity);
 
 /* caller is holding rcu or rtnl; no reference taken to nexthop */
 struct nexthop *nexthop_find_by_id(struct net *net, u32 id);
diff --git a/net/ipv4/nexthop.c b/net/ipv4/nexthop.c
index 1fce4ff39390..495b5e69ffcd 100644
--- a/net/ipv4/nexthop.c
+++ b/net/ipv4/nexthop.c
@@ -3106,6 +3106,41 @@ void nexthop_bucket_set_hw_flags(struct net *net, u32 id, u16 bucket_index,
 }
 EXPORT_SYMBOL(nexthop_bucket_set_hw_flags);
 
+void nexthop_res_grp_activity_update(struct net *net, u32 id, u16 num_buckets,
+				     unsigned long *activity)
+{
+	struct nh_res_table *res_table;
+	struct nexthop *nexthop;
+	struct nh_group *nhg;
+	u16 i;
+
+	rcu_read_lock();
+
+	nexthop = nexthop_find_by_id(net, id);
+	if (!nexthop || !nexthop->is_group)
+		goto out;
+
+	nhg = rcu_dereference(nexthop->nh_grp);
+	if (!nhg->resilient)
+		goto out;
+
+	/* Instead of silently ignoring some buckets, demand that the sizes
+	 * be the same.
+	 */
+	res_table = rcu_dereference(nhg->res_table);
+	if (num_buckets != res_table->num_nh_buckets)
+		goto out;
+
+	for (i = 0; i < num_buckets; i++) {
+		if (test_bit(i, activity))
+			nh_res_bucket_set_busy(&res_table->nh_buckets[i]);
+	}
+
+out:
+	rcu_read_unlock();
+}
+EXPORT_SYMBOL(nexthop_res_grp_activity_update);
+
 static void __net_exit nexthop_net_exit(struct net *net)
 {
 	rtnl_lock();

From patchwork Thu Mar 11 18:03:21 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Petr Machata
X-Patchwork-Id: 398178
From: Petr Machata
CC: Ido Schimmel, David Ahern, "David S. Miller", Jakub Kicinski, "Petr Machata"
Subject: [PATCH net-next v2 10/14] nexthop: Add netlink handlers for resilient nexthop groups
Date: Thu, 11 Mar 2021 19:03:21 +0100
Message-ID: <85c90b43f567a62d40bdf58465e174d774e4be4f.1615485052.git.petrm@nvidia.com>
X-Mailer: git-send-email 2.26.2
X-Mailing-List: netdev@vger.kernel.org

Implement the netlink messages that allow creation and dumping of
resilient nexthop groups.

Signed-off-by: Petr Machata
Reviewed-by: Ido Schimmel
Reviewed-by: David Ahern
---

Notes:
    v1 (changes since RFC):
    - u32 -> u16 for bucket counts / indices

 net/ipv4/nexthop.c | 150 +++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 145 insertions(+), 5 deletions(-)

diff --git a/net/ipv4/nexthop.c b/net/ipv4/nexthop.c
index 495b5e69ffcd..439bf3b7ced5 100644
--- a/net/ipv4/nexthop.c
+++ b/net/ipv4/nexthop.c
@@ -16,6 +16,9 @@
 #include
 #include
 
+#define NH_RES_DEFAULT_IDLE_TIMER	(120 * HZ)
+#define NH_RES_DEFAULT_UNBALANCED_TIMER	0	/* No forced rebalancing.
*/ + static void remove_nexthop(struct net *net, struct nexthop *nh, struct nl_info *nlinfo); @@ -32,6 +35,7 @@ static const struct nla_policy rtm_nh_policy_new[] = { [NHA_ENCAP_TYPE] = { .type = NLA_U16 }, [NHA_ENCAP] = { .type = NLA_NESTED }, [NHA_FDB] = { .type = NLA_FLAG }, + [NHA_RES_GROUP] = { .type = NLA_NESTED }, }; static const struct nla_policy rtm_nh_policy_get[] = { @@ -45,6 +49,12 @@ static const struct nla_policy rtm_nh_policy_dump[] = { [NHA_FDB] = { .type = NLA_FLAG }, }; +static const struct nla_policy rtm_nh_res_policy_new[] = { + [NHA_RES_GROUP_BUCKETS] = { .type = NLA_U16 }, + [NHA_RES_GROUP_IDLE_TIMER] = { .type = NLA_U32 }, + [NHA_RES_GROUP_UNBALANCED_TIMER] = { .type = NLA_U32 }, +}; + static bool nexthop_notifiers_is_empty(struct net *net) { return !net->nexthop.notifier_chain.head; @@ -588,6 +598,41 @@ static void nh_res_time_set_deadline(unsigned long next_time, *deadline = next_time; } +static clock_t nh_res_table_unbalanced_time(struct nh_res_table *res_table) +{ + if (list_empty(&res_table->uw_nh_entries)) + return 0; + return jiffies_delta_to_clock_t(jiffies - res_table->unbalanced_since); +} + +static int nla_put_nh_group_res(struct sk_buff *skb, struct nh_group *nhg) +{ + struct nh_res_table *res_table = rtnl_dereference(nhg->res_table); + struct nlattr *nest; + + nest = nla_nest_start(skb, NHA_RES_GROUP); + if (!nest) + return -EMSGSIZE; + + if (nla_put_u16(skb, NHA_RES_GROUP_BUCKETS, + res_table->num_nh_buckets) || + nla_put_u32(skb, NHA_RES_GROUP_IDLE_TIMER, + jiffies_to_clock_t(res_table->idle_timer)) || + nla_put_u32(skb, NHA_RES_GROUP_UNBALANCED_TIMER, + jiffies_to_clock_t(res_table->unbalanced_timer)) || + nla_put_u64_64bit(skb, NHA_RES_GROUP_UNBALANCED_TIME, + nh_res_table_unbalanced_time(res_table), + NHA_RES_GROUP_PAD)) + goto nla_put_failure; + + nla_nest_end(skb, nest); + return 0; + +nla_put_failure: + nla_nest_cancel(skb, nest); + return -EMSGSIZE; +} + static int nla_put_nh_group(struct sk_buff *skb, struct nh_group 
*nhg) { struct nexthop_grp *p; @@ -598,6 +643,8 @@ static int nla_put_nh_group(struct sk_buff *skb, struct nh_group *nhg) if (nhg->mpath) group_type = NEXTHOP_GRP_TYPE_MPATH; + else if (nhg->resilient) + group_type = NEXTHOP_GRP_TYPE_RES; if (nla_put_u16(skb, NHA_GROUP_TYPE, group_type)) goto nla_put_failure; @@ -613,6 +660,9 @@ static int nla_put_nh_group(struct sk_buff *skb, struct nh_group *nhg) p += 1; } + if (nhg->resilient && nla_put_nh_group_res(skb, nhg)) + goto nla_put_failure; + return 0; nla_put_failure: @@ -700,13 +750,26 @@ static int nh_fill_node(struct sk_buff *skb, struct nexthop *nh, return -EMSGSIZE; } +static size_t nh_nlmsg_size_grp_res(struct nh_group *nhg) +{ + return nla_total_size(0) + /* NHA_RES_GROUP */ + nla_total_size(2) + /* NHA_RES_GROUP_BUCKETS */ + nla_total_size(4) + /* NHA_RES_GROUP_IDLE_TIMER */ + nla_total_size(4) + /* NHA_RES_GROUP_UNBALANCED_TIMER */ + nla_total_size_64bit(8);/* NHA_RES_GROUP_UNBALANCED_TIME */ +} + static size_t nh_nlmsg_size_grp(struct nexthop *nh) { struct nh_group *nhg = rtnl_dereference(nh->nh_grp); size_t sz = sizeof(struct nexthop_grp) * nhg->num_nh; + size_t tot = nla_total_size(sz) + + nla_total_size(2); /* NHA_GROUP_TYPE */ + + if (nhg->resilient) + tot += nh_nlmsg_size_grp_res(nhg); - return nla_total_size(sz) + - nla_total_size(2); /* NHA_GROUP_TYPE */ + return tot; } static size_t nh_nlmsg_size_single(struct nexthop *nh) @@ -876,7 +939,7 @@ static int nh_check_attr_fdb_group(struct nexthop *nh, u8 *nh_family, static int nh_check_attr_group(struct net *net, struct nlattr *tb[], size_t tb_size, - struct netlink_ext_ack *extack) + u16 nh_grp_type, struct netlink_ext_ack *extack) { unsigned int len = nla_len(tb[NHA_GROUP]); u8 nh_family = AF_UNSPEC; @@ -937,8 +1000,14 @@ static int nh_check_attr_group(struct net *net, for (i = NHA_GROUP_TYPE + 1; i < tb_size; ++i) { if (!tb[i]) continue; - if (i == NHA_FDB) + switch (i) { + case NHA_FDB: continue; + case NHA_RES_GROUP: + if (nh_grp_type == 
NEXTHOP_GRP_TYPE_RES) + continue; + break; + } NL_SET_ERR_MSG(extack, "No other attributes can be set in nexthop groups"); return -EINVAL; @@ -2475,6 +2544,70 @@ static struct nexthop *nexthop_add(struct net *net, struct nh_config *cfg, return nh; } +static int rtm_nh_get_timer(struct nlattr *attr, unsigned long fallback, + unsigned long *timer_p, bool *has_p, + struct netlink_ext_ack *extack) +{ + unsigned long timer; + u32 value; + + if (!attr) { + *timer_p = fallback; + *has_p = false; + return 0; + } + + value = nla_get_u32(attr); + timer = clock_t_to_jiffies(value); + if (timer == ~0UL) { + NL_SET_ERR_MSG(extack, "Timer value too large"); + return -EINVAL; + } + + *timer_p = timer; + *has_p = true; + return 0; +} + +static int rtm_to_nh_config_grp_res(struct nlattr *res, struct nh_config *cfg, + struct netlink_ext_ack *extack) +{ + struct nlattr *tb[ARRAY_SIZE(rtm_nh_res_policy_new)] = {}; + int err; + + if (res) { + err = nla_parse_nested(tb, + ARRAY_SIZE(rtm_nh_res_policy_new) - 1, + res, rtm_nh_res_policy_new, extack); + if (err < 0) + return err; + } + + if (tb[NHA_RES_GROUP_BUCKETS]) { + cfg->nh_grp_res_num_buckets = + nla_get_u16(tb[NHA_RES_GROUP_BUCKETS]); + cfg->nh_grp_res_has_num_buckets = true; + if (!cfg->nh_grp_res_num_buckets) { + NL_SET_ERR_MSG(extack, "Number of buckets needs to be non-0"); + return -EINVAL; + } + } + + err = rtm_nh_get_timer(tb[NHA_RES_GROUP_IDLE_TIMER], + NH_RES_DEFAULT_IDLE_TIMER, + &cfg->nh_grp_res_idle_timer, + &cfg->nh_grp_res_has_idle_timer, + extack); + if (err) + return err; + + return rtm_nh_get_timer(tb[NHA_RES_GROUP_UNBALANCED_TIMER], + NH_RES_DEFAULT_UNBALANCED_TIMER, + &cfg->nh_grp_res_unbalanced_timer, + &cfg->nh_grp_res_has_unbalanced_timer, + extack); +} + static int rtm_to_nh_config(struct net *net, struct sk_buff *skb, struct nlmsghdr *nlh, struct nh_config *cfg, struct netlink_ext_ack *extack) @@ -2553,7 +2686,14 @@ static int rtm_to_nh_config(struct net *net, struct sk_buff *skb, NL_SET_ERR_MSG(extack, 
"Invalid group type"); goto out; } - err = nh_check_attr_group(net, tb, ARRAY_SIZE(tb), extack); + err = nh_check_attr_group(net, tb, ARRAY_SIZE(tb), + cfg->nh_grp_type, extack); + if (err) + goto out; + + if (cfg->nh_grp_type == NEXTHOP_GRP_TYPE_RES) + err = rtm_to_nh_config_grp_res(tb[NHA_RES_GROUP], + cfg, extack); /* no other attributes should be set */ goto out; From patchwork Thu Mar 11 18:03:22 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Petr Machata X-Patchwork-Id: 398180 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-19.0 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_CR_TRAILER, INCLUDES_PATCH, MAILING_LIST_MULTI, SPF_HELO_NONE, SPF_PASS, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 35E8EC43332 for ; Thu, 11 Mar 2021 18:05:09 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 070216500D for ; Thu, 11 Mar 2021 18:05:08 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230359AbhCKSEq (ORCPT ); Thu, 11 Mar 2021 13:04:46 -0500 Received: from mail-bn7nam10on2074.outbound.protection.outlook.com ([40.107.92.74]:19248 "EHLO NAM10-BN7-obe.outbound.protection.outlook.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S230033AbhCKSEd (ORCPT ); Thu, 11 Mar 2021 13:04:33 -0500 ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; 
b=IffTfRmB64DdphhNiI6TcotTw/v9AsCWSGgB4tJvbcDGz3WNj2Jgig07w5fzbQMZu5145+jD4vqSPlMgGgaaDcIxBb2ThC3x2T8VB7txlai40YDX1FGBdoDzPpg6xUaf+TqJR/SJ1IQoFiy6hnrXp20AnDcxeeKOdyv8CmnlHd6Q6Lk0YrnRe7qLveCY8kExCSFzk8ZLUHyXElY1oUgGZXxFQkPunNAZFSo7VrGXS2kRhalKgy/4Qm7pW6ZYTwdLg+D6UOM2Cy/Bra4IZQLK0G3O+IdMegmTa+lf1Jai+b7ERLXk8CJzSj2i6rsEzlIva2R5/lIphvyBeelLX0FX7Q== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=ACTE62HcPbLeqVNb159l+/cBIZeNtkeZqfJRtA7kkLA=; b=MsC+UTvV64H4PCjgzL3DKGXHVCRSa2u/x702yrzHvoIidOZFQq3VK85v/BZzKbkyMHST+yBKTGuTv7f8zvGkNbX0NVR3QZTUJwNMeYYLaTm+L6vNKsU3q3moc6VwoumeG2/rgg3kGRftGTqk50I2NdiA9mTHHI+BJ1571hR+AUqHxu+8OKVu/KgO+E2CO450oZBwBMQ8hJFVFs7WTijK22IrjQnCwmbee7vffAw4Pebj3vkQ1JyB/pa6IpT1X3xAzIaPuVrcmLwgi9aCeGn3fgx96REoiLFPzh8qv+1/guXuAdeICpw0hn+JtQcAkgrwDiV5GYkvPOk/c1Y5AJv04g== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass (sender ip is 216.228.112.34) smtp.rcpttodomain=davemloft.net smtp.mailfrom=nvidia.com; dmarc=pass (p=none sp=none pct=100) action=none header.from=nvidia.com; dkim=none (message not signed); arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=Nvidia.com; s=selector2; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=ACTE62HcPbLeqVNb159l+/cBIZeNtkeZqfJRtA7kkLA=; b=BvTHR2d7X6hkNtvdDBkFTPYSqOURJ59Gse1LC5A1n9HS8H9Ei7sH/mCeK/5/BSRan3vZuDgIVARq6S1vN3Vz7cm2H5YLkNgcKJRdqg86um1gvPxxisJCNcDgKaN2VldQh2RUtcrY+EXdVbVQh59XKLWrHOQYrTSRwNHAU30nPVbgoEVsZvDvzPT4mrrxXqonPpFLr6jUq+Ek6Yid61qGRnsaKAPYdDNls3ntpfYY2gks5G21rfb3bX0nTMjJpASp7LjlnXYY3pnVrb/76MS3CUX6DIxWx+5uiol+uYK5RpGdWU+/4urGavAuB5FmqrmX04C9RkHmpOqUZFxdJjqGEg== Received: from BN6PR13CA0037.namprd13.prod.outlook.com (2603:10b6:404:13e::23) by BN9PR12MB5241.namprd12.prod.outlook.com (2603:10b6:408:11e::15) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 
15.20.3912.30; Thu, 11 Mar 2021 18:04:29 +0000 Received: from BN8NAM11FT060.eop-nam11.prod.protection.outlook.com (2603:10b6:404:13e:cafe::43) by BN6PR13CA0037.outlook.office365.com (2603:10b6:404:13e::23) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.3955.10 via Frontend Transport; Thu, 11 Mar 2021 18:04:28 +0000 X-MS-Exchange-Authentication-Results: spf=pass (sender IP is 216.228.112.34) smtp.mailfrom=nvidia.com; davemloft.net; dkim=none (message not signed) header.d=none; davemloft.net; dmarc=pass action=none header.from=nvidia.com; Received-SPF: Pass (protection.outlook.com: domain of nvidia.com designates 216.228.112.34 as permitted sender) receiver=protection.outlook.com; client-ip=216.228.112.34; helo=mail.nvidia.com; Received: from mail.nvidia.com (216.228.112.34) by BN8NAM11FT060.mail.protection.outlook.com (10.13.177.211) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA384) id 15.20.3933.31 via Frontend Transport; Thu, 11 Mar 2021 18:04:28 +0000 Received: from localhost.localdomain (172.20.145.6) by HQMAIL107.nvidia.com (172.20.187.13) with Microsoft SMTP Server (TLS) id 15.0.1497.2; Thu, 11 Mar 2021 18:04:24 +0000 From: Petr Machata To: CC: Ido Schimmel , David Ahern , "David S . 
Miller" , Jakub Kicinski , "Petr Machata" Subject: [PATCH net-next v2 11/14] nexthop: Add netlink handlers for bucket dump Date: Thu, 11 Mar 2021 19:03:22 +0100 Message-ID: <52363e987d4412512cfef60114953c0af9f02085.1615485052.git.petrm@nvidia.com> X-Mailer: git-send-email 2.26.2 In-Reply-To: References: MIME-Version: 1.0 X-Originating-IP: [172.20.145.6] X-ClientProxiedBy: HQMAIL107.nvidia.com (172.20.187.13) To HQMAIL107.nvidia.com (172.20.187.13) X-EOPAttributedMessage: 0 X-MS-PublicTrafficType: Email X-MS-Office365-Filtering-Correlation-Id: 3235fdaf-829f-4b00-79da-08d8e4b81d4c X-MS-TrafficTypeDiagnostic: BN9PR12MB5241: X-Microsoft-Antispam-PRVS: X-MS-Oob-TLC-OOBClassifiers: OLM:2089; X-MS-Exchange-SenderADCheck: 1 X-Microsoft-Antispam: BCL:0; X-Microsoft-Antispam-Message-Info: K7eBNKXsTQGN5b90rPrkCrYkgWgkg4i4rUwVvzM8krWyhhtQSwpJdjww/AV25NY6IX3o8C8GC9wxaI7AcgmgLqbrUPpSZrigL/99sZcbPpefg7QNXJYk4ttXmjerAOSbBMOF8T6c4+IVGqmbtqB4FSM9wi/e8lGIyHi2jELa2hCdlh6W8J/mB88Wjb9skWXdKDO8+7iLPd9SYeA/5PM7D0Eh+/sdCcR9eas2QXajjjR//OopN/vvbhkT1a0gYL3GhD+lfxXAoS2/go15RRWlXwUeN1dPKi8t6A69xOF/ZVBSGZk/kZnphvhxXN1PT7V3OTlhN3OofonbkpsgO0I1+Ih2jJ7AksrnAAU6ZQYKzDtI5mGHOf/aZupQzhTLtNc1Nxcy6wEIlQwyJ7Q6/cfYNFLeWCbBaTWtY/6IDR8RNJjeJq9HquUH2ahAfAS1DQXZeVI+ipGfqyd4yle+CBe4GeTORcdnmY4gRqxs4TxL91ymvrb9YveifkHqxOh1xDO37OYQaEuu1x4Fa6x8NCx/zou37b691ggv7CKf20SNrZbOwxBGbIp6U1qAQyqdnB/Hg8QzD12wACbcdojFFmfKsGC9k7yPv6eKbKhZSUSJSWrvYIhM1xmy/0YFM1H7ICOPAM6qfWYqkJ4Rtp2a3kBeYIMZiB9WgUj2aYzFZ8p0sX0ZI1P7AbMTAJZ4VnEgmzBf X-Forefront-Antispam-Report: CIP:216.228.112.34; CTRY:US; LANG:en; SCL:1; SRV:; IPV:NLI; SFV:NSPM; H:mail.nvidia.com; PTR:schybrid03.nvidia.com; CAT:NONE; 
SFS:(4636009)(396003)(346002)(376002)(136003)(39860400002)(46966006)(36840700001)(107886003)(70206006)(426003)(47076005)(70586007)(478600001)(316002)(36906005)(83380400001)(36860700001)(5660300002)(16526019)(26005)(54906003)(4326008)(186003)(2616005)(34020700004)(8936002)(82740400003)(2906002)(86362001)(356005)(8676002)(7636003)(82310400003)(336012)(6916009)(36756003); DIR:OUT; SFP:1101; X-OriginatorOrg: Nvidia.com X-MS-Exchange-CrossTenant-OriginalArrivalTime: 11 Mar 2021 18:04:28.7782 (UTC) X-MS-Exchange-CrossTenant-Network-Message-Id: 3235fdaf-829f-4b00-79da-08d8e4b81d4c X-MS-Exchange-CrossTenant-Id: 43083d15-7273-40c1-b7db-39efd9ccc17a X-MS-Exchange-CrossTenant-OriginalAttributedTenantConnectingIp: TenantId=43083d15-7273-40c1-b7db-39efd9ccc17a; Ip=[216.228.112.34]; Helo=[mail.nvidia.com] X-MS-Exchange-CrossTenant-AuthSource: BN8NAM11FT060.eop-nam11.prod.protection.outlook.com X-MS-Exchange-CrossTenant-AuthAs: Anonymous X-MS-Exchange-CrossTenant-FromEntityHeader: HybridOnPrem X-MS-Exchange-Transport-CrossTenantHeadersStamped: BN9PR12MB5241 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org Add a dump handler for resilient next hop buckets. When next-hop group ID is given, it walks buckets of that group, otherwise it walks buckets of all groups. It then dumps the buckets whose next hops match the given filtering criteria. 
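
As a rough userspace illustration of the walk described above (not the
kernel code itself), the sketch below filters the buckets of one group by
next-hop ID and records where to resume when the output buffer fills up,
much as a netlink dump is resumed across calls. All demo_* names are
hypothetical stand-ins, and the fixed-size output array takes the place of
the skb.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct demo_bucket {
	uint32_t nh_id;			/* next hop this bucket maps to */
};

struct demo_group {
	uint32_t id;
	uint16_t num_buckets;
	const struct demo_bucket *buckets;
};

struct demo_dump_ctx {
	uint16_t bucket_index;		/* where to resume within the group */
	bool done;			/* set once the group is exhausted */
};

/*
 * Copy the indices of buckets whose next hop matches filter_nh_id
 * (0 means "no filter") into out, writing at most room entries per call.
 * Returns the number of entries written. If the buffer fills up first,
 * ctx->bucket_index remembers the first bucket that did not fit.
 */
static size_t demo_dump_group(const struct demo_group *grp,
			      uint32_t filter_nh_id,
			      struct demo_dump_ctx *ctx,
			      uint16_t *out, size_t room)
{
	size_t n = 0;
	uint16_t i;

	assert(grp && ctx && out);

	for (i = ctx->bucket_index; i < grp->num_buckets; i++) {
		if (filter_nh_id && grp->buckets[i].nh_id != filter_nh_id)
			continue;	/* filtered out, like a skipped bucket */
		if (n == room) {
			ctx->bucket_index = i;	/* resume here next call */
			ctx->done = false;
			return n;
		}
		out[n++] = i;
	}

	ctx->bucket_index = 0;		/* group fully processed */
	ctx->done = true;
	return n;
}
```

A caller would invoke demo_dump_group() repeatedly with the same context
until ctx.done is set. Walking all groups would add an outer loop and a
per-group "already done" marker, which this sketch leaves out.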

Signed-off-by: Petr Machata
Reviewed-by: Ido Schimmel
Reviewed-by: David Ahern
---

Notes:
    v1 (changes since RFC):
    - u32 -> u16 for bucket counts / indices

 net/ipv4/nexthop.c | 283 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 283 insertions(+)

diff --git a/net/ipv4/nexthop.c b/net/ipv4/nexthop.c
index 439bf3b7ced5..ed2745708f9d 100644
--- a/net/ipv4/nexthop.c
+++ b/net/ipv4/nexthop.c
@@ -55,6 +55,17 @@ static const struct nla_policy rtm_nh_res_policy_new[] = {
 	[NHA_RES_GROUP_UNBALANCED_TIMER]	= { .type = NLA_U32 },
 };
 
+static const struct nla_policy rtm_nh_policy_dump_bucket[] = {
+	[NHA_ID]		= { .type = NLA_U32 },
+	[NHA_OIF]		= { .type = NLA_U32 },
+	[NHA_MASTER]		= { .type = NLA_U32 },
+	[NHA_RES_BUCKET]	= { .type = NLA_NESTED },
+};
+
+static const struct nla_policy rtm_nh_res_bucket_policy_dump[] = {
+	[NHA_RES_BUCKET_NH_ID]	= { .type = NLA_U32 },
+};
+
 static bool nexthop_notifiers_is_empty(struct net *net)
 {
 	return !net->nexthop.notifier_chain.head;
@@ -883,6 +894,60 @@ static void nh_res_bucket_set_busy(struct nh_res_bucket *bucket)
 	atomic_long_set(&bucket->used_time, (long)jiffies);
 }
 
+static clock_t nh_res_bucket_idle_time(const struct nh_res_bucket *bucket)
+{
+	unsigned long used_time = nh_res_bucket_used_time(bucket);
+
+	return jiffies_delta_to_clock_t(jiffies - used_time);
+}
+
+static int nh_fill_res_bucket(struct sk_buff *skb, struct nexthop *nh,
+			      struct nh_res_bucket *bucket, u16 bucket_index,
+			      int event, u32 portid, u32 seq,
+			      unsigned int nlflags,
+			      struct netlink_ext_ack *extack)
+{
+	struct nh_grp_entry *nhge = nh_res_dereference(bucket->nh_entry);
+	struct nlmsghdr *nlh;
+	struct nlattr *nest;
+	struct nhmsg *nhm;
+
+	nlh = nlmsg_put(skb, portid, seq, event, sizeof(*nhm), nlflags);
+	if (!nlh)
+		return -EMSGSIZE;
+
+	nhm = nlmsg_data(nlh);
+	nhm->nh_family = AF_UNSPEC;
+	nhm->nh_flags = bucket->nh_flags;
+	nhm->nh_protocol = nh->protocol;
+	nhm->nh_scope = 0;
+	nhm->resvd = 0;
+
+	if (nla_put_u32(skb, NHA_ID, nh->id))
+		goto nla_put_failure;
+
+	nest = nla_nest_start(skb, NHA_RES_BUCKET);
+	if (!nest)
+		goto nla_put_failure;
+
+	if (nla_put_u16(skb, NHA_RES_BUCKET_INDEX, bucket_index) ||
+	    nla_put_u32(skb, NHA_RES_BUCKET_NH_ID, nhge->nh->id) ||
+	    nla_put_u64_64bit(skb, NHA_RES_BUCKET_IDLE_TIME,
+			      nh_res_bucket_idle_time(bucket),
+			      NHA_RES_BUCKET_PAD))
+		goto nla_put_failure_nest;
+
+	nla_nest_end(skb, nest);
+	nlmsg_end(skb, nlh);
+	return 0;
+
+nla_put_failure_nest:
+	nla_nest_cancel(skb, nest);
+nla_put_failure:
+	nlmsg_cancel(skb, nlh);
+	return -EMSGSIZE;
+}
+
 static bool valid_group_nh(struct nexthop *nh, unsigned int npaths,
 			   bool *is_fdb, struct netlink_ext_ack *extack)
 {
@@ -2918,10 +2983,12 @@ static int rtm_get_nexthop(struct sk_buff *in_skb, struct nlmsghdr *nlh,
 }
 
 struct nh_dump_filter {
+	u32 nh_id;
 	int dev_idx;
 	int master_idx;
 	bool group_filter;
 	bool fdb_filter;
+	u32 res_bucket_nh_id;
 };
 
 static bool nh_dump_filtered(struct nexthop *nh,
@@ -3101,6 +3168,219 @@ static int rtm_dump_nexthop(struct sk_buff *skb, struct netlink_callback *cb)
 	return err;
 }
 
+static struct nexthop *
+nexthop_find_group_resilient(struct net *net, u32 id,
+			     struct netlink_ext_ack *extack)
+{
+	struct nh_group *nhg;
+	struct nexthop *nh;
+
+	nh = nexthop_find_by_id(net, id);
+	if (!nh)
+		return ERR_PTR(-ENOENT);
+
+	if (!nh->is_group) {
+		NL_SET_ERR_MSG(extack, "Not a nexthop group");
+		return ERR_PTR(-EINVAL);
+	}
+
+	nhg = rtnl_dereference(nh->nh_grp);
+	if (!nhg->resilient) {
+		NL_SET_ERR_MSG(extack, "Nexthop group not of type resilient");
+		return ERR_PTR(-EINVAL);
+	}
+
+	return nh;
+}
+
+static int nh_valid_dump_nhid(struct nlattr *attr, u32 *nh_id_p,
+			      struct netlink_ext_ack *extack)
+{
+	u32 idx;
+
+	if (attr) {
+		idx = nla_get_u32(attr);
+		if (!idx) {
+			NL_SET_ERR_MSG(extack, "Invalid nexthop id");
+			return -EINVAL;
+		}
+		*nh_id_p = idx;
+	} else {
+		*nh_id_p = 0;
+	}
+
+	return 0;
+}
+
+static int nh_valid_dump_bucket_req(const struct nlmsghdr *nlh,
+				    struct nh_dump_filter *filter,
+				    struct netlink_callback *cb)
+{
+	struct nlattr *res_tb[ARRAY_SIZE(rtm_nh_res_bucket_policy_dump)];
+	struct nlattr *tb[ARRAY_SIZE(rtm_nh_policy_dump_bucket)];
+	int err;
+
+	err = nlmsg_parse(nlh, sizeof(struct nhmsg), tb,
+			  ARRAY_SIZE(rtm_nh_policy_dump_bucket) - 1,
+			  rtm_nh_policy_dump_bucket, NULL);
+	if (err < 0)
+		return err;
+
+	err = nh_valid_dump_nhid(tb[NHA_ID], &filter->nh_id, cb->extack);
+	if (err)
+		return err;
+
+	if (tb[NHA_RES_BUCKET]) {
+		size_t max = ARRAY_SIZE(rtm_nh_res_bucket_policy_dump) - 1;
+
+		err = nla_parse_nested(res_tb, max,
+				       tb[NHA_RES_BUCKET],
+				       rtm_nh_res_bucket_policy_dump,
+				       cb->extack);
+		if (err < 0)
+			return err;
+
+		err = nh_valid_dump_nhid(res_tb[NHA_RES_BUCKET_NH_ID],
+					 &filter->res_bucket_nh_id,
+					 cb->extack);
+		if (err)
+			return err;
+	}
+
+	return __nh_valid_dump_req(nlh, tb, filter, cb->extack);
+}
+
+struct rtm_dump_res_bucket_ctx {
+	struct rtm_dump_nh_ctx nh;
+	u16 bucket_index;
+	u32 done_nh_idx; /* 1 + the index of the last fully processed NH. */
+};
+
+static struct rtm_dump_res_bucket_ctx *
+rtm_dump_res_bucket_ctx(struct netlink_callback *cb)
+{
+	struct rtm_dump_res_bucket_ctx *ctx = (void *)cb->ctx;
+
+	BUILD_BUG_ON(sizeof(*ctx) > sizeof(cb->ctx));
+	return ctx;
+}
+
+struct rtm_dump_nexthop_bucket_data {
+	struct rtm_dump_res_bucket_ctx *ctx;
+	struct nh_dump_filter filter;
+};
+
+static int rtm_dump_nexthop_bucket_nh(struct sk_buff *skb,
+				      struct netlink_callback *cb,
+				      struct nexthop *nh,
+				      struct rtm_dump_nexthop_bucket_data *dd)
+{
+	u32 portid = NETLINK_CB(cb->skb).portid;
+	struct nhmsg *nhm = nlmsg_data(cb->nlh);
+	struct nh_res_table *res_table;
+	struct nh_group *nhg;
+	u16 bucket_index;
+	int err;
+
+	if (dd->ctx->nh.idx < dd->ctx->done_nh_idx)
+		return 0;
+
+	nhg = rtnl_dereference(nh->nh_grp);
+	res_table = rtnl_dereference(nhg->res_table);
+	for (bucket_index = dd->ctx->bucket_index;
+	     bucket_index < res_table->num_nh_buckets;
+	     bucket_index++) {
+		struct nh_res_bucket *bucket;
+		struct nh_grp_entry *nhge;
+
+		bucket = &res_table->nh_buckets[bucket_index];
+		nhge = rtnl_dereference(bucket->nh_entry);
+		if (nh_dump_filtered(nhge->nh, &dd->filter, nhm->nh_family))
+			continue;
+
+		if (dd->filter.res_bucket_nh_id &&
+		    dd->filter.res_bucket_nh_id != nhge->nh->id)
+			continue;
+
+		err = nh_fill_res_bucket(skb, nh, bucket, bucket_index,
+					 RTM_NEWNEXTHOPBUCKET, portid,
+					 cb->nlh->nlmsg_seq, NLM_F_MULTI,
+					 cb->extack);
+		if (err < 0) {
+			if (likely(skb->len))
+				goto out;
+			goto out_err;
+		}
+	}
+
+	dd->ctx->done_nh_idx = dd->ctx->nh.idx + 1;
+	bucket_index = 0;
+
+out:
+	err = skb->len;
+out_err:
+	dd->ctx->bucket_index = bucket_index;
+	return err;
+}
+
+static int rtm_dump_nexthop_bucket_cb(struct sk_buff *skb,
+				      struct netlink_callback *cb,
+				      struct nexthop *nh, void *data)
+{
+	struct rtm_dump_nexthop_bucket_data *dd = data;
+	struct nh_group *nhg;
+
+	if (!nh->is_group)
+		return 0;
+
+	nhg = rtnl_dereference(nh->nh_grp);
+	if (!nhg->resilient)
+		return 0;
+
+	return rtm_dump_nexthop_bucket_nh(skb, cb, nh, dd);
+}
+
+/* rtnl */
+static int rtm_dump_nexthop_bucket(struct sk_buff *skb,
+				   struct netlink_callback *cb)
+{
+	struct rtm_dump_res_bucket_ctx *ctx = rtm_dump_res_bucket_ctx(cb);
+	struct rtm_dump_nexthop_bucket_data dd = { .ctx = ctx };
+	struct net *net = sock_net(skb->sk);
+	struct nexthop *nh;
+	int err;
+
+	err = nh_valid_dump_bucket_req(cb->nlh, &dd.filter, cb);
+	if (err)
+		return err;
+
+	if (dd.filter.nh_id) {
+		nh = nexthop_find_group_resilient(net, dd.filter.nh_id,
+						  cb->extack);
+		if (IS_ERR(nh))
+			return PTR_ERR(nh);
+		err = rtm_dump_nexthop_bucket_nh(skb, cb, nh, &dd);
+	} else {
+		struct rb_root *root = &net->nexthop.rb_root;
+
+		err = rtm_dump_walk_nexthops(skb, cb, root, &ctx->nh,
+					     &rtm_dump_nexthop_bucket_cb, &dd);
+	}
+
+	if (err < 0) {
+		if (likely(skb->len))
+			goto out;
+		goto out_err;
+	}
+
+out:
+	err = skb->len;
+out_err:
+	cb->seq = net->nexthop.seq;
+	nl_dump_check_consistent(cb, nlmsg_hdr(skb));
+	return err;
+}
+
 static void nexthop_sync_mtu(struct net_device *dev, u32 orig_mtu)
 {
 	unsigned int hash = nh_dev_hashfn(dev->ifindex);
@@ -3324,6 +3604,9 @@ static int __init nexthop_init(void)
 	rtnl_register(PF_INET6, RTM_NEWNEXTHOP, rtm_new_nexthop, NULL, 0);
 	rtnl_register(PF_INET6, RTM_GETNEXTHOP, NULL, rtm_dump_nexthop, 0);
 
+	rtnl_register(PF_UNSPEC, RTM_GETNEXTHOPBUCKET, NULL,
+		      rtm_dump_nexthop_bucket, 0);
+
 	return 0;
 }
 subsys_initcall(nexthop_init);

From patchwork Thu Mar 11 18:03:23 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Petr Machata
X-Patchwork-Id: 399476
From: Petr Machata
To:
CC: Ido Schimmel, David Ahern, "David S. Miller", Jakub Kicinski, "Petr Machata"
Subject: [PATCH net-next v2 12/14] nexthop: Add netlink handlers for bucket get
Date: Thu, 11 Mar 2021 19:03:23 +0100
Message-ID: <67fbb9b09d35a3ae196e0f9c096652e8ec7413fa.1615485052.git.petrm@nvidia.com>
X-Mailer: git-send-email 2.26.2
In-Reply-To:
References:
Precedence: bulk
List-ID:
X-Mailing-List: netdev@vger.kernel.org

Allow getting (but not setting) individual buckets to inspect the next hop
mapped therein, idle time, and flags.
Signed-off-by: Petr Machata
Reviewed-by: Ido Schimmel
Reviewed-by: David Ahern
---

Notes:
    v1 (changes since RFC):
    - u32 -> u16 for bucket counts / indices

 net/ipv4/nexthop.c | 110 ++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 109 insertions(+), 1 deletion(-)

diff --git a/net/ipv4/nexthop.c b/net/ipv4/nexthop.c
index ed2745708f9d..3d602ef6f2c1 100644
--- a/net/ipv4/nexthop.c
+++ b/net/ipv4/nexthop.c
@@ -66,6 +66,15 @@ static const struct nla_policy rtm_nh_res_bucket_policy_dump[] = {
 	[NHA_RES_BUCKET_NH_ID]	= { .type = NLA_U32 },
 };
 
+static const struct nla_policy rtm_nh_policy_get_bucket[] = {
+	[NHA_ID]		= { .type = NLA_U32 },
+	[NHA_RES_BUCKET]	= { .type = NLA_NESTED },
+};
+
+static const struct nla_policy rtm_nh_res_bucket_policy_get[] = {
+	[NHA_RES_BUCKET_INDEX]	= { .type = NLA_U16 },
+};
+
 static bool nexthop_notifiers_is_empty(struct net *net)
 {
 	return !net->nexthop.notifier_chain.head;
@@ -3381,6 +3390,105 @@ static int rtm_dump_nexthop_bucket(struct sk_buff *skb,
 	return err;
 }
 
+static int nh_valid_get_bucket_req_res_bucket(struct nlattr *res,
+					      u16 *bucket_index,
+					      struct netlink_ext_ack *extack)
+{
+	struct nlattr *tb[ARRAY_SIZE(rtm_nh_res_bucket_policy_get)];
+	int err;
+
+	err = nla_parse_nested(tb, ARRAY_SIZE(rtm_nh_res_bucket_policy_get) - 1,
+			       res, rtm_nh_res_bucket_policy_get, extack);
+	if (err < 0)
+		return err;
+
+	if (!tb[NHA_RES_BUCKET_INDEX]) {
+		NL_SET_ERR_MSG(extack, "Bucket index is missing");
+		return -EINVAL;
+	}
+
+	*bucket_index = nla_get_u16(tb[NHA_RES_BUCKET_INDEX]);
+	return 0;
+}
+
+static int nh_valid_get_bucket_req(const struct nlmsghdr *nlh,
+				   u32 *id, u16 *bucket_index,
+				   struct netlink_ext_ack *extack)
+{
+	struct nlattr *tb[ARRAY_SIZE(rtm_nh_policy_get_bucket)];
+	int err;
+
+	err = nlmsg_parse(nlh, sizeof(struct nhmsg), tb,
+			  ARRAY_SIZE(rtm_nh_policy_get_bucket) - 1,
+			  rtm_nh_policy_get_bucket, extack);
+	if (err < 0)
+		return err;
+
+	err = __nh_valid_get_del_req(nlh, tb, id, extack);
+	if (err)
+		return err;
+
+	if (!tb[NHA_RES_BUCKET]) {
+		NL_SET_ERR_MSG(extack, "Bucket information is missing");
+		return -EINVAL;
+	}
+
+	err = nh_valid_get_bucket_req_res_bucket(tb[NHA_RES_BUCKET],
+						 bucket_index, extack);
+	if (err)
+		return err;
+
+	return 0;
+}
+
+/* rtnl */
+static int rtm_get_nexthop_bucket(struct sk_buff *in_skb, struct nlmsghdr *nlh,
+				  struct netlink_ext_ack *extack)
+{
+	struct net *net = sock_net(in_skb->sk);
+	struct nh_res_table *res_table;
+	struct sk_buff *skb = NULL;
+	struct nh_group *nhg;
+	struct nexthop *nh;
+	u16 bucket_index;
+	int err;
+	u32 id;
+
+	err = nh_valid_get_bucket_req(nlh, &id, &bucket_index, extack);
+	if (err)
+		return err;
+
+	nh = nexthop_find_group_resilient(net, id, extack);
+	if (IS_ERR(nh))
+		return PTR_ERR(nh);
+
+	nhg = rtnl_dereference(nh->nh_grp);
+	res_table = rtnl_dereference(nhg->res_table);
+	if (bucket_index >= res_table->num_nh_buckets) {
+		NL_SET_ERR_MSG(extack, "Bucket index out of bounds");
+		return -ENOENT;
+	}
+
+	skb = alloc_skb(NLMSG_GOODSIZE, GFP_KERNEL);
+	if (!skb)
+		return -ENOBUFS;
+
+	err = nh_fill_res_bucket(skb, nh, &res_table->nh_buckets[bucket_index],
+				 bucket_index, RTM_NEWNEXTHOPBUCKET,
+				 NETLINK_CB(in_skb).portid, nlh->nlmsg_seq,
+				 0, extack);
+	if (err < 0) {
+		WARN_ON(err == -EMSGSIZE);
+		goto errout_free;
+	}
+
+	return rtnl_unicast(skb, net, NETLINK_CB(in_skb).portid);
+
+errout_free:
+	kfree_skb(skb);
+	return err;
+}
+
 static void nexthop_sync_mtu(struct net_device *dev, u32 orig_mtu)
 {
 	unsigned int hash = nh_dev_hashfn(dev->ifindex);
@@ -3604,7 +3712,7 @@ static int __init nexthop_init(void)
 	rtnl_register(PF_INET6, RTM_NEWNEXTHOP, rtm_new_nexthop, NULL, 0);
 	rtnl_register(PF_INET6, RTM_GETNEXTHOP, NULL, rtm_dump_nexthop, 0);
 
-	rtnl_register(PF_UNSPEC, RTM_GETNEXTHOPBUCKET, NULL,
+	rtnl_register(PF_UNSPEC, RTM_GETNEXTHOPBUCKET, rtm_get_nexthop_bucket,
 		      rtm_dump_nexthop_bucket, 0);
 
 	return 0;

From patchwork Thu Mar 11 18:03:24 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Petr Machata
X-Patchwork-Id: 398177
From: Petr Machata
To:
CC: Ido Schimmel, David Ahern, "David S . Miller", Jakub Kicinski, "Petr Machata"
Subject: [PATCH net-next v2 13/14] nexthop: Notify userspace about bucket migrations
Date: Thu, 11 Mar 2021 19:03:24 +0100
Message-ID: <42f92f35d09ee78deea155ea24c7e4cc9e48a677.1615485052.git.petrm@nvidia.com>
X-Mailer: git-send-email 2.26.2
In-Reply-To:
References:
MIME-Version: 1.0
Precedence: bulk
List-ID:
X-Mailing-List: netdev@vger.kernel.org

Nexthop replacements et al. are notified through netlink, but if a delayed
work migrates buckets in the background, userspace will stay oblivious.
Notify these as RTM_NEWNEXTHOPBUCKET events.

Signed-off-by: Petr Machata
Reviewed-by: Ido Schimmel
Reviewed-by: David Ahern
---

Notes:
    v1 (changes since RFC):
    - u32 -> u16 for bucket counts / indices

 net/ipv4/nexthop.c | 45 +++++++++++++++++++++++++++++++++++++++------
 1 file changed, 39 insertions(+), 6 deletions(-)

diff --git a/net/ipv4/nexthop.c b/net/ipv4/nexthop.c
index 3d602ef6f2c1..015a47e8163a 100644
--- a/net/ipv4/nexthop.c
+++ b/net/ipv4/nexthop.c
@@ -957,6 +957,34 @@ static int nh_fill_res_bucket(struct sk_buff *skb, struct nexthop *nh,
 	return -EMSGSIZE;
 }
 
+static void nexthop_bucket_notify(struct nh_res_table *res_table,
+				  u16 bucket_index)
+{
+	struct nh_res_bucket *bucket = &res_table->nh_buckets[bucket_index];
+	struct nh_grp_entry *nhge = nh_res_dereference(bucket->nh_entry);
+	struct nexthop *nh = nhge->nh_parent;
+	struct sk_buff *skb;
+	int err = -ENOBUFS;
+
+	skb = alloc_skb(NLMSG_GOODSIZE, GFP_KERNEL);
+	if (!skb)
+		goto errout;
+
+	err = nh_fill_res_bucket(skb, nh, bucket, bucket_index,
+				 RTM_NEWNEXTHOPBUCKET, 0, 0, NLM_F_REPLACE,
+				 NULL);
+	if (err < 0) {
+		kfree_skb(skb);
+		goto errout;
+	}
+
+	rtnl_notify(skb, nh->net, 0, RTNLGRP_NEXTHOP, NULL, GFP_KERNEL);
+	return;
+errout:
+	if (err < 0)
+		rtnl_set_sk_err(nh->net, RTNLGRP_NEXTHOP, err);
+}
+
 static bool valid_group_nh(struct nexthop *nh, unsigned int npaths,
 			   bool *is_fdb, struct netlink_ext_ack *extack)
 {
@@ -1470,7 +1498,8 @@ static bool nh_res_bucket_should_migrate(struct nh_res_table *res_table,
 }
 
 static bool nh_res_bucket_migrate(struct nh_res_table *res_table,
-				  u16 bucket_index, bool notify, bool force)
+				  u16 bucket_index, bool notify,
+				  bool notify_nl, bool force)
 {
 	struct nh_res_bucket *bucket = &res_table->nh_buckets[bucket_index];
 	struct nh_grp_entry *new_nhge;
@@ -1513,6 +1542,9 @@ static bool nh_res_bucket_migrate(struct nh_res_table *res_table,
 	nh_res_bucket_set_nh(bucket, new_nhge);
 	nh_res_bucket_set_idle(res_table, bucket);
 
+	if (notify_nl)
+		nexthop_bucket_notify(res_table, bucket_index);
+
 	if (nh_res_nhge_is_balanced(new_nhge))
 		list_del(&new_nhge->res.uw_nh_entry);
 	return true;
@@ -1520,7 +1552,8 @@ static bool nh_res_bucket_migrate(struct nh_res_table *res_table,
 
 #define NH_RES_UPKEEP_DW_MINIMUM_INTERVAL (HZ / 2)
 
-static void nh_res_table_upkeep(struct nh_res_table *res_table, bool notify)
+static void nh_res_table_upkeep(struct nh_res_table *res_table,
+				bool notify, bool notify_nl)
 {
 	unsigned long now = jiffies;
 	unsigned long deadline;
@@ -1545,7 +1578,7 @@ static void nh_res_table_upkeep(struct nh_res_table *res_table, bool notify)
 		if (nh_res_bucket_should_migrate(res_table, bucket,
 						 &deadline, &force)) {
 			if (!nh_res_bucket_migrate(res_table, i, notify,
-						   force)) {
+						   notify_nl, force)) {
 				unsigned long idle_point;
 
 				/* A driver can override the migration
@@ -1586,7 +1619,7 @@ static void nh_res_table_upkeep_dw(struct work_struct *work)
 	struct nh_res_table *res_table;
 
 	res_table = container_of(dw, struct nh_res_table, upkeep_dw);
-	nh_res_table_upkeep(res_table, true);
+	nh_res_table_upkeep(res_table, true, true);
 }
 
 static void nh_res_table_cancel_upkeep(struct nh_res_table *res_table)
@@ -1674,7 +1707,7 @@ static void replace_nexthop_grp_res(struct nh_group *oldg,
 	nh_res_group_rebalance(newg, old_res_table);
 	if (prev_has_uw && !list_empty(&old_res_table->uw_nh_entries))
 		old_res_table->unbalanced_since = prev_unbalanced_since;
-	nh_res_table_upkeep(old_res_table, true);
+	nh_res_table_upkeep(old_res_table, true, false);
 }
 
 static void nh_mp_group_rebalance(struct nh_group *nhg)
@@ -2288,7 +2321,7 @@ static int insert_nexthop(struct net *net, struct nexthop *new_nh,
 			/* Do not send bucket notifications, we do full
 			 * notification below.
 			 */
-			nh_res_table_upkeep(res_table, false);
+			nh_res_table_upkeep(res_table, false, false);
 		}
 	}
 

From patchwork Thu Mar 11 18:03:25 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Petr Machata
X-Patchwork-Id: 399475
From: Petr Machata
To:
CC: Ido Schimmel, David Ahern, "David S . Miller", Jakub Kicinski, "Petr Machata"
Subject: [PATCH net-next v2 14/14] nexthop: Enable resilient next-hop groups
Date: Thu, 11 Mar 2021 19:03:25 +0100
Message-ID:
X-Mailer: git-send-email 2.26.2
In-Reply-To:
References:
MIME-Version: 1.0
Precedence: bulk
List-ID:
X-Mailing-List: netdev@vger.kernel.org

Now that all the code is in place, stop rejecting requests to create
resilient next-hop groups.

Signed-off-by: Petr Machata
Reviewed-by: Ido Schimmel
Reviewed-by: David Ahern
---
 net/ipv4/nexthop.c | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/net/ipv4/nexthop.c b/net/ipv4/nexthop.c
index 015a47e8163a..f09fe3a5608f 100644
--- a/net/ipv4/nexthop.c
+++ b/net/ipv4/nexthop.c
@@ -2443,10 +2443,6 @@ static struct nexthop *nexthop_create_group(struct net *net,
 	} else if (cfg->nh_grp_type == NEXTHOP_GRP_TYPE_RES) {
 		struct nh_res_table *res_table;
 
-		/* Bounce resilient groups for now. */
-		err = -EINVAL;
-		goto out_no_nh;
-
 		res_table = nexthop_res_table_alloc(net, cfg->nh_id, cfg);
 		if (!res_table) {
 			err = -ENOMEM;