From patchwork Wed Jun 15 13:48:43 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Maxim Mikityanskiy X-Patchwork-Id: 582067 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id C4FC4C433EF for ; Wed, 15 Jun 2022 13:49:23 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1347477AbiFONtV (ORCPT ); Wed, 15 Jun 2022 09:49:21 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:59538 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238566AbiFONtU (ORCPT ); Wed, 15 Jun 2022 09:49:20 -0400 Received: from NAM02-SN1-obe.outbound.protection.outlook.com (mail-sn1anam02on2046.outbound.protection.outlook.com [40.107.96.46]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id BD7393A72B; Wed, 15 Jun 2022 06:49:19 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=IB0tuN32IRyP93HuZgTYDv256RSASmtlEb9aC8DoUaEuAg5SINE895Z4eKCHwnE6NXR9JHRN+G1PCf1kHZgt0NqQIiv6GpoLkHDLbelMqBv+ZvwEu/fGL96dq4xOyf4F/FZ5NsdDllcGbqF5Fg4EEi07L8N+DrOHawrXr+LMCYc2dOzPls3VjVs3i6y18muCO0u/j0KhEi6t22YxY9vWfzEYK0wIhb4ZhohoRJNk9masHiqP1QCgzqTz9GV6vJaetmdXbfmXTIl45yfN3iZKbQuCc5zoBNtGq3SWm5pgGhLtXAquCigiP9B5u8MIfLk6X+tZmRnUBFJWZXTgQ2gZSw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=4+JqQQwP6+mgWCrSsL6IJu1IlMH1Z92Lc11ORkYhHhY=; b=mdH+dPPzIngtUtXwAczawS/TyUMXnCzKDvGjRr2j4sWyqfdyicKQrRR4UMD28L0JMJYfHkY1qvGcTM0DZGbP1QtoYimcQMBL/uWMwiSm2GDyA9qMabD5HQehGGZa0g9fWrgej1IIgN6vY/TFvf527bbi0MGf1/oVlxBzWoc2/KYLVrwVku4bZysaHyy3UnbfDNGJK0hDh4qb/6NOf8s+JQG2mL525ZEUBE5VKbKho6u6Et7wbXOhQHh7UdHy9ckO/fbYFHKR1h5vomkZmPSt43QXW5DCitGu1dRwAZ6hKYF+04wbxJLFNbjfeS9YRWB9OSsc76Do2M1ibDgGnNelAw== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass (sender ip is 12.22.5.236) smtp.rcpttodomain=gmail.com smtp.mailfrom=nvidia.com; dmarc=pass (p=reject sp=reject pct=100) action=none header.from=nvidia.com; dkim=none (message not signed); arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=Nvidia.com; s=selector2; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=4+JqQQwP6+mgWCrSsL6IJu1IlMH1Z92Lc11ORkYhHhY=; b=U/6i3l6+Pn+RhLqAD4hgLIXkZ+Dsu36t3MDlgFBD/tPmQZHstN9XglwRkYgVVuE55QIXdjpdNP/9YKauv3t8MBx0XYxeo/1JnE+S3Xn9KTPvojUksb40hYZb3QAEXLCvK2+bfxKzQt6YdXTdo0c1j0MaLnEukUZ8IPkPfPS7dlggvrU6R6CE7fD1bciNMmCGVG88ucVSYscEsfYzqXGnAAo3OQnrS70yyH8PWg8s3gVX9MEfVDWFglin1SGtXM4VZVo6PmGZ8TWxShqt2tEbDwt9IkO0uY5t1herU2lF9lMTT7cmPxBlh/DYiuqIPkogRtdc9IJbqkRQiNWqZpKCBA== Received: from MW4P223CA0023.NAMP223.PROD.OUTLOOK.COM (2603:10b6:303:80::28) by BN7PR12MB2724.namprd12.prod.outlook.com (2603:10b6:408:31::28) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.5353.14; Wed, 15 Jun 2022 13:49:16 +0000 Received: from CO1NAM11FT020.eop-nam11.prod.protection.outlook.com (2603:10b6:303:80:cafe::9f) by MW4P223CA0023.outlook.office365.com (2603:10b6:303:80::28) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.5332.13 via Frontend Transport; Wed, 15 Jun 2022 13:49:16 +0000 X-MS-Exchange-Authentication-Results: spf=pass (sender IP is 12.22.5.236) smtp.mailfrom=nvidia.com; dkim=none (message not signed) header.d=none;dmarc=pass action=none header.from=nvidia.com; Received-SPF: Pass (protection.outlook.com: domain of nvidia.com designates 12.22.5.236 as permitted sender) receiver=protection.outlook.com; client-ip=12.22.5.236; helo=mail.nvidia.com; pr=C Received: from mail.nvidia.com (12.22.5.236) by CO1NAM11FT020.mail.protection.outlook.com (10.13.174.149) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA384) id 15.20.5332.12 via Frontend Transport; Wed, 15 Jun 2022 13:49:14 +0000 Received: from drhqmail203.nvidia.com (10.126.190.182) by DRHQMAIL109.nvidia.com (10.27.9.19) with Microsoft SMTP Server (TLS) id 15.0.1497.32; Wed, 15 Jun 2022 13:49:13 +0000 Received: from drhqmail203.nvidia.com (10.126.190.182) by drhqmail203.nvidia.com (10.126.190.182) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.986.22; Wed, 15 Jun 2022 06:49:12 -0700 Received: from vdi.nvidia.com (10.127.8.12) by mail.nvidia.com (10.126.190.182) with Microsoft SMTP Server id 15.2.986.22 via Frontend Transport; Wed, 15 Jun 2022 06:49:06 -0700 From: Maxim Mikityanskiy To: , Alexei Starovoitov , "Daniel Borkmann" , Andrii Nakryiko , CC: Tariq Toukan , Martin KaFai Lau , "Song Liu" , Yonghong Song , John Fastabend , KP Singh , "David S. Miller" , Jakub Kicinski , Eric Dumazet , Hideaki YOSHIFUJI , "David Ahern" , Shuah Khan , "Jesper Dangaard Brouer" , Nathan Chancellor , "Nick Desaulniers" , Joe Stringer , "Florent Revest" , , =?utf-8?q?Toke_H=C3=B8iland-J=C3=B8rgensen?= , "Kumar Kartikeya Dwivedi" , Florian Westphal , , Maxim Mikityanskiy Subject: [PATCH bpf-next v10 2/6] bpf: Allow helpers to accept pointers with a fixed size Date: Wed, 15 Jun 2022 16:48:43 +0300 Message-ID: <20220615134847.3753567-3-maximmi@nvidia.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20220615134847.3753567-1-maximmi@nvidia.com> References: <20220615134847.3753567-1-maximmi@nvidia.com> MIME-Version: 1.0 X-EOPAttributedMessage: 0 X-MS-PublicTrafficType: Email X-MS-Office365-Filtering-Correlation-Id: af3adda7-df12-4f23-7401-08da4ed5d5d4 X-MS-TrafficTypeDiagnostic: BN7PR12MB2724:EE_ X-Microsoft-Antispam-PRVS: X-MS-Exchange-SenderADCheck: 1 X-MS-Exchange-AntiSpam-Relay: 0 X-Microsoft-Antispam: BCL:0; X-Microsoft-Antispam-Message-Info: K3TP9nRkbfNZH+Juisb0rezEPuQDgTzog2YlHpo0b0Q/9Gst0S+RrMmvetSzSnAl7u3B5FgQZBqn7gorjbLl8fy+nvFrKkSWiVdL4u0anhyYNKsQggUmme1EfVh+4u1GTOwTBcT91oJ9TdINs0cjQZwb3IhbSMIDmiocbTvBQMV+pdrJSrRwBH1kX9/dUJOmjHqgq1bH/yQJeLf1RCOS13l/C3+da3eLt2tv8jUwbyX6w7ia8SCSYMdS0JqbOvYCDpvD/n7M/Bq//AQWfzpfF2tAF1fv9tfqcSVQXxFw0SwCvyO3MIsT/b1KUdREafCumd/Gn6Q56+5x/7OSIPDDKLYGEbFjKPYFPQ2ZUKSC6A1ivkMo6hdPtH0LGOs++sSJRVqcdniV88B2D2H8E+ZcqKX2Jh87H9ZUZNI74jKV/+IAPrFcxbu7sBr6EWNmVkj+8rmwMYFwvFDXGU3H7fZjQaZ+hvlIGa/v7yBFHAsoeqzYxeeg+OPmOQcqlkjPxoIHEFhJgLTCvgKSgabiXCSq+pc5RnyWDBMpLej2T+dSg0c8VExwi1H8hvoqXd46c0gpSZKYHcg3oPkeRNPVgmIMjawrr4aEocYS4njQ7f0ir/ciSu5sJwSIMuBY8J6htGYfeGXOOb1HuC/JZzxay27LkJzvFpulMj44F5DkNhjpCiQe1BHnHWBxdlKa+bUGaXyHvkJu1XUh8BV5bog1csvWw/Bh2iVOXWST+h67ipm79US1o/NlN9Xgt3lCEX7H4Tf5 X-Forefront-Antispam-Report: CIP:12.22.5.236; CTRY:US; LANG:en; SCL:1; SRV:; IPV:NLI; SFV:NSPM; H:mail.nvidia.com; PTR:InfoNoRecords; CAT:NONE; SFS:(13230016)(4636009)(46966006)(40470700004)(36840700001)(4326008)(8676002)(70586007)(70206006)(2906002)(40460700003)(356005)(86362001)(5660300002)(7416002)(8936002)(81166007)(82310400005)(26005)(186003)(107886003)(47076005)(336012)(508600001)(2616005)(6666004)(7696005)(36860700001)(110136005)(426003)(316002)(1076003)(83380400001)(54906003)(36756003)(461764006)(36900700001); DIR:OUT; SFP:1101; X-OriginatorOrg: Nvidia.com X-MS-Exchange-CrossTenant-OriginalArrivalTime: 15 Jun 2022 13:49:14.7255 (UTC) X-MS-Exchange-CrossTenant-Network-Message-Id: af3adda7-df12-4f23-7401-08da4ed5d5d4 X-MS-Exchange-CrossTenant-Id: 43083d15-7273-40c1-b7db-39efd9ccc17a X-MS-Exchange-CrossTenant-OriginalAttributedTenantConnectingIp: TenantId=43083d15-7273-40c1-b7db-39efd9ccc17a; Ip=[12.22.5.236]; Helo=[mail.nvidia.com] X-MS-Exchange-CrossTenant-AuthSource: CO1NAM11FT020.eop-nam11.prod.protection.outlook.com X-MS-Exchange-CrossTenant-AuthAs: Anonymous X-MS-Exchange-CrossTenant-FromEntityHeader: HybridOnPrem X-MS-Exchange-Transport-CrossTenantHeadersStamped: BN7PR12MB2724 Precedence: bulk List-ID: X-Mailing-List: linux-kselftest@vger.kernel.org Before this commit, the BPF verifier required ARG_PTR_TO_MEM arguments to be followed by ARG_CONST_SIZE holding the size of the memory region. The helpers had to check that size in runtime. There are cases where the size expected by a helper is a compile-time constant. Checking it in runtime is an unnecessary overhead and waste of BPF registers. This commit allows helpers to accept pointers to memory without the corresponding ARG_CONST_SIZE, given that they define the memory region size in struct bpf_func_proto and use ARG_PTR_TO_FIXED_SIZE_MEM type. arg_size is unionized with arg_btf_id to reduce the kernel image size, and it's valid because they are used by different argument types. Signed-off-by: Maxim Mikityanskiy Reviewed-by: Tariq Toukan --- include/linux/bpf.h | 13 +++++++++++++ kernel/bpf/verifier.c | 43 ++++++++++++++++++++++++++++++++----------- 2 files changed, 45 insertions(+), 11 deletions(-) diff --git a/include/linux/bpf.h b/include/linux/bpf.h index 8e6092d0ea95..985f9db8f678 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -398,6 +398,9 @@ enum bpf_type_flag { /* DYNPTR points to a ringbuf record. */ DYNPTR_TYPE_RINGBUF = BIT(9 + BPF_BASE_TYPE_BITS), + /* Size is known at compile time. */ + MEM_FIXED_SIZE = BIT(10 + BPF_BASE_TYPE_BITS), + __BPF_TYPE_FLAG_MAX, __BPF_TYPE_LAST_FLAG = __BPF_TYPE_FLAG_MAX - 1, }; @@ -461,6 +464,8 @@ enum bpf_arg_type { * all bytes or clear them in error case. */ ARG_PTR_TO_UNINIT_MEM = MEM_UNINIT | ARG_PTR_TO_MEM, + /* Pointer to valid memory of size known at compile time. */ + ARG_PTR_TO_FIXED_SIZE_MEM = MEM_FIXED_SIZE | ARG_PTR_TO_MEM, /* This must be the last entry. Its purpose is to ensure the enum is * wide enough to hold the higher bits reserved for bpf_type_flag. @@ -526,6 +531,14 @@ struct bpf_func_proto { u32 *arg5_btf_id; }; u32 *arg_btf_id[5]; + struct { + size_t arg1_size; + size_t arg2_size; + size_t arg3_size; + size_t arg4_size; + size_t arg5_size; + }; + size_t arg_size[5]; }; int *ret_btf_id; /* return value btf_id */ bool (*allowed)(const struct bpf_prog *prog); diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index 2d2872682278..8c929d899665 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -5848,6 +5848,7 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 arg, struct bpf_reg_state *regs = cur_regs(env), *reg = ®s[regno]; enum bpf_arg_type arg_type = fn->arg_type[arg]; enum bpf_reg_type type = reg->type; + u32 *arg_btf_id = NULL; int err = 0; if (arg_type == ARG_DONTCARE) @@ -5884,7 +5885,11 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 arg, */ goto skip_type_check; - err = check_reg_type(env, regno, arg_type, fn->arg_btf_id[arg], meta); + /* arg_btf_id and arg_size are in a union. */ + if (base_type(arg_type) == ARG_PTR_TO_BTF_ID) + arg_btf_id = fn->arg_btf_id[arg]; + + err = check_reg_type(env, regno, arg_type, arg_btf_id, meta); if (err) return err; @@ -6011,6 +6016,11 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 arg, * next is_mem_size argument below. */ meta->raw_mode = arg_type & MEM_UNINIT; + if (arg_type & MEM_FIXED_SIZE) { + err = check_helper_mem_access(env, regno, + fn->arg_size[arg], false, + meta); + } } else if (arg_type_is_mem_size(arg_type)) { bool zero_size_allowed = (arg_type == ARG_CONST_SIZE_OR_ZERO); @@ -6400,11 +6410,19 @@ static bool check_raw_mode_ok(const struct bpf_func_proto *fn) return count <= 1; } -static bool check_args_pair_invalid(enum bpf_arg_type arg_curr, - enum bpf_arg_type arg_next) +static bool check_args_pair_invalid(const struct bpf_func_proto *fn, int arg) { - return (base_type(arg_curr) == ARG_PTR_TO_MEM) != - arg_type_is_mem_size(arg_next); + bool is_fixed = fn->arg_type[arg] & MEM_FIXED_SIZE; + bool has_size = fn->arg_size[arg] != 0; + bool is_next_size = false; + + if (arg + 1 < ARRAY_SIZE(fn->arg_type)) + is_next_size = arg_type_is_mem_size(fn->arg_type[arg + 1]); + + if (base_type(fn->arg_type[arg]) != ARG_PTR_TO_MEM) + return is_next_size; + + return has_size == is_next_size || is_next_size == is_fixed; } static bool check_arg_pair_ok(const struct bpf_func_proto *fn) @@ -6415,11 +6433,11 @@ static bool check_arg_pair_ok(const struct bpf_func_proto *fn) * helper function specification. */ if (arg_type_is_mem_size(fn->arg1_type) || - base_type(fn->arg5_type) == ARG_PTR_TO_MEM || - check_args_pair_invalid(fn->arg1_type, fn->arg2_type) || - check_args_pair_invalid(fn->arg2_type, fn->arg3_type) || - check_args_pair_invalid(fn->arg3_type, fn->arg4_type) || - check_args_pair_invalid(fn->arg4_type, fn->arg5_type)) + check_args_pair_invalid(fn, 0) || + check_args_pair_invalid(fn, 1) || + check_args_pair_invalid(fn, 2) || + check_args_pair_invalid(fn, 3) || + check_args_pair_invalid(fn, 4)) return false; return true; @@ -6460,7 +6478,10 @@ static bool check_btf_id_ok(const struct bpf_func_proto *fn) if (base_type(fn->arg_type[i]) == ARG_PTR_TO_BTF_ID && !fn->arg_btf_id[i]) return false; - if (base_type(fn->arg_type[i]) != ARG_PTR_TO_BTF_ID && fn->arg_btf_id[i]) + if (base_type(fn->arg_type[i]) != ARG_PTR_TO_BTF_ID && fn->arg_btf_id[i] && + /* arg_btf_id and arg_size are in a union. */ + (base_type(fn->arg_type[i]) != ARG_PTR_TO_MEM || + !(fn->arg_type[i] & MEM_FIXED_SIZE))) return false; } From patchwork Wed Jun 15 13:48:45 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Maxim Mikityanskiy X-Patchwork-Id: 582066 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 59852C433EF for ; Wed, 15 Jun 2022 13:49:38 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1352323AbiFONtg (ORCPT ); Wed, 15 Jun 2022 09:49:36 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:59762 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1353330AbiFONtd (ORCPT ); Wed, 15 Jun 2022 09:49:33 -0400 Received: from NAM11-CO1-obe.outbound.protection.outlook.com (mail-co1nam11on2075.outbound.protection.outlook.com [40.107.220.75]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 465553CA44; Wed, 15 Jun 2022 06:49:30 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=EOQjjFen52igDj43YvJ9l5ICVXG6UbE5Hg26QXem3jpiq7NiehOybyvMWLOth0vLupBCeEfY8GimUCu1/LGW1dBl/og9mqemMT6qZ2wBDVc+hc7QNd9V12g4+R6IVjv9ReQktQkbK3f2yJ1QSzif+/WAF6Bms4+DUzP7NgaEVmNncNfVuHdkEbVhUvxpTgkAR+zfQaUzuOgkQImDoB6T0WVec30hr6iLkmgqd9YGiMUV9bfCZiUQBjyqAsGLS9Z12I/FLBTOpA3GxIDrSPRnkXl2GnrQVuzC05nWSwPllusZUYR7whEuCWjaZSAGZg3hppL4+r0zI7v3ZHU5KjlsSg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=pb7L4coXh++Y9bhSUu6S04PbC+go3NRzmbNON5UwOVA=; b=A7m7lqsaarz/T7MJmj6A5lryk4Xk2tEp/+VDE/bml2ZgiEeaq4OUXs0XSOZBj51B9ap35JftS9Scib1pV5fujfn5Reg6sMpudugMn8AXcNoZdn6499FNHyGctKw28yntGby6Jnw56YQ7gV/gjt37Fjp3OAcb+CfJMBkBB8eL70qM/sWe9xI3JT0OK/MsnkIRYJPefdN8TYUBg/KYgQnrb4jVfw4NXXeJRMOraaDGyd5RV2vphH0wy8Pjop9ggNO2714ijwNQhIbemFYNtm0ZQFmU41X5cB2ANDsy4/Xwopq3NwcKVL93RbeTNCMkxE1gQRB4BMAkLjVGBTtjLFw0ig== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass (sender ip is 12.22.5.236) smtp.rcpttodomain=gmail.com smtp.mailfrom=nvidia.com; dmarc=pass (p=reject sp=reject pct=100) action=none header.from=nvidia.com; dkim=none (message not signed); arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=Nvidia.com; s=selector2; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=pb7L4coXh++Y9bhSUu6S04PbC+go3NRzmbNON5UwOVA=; b=AFohdUX1BmFR99vSHSNzJloIx+xdX3DO4IymGT5k5qpSxcMDuqpOPmmnX7cx6PPeVHyXWqZkW21/tS3S1lW2wanUzk5rv+S41VgKqTPjQIudg9pqkJKsMKj1FLdQCl/4SA2SqQIDRlnD7MonKcKTmE5bsACJuszTDerciOJFkGuiC/aXYRXJq0x+1LF9egfJYQZIXL5wZdUItsqky4CoYSw/CHlmieZS+KrFCfq7HS1qmS/TmyaBmgBZXNeiTFNCWKQva1FGu9oRWxjwEWpSDl7DkU+hxY9/IdA/tRh+ZXw4Hpsb1jVG/MkDhc3Ul06BepJLTItLE4VDvaxVEcLKLQ== Received: from MW4PR04CA0284.namprd04.prod.outlook.com (2603:10b6:303:89::19) by BYAPR12MB2917.namprd12.prod.outlook.com (2603:10b6:a03:130::14) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.5332.16; Wed, 15 Jun 2022 13:49:27 +0000 Received: from CO1NAM11FT046.eop-nam11.prod.protection.outlook.com (2603:10b6:303:89:cafe::7d) by MW4PR04CA0284.outlook.office365.com (2603:10b6:303:89::19) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.5332.15 via Frontend Transport; Wed, 15 Jun 2022 13:49:27 +0000 X-MS-Exchange-Authentication-Results: spf=pass (sender IP is 12.22.5.236) smtp.mailfrom=nvidia.com; dkim=none (message not signed) header.d=none;dmarc=pass action=none header.from=nvidia.com; Received-SPF: Pass (protection.outlook.com: domain of nvidia.com designates 12.22.5.236 as permitted sender) receiver=protection.outlook.com; client-ip=12.22.5.236; helo=mail.nvidia.com; pr=C Received: from mail.nvidia.com (12.22.5.236) by CO1NAM11FT046.mail.protection.outlook.com (10.13.174.203) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA384) id 15.20.5332.12 via Frontend Transport; Wed, 15 Jun 2022 13:49:27 +0000 Received: from drhqmail201.nvidia.com (10.126.190.180) by DRHQMAIL109.nvidia.com (10.27.9.19) with Microsoft SMTP Server (TLS) id 15.0.1497.32; Wed, 15 Jun 2022 13:49:27 +0000 Received: from drhqmail203.nvidia.com (10.126.190.182) by drhqmail201.nvidia.com (10.126.190.180) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.986.22; Wed, 15 Jun 2022 06:49:26 -0700 Received: from vdi.nvidia.com (10.127.8.12) by mail.nvidia.com (10.126.190.182) with Microsoft SMTP Server id 15.2.986.22 via Frontend Transport; Wed, 15 Jun 2022 06:49:20 -0700 From: Maxim Mikityanskiy To: , Alexei Starovoitov , "Daniel Borkmann" , Andrii Nakryiko , CC: Tariq Toukan , Martin KaFai Lau , "Song Liu" , Yonghong Song , John Fastabend , KP Singh , "David S. Miller" , Jakub Kicinski , Eric Dumazet , Hideaki YOSHIFUJI , "David Ahern" , Shuah Khan , "Jesper Dangaard Brouer" , Nathan Chancellor , "Nick Desaulniers" , Joe Stringer , "Florent Revest" , , =?utf-8?q?Toke_H=C3=B8iland-J=C3=B8rgensen?= , "Kumar Kartikeya Dwivedi" , Florian Westphal , , Maxim Mikityanskiy Subject: [PATCH bpf-next v10 4/6] selftests/bpf: Add selftests for raw syncookie helpers Date: Wed, 15 Jun 2022 16:48:45 +0300 Message-ID: <20220615134847.3753567-5-maximmi@nvidia.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20220615134847.3753567-1-maximmi@nvidia.com> References: <20220615134847.3753567-1-maximmi@nvidia.com> MIME-Version: 1.0 X-EOPAttributedMessage: 0 X-MS-PublicTrafficType: Email X-MS-Office365-Filtering-Correlation-Id: 9d21ee24-1537-409b-bfb5-08da4ed5dd83 X-MS-TrafficTypeDiagnostic: BYAPR12MB2917:EE_ X-Microsoft-Antispam-PRVS: X-MS-Exchange-SenderADCheck: 1 X-MS-Exchange-AntiSpam-Relay: 0 X-Microsoft-Antispam: BCL:0; X-Microsoft-Antispam-Message-Info: gxw8pVDWsGOf2SZCR3x5JNYpEjGu9+84DDlyLRm0n2xP8AA8gcEpnN7GBdERIoDb7EqYEPJ1+O9suc68vVKAM+6RORh/gHLzzihiYMEMGq9HKgca2vzbNZn6xHkY67dR4EzDh9LNeDGBw3Jg07fTCVU7p4S4ufObxAsd4zuYwY3Y82N6Rzp3q0XaoDT+3x2iMevTvyq/ZBnagG0oexZetmhgq/7TSygKnZVDj2wTOVAq/sEmz8BO55BnmC7G9m/+X/e5I2XPmJQJIAOzV9I1tdkSamPJklV5TDxjhffyNPeAfVinpsbCYhWpks6ORah72VNOcuP9bFM+8fu6T/nVZUaYTjAQt9JKlaTqOSXIOyU+wcmH2CXTHhrD/dJizPI0mnWBeH4Anq4LVNmPWbZDF0lA9Zoa3/aVDMc/tFvOwv6L6cAxbHHyjqPFkuinGWgZJkXFlB2W2gKN2tLZ/MhRbAhjNLqG3OY3IIH15kiSi2Gaa52XxehbW3VBJU9KPKrqr9S71Wxm0ILwci3SeGolnA/mCRzWR8QE9Sm9w4L7KjdWYPNF230ZOCsSdTBDIFFpXkXF2UNepMw28RV8Z+DRe8jrzCzftIpgtCEi9gmVeDHmZ7T4jFHIIZAv509F5MBMhspRXxu/Otbb8s/nTjeaD7fx8H00jbwPTErjJcrBFpza9lUPLeFmI5QwDgIxvib8wehCi6DxIvEsMCZUGazHMN8G6D9xHmslLB280XOuApw= X-Forefront-Antispam-Report: CIP:12.22.5.236; CTRY:US; LANG:en; SCL:1; SRV:; IPV:NLI; SFV:NSPM; H:mail.nvidia.com; PTR:InfoNoRecords; CAT:NONE; SFS:(13230016)(4636009)(40470700004)(36840700001)(46966006)(107886003)(82310400005)(36756003)(83380400001)(86362001)(70586007)(2616005)(70206006)(54906003)(110136005)(316002)(4326008)(8676002)(508600001)(8936002)(36860700001)(81166007)(7416002)(7696005)(6666004)(356005)(2906002)(1076003)(426003)(186003)(40460700003)(47076005)(336012)(5660300002)(26005)(30864003)(21314003)(36900700001); DIR:OUT; SFP:1101; X-OriginatorOrg: Nvidia.com X-MS-Exchange-CrossTenant-OriginalArrivalTime: 15 Jun 2022 13:49:27.6496 (UTC) X-MS-Exchange-CrossTenant-Network-Message-Id: 9d21ee24-1537-409b-bfb5-08da4ed5dd83 X-MS-Exchange-CrossTenant-Id: 43083d15-7273-40c1-b7db-39efd9ccc17a X-MS-Exchange-CrossTenant-OriginalAttributedTenantConnectingIp: TenantId=43083d15-7273-40c1-b7db-39efd9ccc17a; Ip=[12.22.5.236]; Helo=[mail.nvidia.com] X-MS-Exchange-CrossTenant-AuthSource: CO1NAM11FT046.eop-nam11.prod.protection.outlook.com X-MS-Exchange-CrossTenant-AuthAs: Anonymous X-MS-Exchange-CrossTenant-FromEntityHeader: HybridOnPrem X-MS-Exchange-Transport-CrossTenantHeadersStamped: BYAPR12MB2917 Precedence: bulk List-ID: X-Mailing-List: linux-kselftest@vger.kernel.org This commit adds selftests for the new BPF helpers: bpf_tcp_raw_{gen,check}_syncookie_ipv{4,6}. xdp_synproxy_kern.c is a BPF program that generates SYN cookies on allowed TCP ports and sends SYNACKs to clients, accelerating synproxy iptables module. xdp_synproxy.c is a userspace control application that allows to configure the following options in runtime: list of allowed ports, MSS, window scale, TTL. A selftest is added to prog_tests that leverages the above programs to test the functionality of the new helpers. Signed-off-by: Maxim Mikityanskiy Reviewed-by: Tariq Toukan --- tools/testing/selftests/bpf/.gitignore | 1 + tools/testing/selftests/bpf/Makefile | 3 +- .../selftests/bpf/prog_tests/xdp_synproxy.c | 146 ++++ .../selftests/bpf/progs/xdp_synproxy_kern.c | 763 ++++++++++++++++++ tools/testing/selftests/bpf/xdp_synproxy.c | 418 ++++++++++ 5 files changed, 1330 insertions(+), 1 deletion(-) create mode 100644 tools/testing/selftests/bpf/prog_tests/xdp_synproxy.c create mode 100644 tools/testing/selftests/bpf/progs/xdp_synproxy_kern.c create mode 100644 tools/testing/selftests/bpf/xdp_synproxy.c diff --git a/tools/testing/selftests/bpf/.gitignore b/tools/testing/selftests/bpf/.gitignore index 595565eb68c0..ca2f47f45670 100644 --- a/tools/testing/selftests/bpf/.gitignore +++ b/tools/testing/selftests/bpf/.gitignore @@ -43,3 +43,4 @@ test_cpp *.tmp xdpxceiver xdp_redirect_multi +xdp_synproxy diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile index 8ad7a733a505..e4ce21f4a474 100644 --- a/tools/testing/selftests/bpf/Makefile +++ b/tools/testing/selftests/bpf/Makefile @@ -82,7 +82,7 @@ TEST_PROGS_EXTENDED := with_addr.sh \ TEST_GEN_PROGS_EXTENDED = test_sock_addr test_skb_cgroup_id_user \ flow_dissector_load test_flow_dissector test_tcp_check_syncookie_user \ test_lirc_mode2_user xdping test_cpp runqslower bench bpf_testmod.ko \ - xdpxceiver xdp_redirect_multi + xdpxceiver xdp_redirect_multi xdp_synproxy TEST_CUSTOM_PROGS = $(OUTPUT)/urandom_read @@ -502,6 +502,7 @@ TRUNNER_EXTRA_SOURCES := test_progs.c cgroup_helpers.c trace_helpers.c \ cap_helpers.c TRUNNER_EXTRA_FILES := $(OUTPUT)/urandom_read $(OUTPUT)/bpf_testmod.ko \ $(OUTPUT)/liburandom_read.so \ + $(OUTPUT)/xdp_synproxy \ ima_setup.sh \ $(wildcard progs/btf_dump_test_case_*.c) TRUNNER_BPF_BUILD_RULE := CLANG_BPF_BUILD_RULE diff --git a/tools/testing/selftests/bpf/prog_tests/xdp_synproxy.c b/tools/testing/selftests/bpf/prog_tests/xdp_synproxy.c new file mode 100644 index 000000000000..d9ee884c2a2b --- /dev/null +++ b/tools/testing/selftests/bpf/prog_tests/xdp_synproxy.c @@ -0,0 +1,146 @@ +// SPDX-License-Identifier: LGPL-2.1 OR BSD-2-Clause +/* Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved. */ + +#include +#include +#include + +#define CMD_OUT_BUF_SIZE 1023 + +#define SYS(cmd) ({ \ + if (!ASSERT_OK(system(cmd), (cmd))) \ + goto out; \ +}) + +#define SYS_OUT(cmd) ({ \ + FILE *f = popen((cmd), "r"); \ + if (!ASSERT_OK_PTR(f, (cmd))) \ + goto out; \ + f; \ +}) + +/* out must be at least `size * 4 + 1` bytes long */ +static void escape_str(char *out, const char *in, size_t size) +{ + static const char *hex = "0123456789ABCDEF"; + size_t i; + + for (i = 0; i < size; i++) { + if (isprint(in[i]) && in[i] != '\\' && in[i] != '\'') { + *out++ = in[i]; + } else { + *out++ = '\\'; + *out++ = 'x'; + *out++ = hex[(in[i] >> 4) & 0xf]; + *out++ = hex[in[i] & 0xf]; + } + } + *out++ = '\0'; +} + +static bool expect_str(char *buf, size_t size, const char *str, const char *name) +{ + static char escbuf_expected[CMD_OUT_BUF_SIZE * 4]; + static char escbuf_actual[CMD_OUT_BUF_SIZE * 4]; + static int duration = 0; + bool ok; + + ok = size == strlen(str) && !memcmp(buf, str, size); + + if (!ok) { + escape_str(escbuf_expected, str, strlen(str)); + escape_str(escbuf_actual, buf, size); + } + CHECK(!ok, name, "unexpected %s: actual '%s' != expected '%s'\n", + name, escbuf_actual, escbuf_expected); + + return ok; +} + +void test_xdp_synproxy(void) +{ + int server_fd = -1, client_fd = -1, accept_fd = -1; + struct nstoken *ns = NULL; + FILE *ctrl_file = NULL; + char buf[CMD_OUT_BUF_SIZE]; + size_t size; + + SYS("ip netns add synproxy"); + + SYS("ip link add tmp0 type veth peer name tmp1"); + SYS("ip link set tmp1 netns synproxy"); + SYS("ip link set tmp0 up"); + SYS("ip addr replace 198.18.0.1/24 dev tmp0"); + + /* When checksum offload is enabled, the XDP program sees wrong + * checksums and drops packets. + */ + SYS("ethtool -K tmp0 tx off"); + /* Workaround required for veth. */ + SYS("ip link set tmp0 xdp object xdp_dummy.o section xdp 2> /dev/null"); + + ns = open_netns("synproxy"); + if (!ASSERT_OK_PTR(ns, "setns")) + goto out; + + SYS("ip link set lo up"); + SYS("ip link set tmp1 up"); + SYS("ip addr replace 198.18.0.2/24 dev tmp1"); + SYS("sysctl -w net.ipv4.tcp_syncookies=2"); + SYS("sysctl -w net.ipv4.tcp_timestamps=1"); + SYS("sysctl -w net.netfilter.nf_conntrack_tcp_loose=0"); + SYS("iptables -t raw -I PREROUTING \ + -i tmp1 -p tcp -m tcp --syn --dport 8080 -j CT --notrack"); + SYS("iptables -t filter -A INPUT \ + -i tmp1 -p tcp -m tcp --dport 8080 -m state --state INVALID,UNTRACKED \ + -j SYNPROXY --sack-perm --timestamp --wscale 7 --mss 1460"); + SYS("iptables -t filter -A INPUT \ + -i tmp1 -m state --state INVALID -j DROP"); + + ctrl_file = SYS_OUT("./xdp_synproxy --iface tmp1 --ports 8080 --single \ + --mss4 1460 --mss6 1440 --wscale 7 --ttl 64"); + size = fread(buf, 1, sizeof(buf), ctrl_file); + pclose(ctrl_file); + if (!expect_str(buf, size, "Total SYNACKs generated: 0\n", + "initial SYNACKs")) + goto out; + + server_fd = start_server(AF_INET, SOCK_STREAM, "198.18.0.2", 8080, 0); + if (!ASSERT_GE(server_fd, 0, "start_server")) + goto out; + + close_netns(ns); + ns = NULL; + + client_fd = connect_to_fd(server_fd, 10000); + if (!ASSERT_GE(client_fd, 0, "connect_to_fd")) + goto out; + + accept_fd = accept(server_fd, NULL, NULL); + if (!ASSERT_GE(accept_fd, 0, "accept")) + goto out; + + ns = open_netns("synproxy"); + if (!ASSERT_OK_PTR(ns, "setns")) + goto out; + + ctrl_file = SYS_OUT("./xdp_synproxy --iface tmp1 --single"); + size = fread(buf, 1, sizeof(buf), ctrl_file); + pclose(ctrl_file); + if (!expect_str(buf, size, "Total SYNACKs generated: 1\n", + "SYNACKs after connection")) + goto out; + +out: + if (accept_fd >= 0) + close(accept_fd); + if (client_fd >= 0) + close(client_fd); + if (server_fd >= 0) + close(server_fd); + if (ns) + close_netns(ns); + + system("ip link del tmp0"); + system("ip netns del synproxy"); +} diff --git a/tools/testing/selftests/bpf/progs/xdp_synproxy_kern.c b/tools/testing/selftests/bpf/progs/xdp_synproxy_kern.c new file mode 100644 index 000000000000..53b9865276a4 --- /dev/null +++ b/tools/testing/selftests/bpf/progs/xdp_synproxy_kern.c @@ -0,0 +1,763 @@ +// SPDX-License-Identifier: LGPL-2.1 OR BSD-2-Clause +/* Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved. */ + +#include "vmlinux.h" + +#include +#include +#include + +#define NSEC_PER_SEC 1000000000L + +#define ETH_ALEN 6 +#define ETH_P_IP 0x0800 +#define ETH_P_IPV6 0x86DD + +#define tcp_flag_word(tp) (((union tcp_word_hdr *)(tp))->words[3]) + +#define IP_DF 0x4000 +#define IP_MF 0x2000 +#define IP_OFFSET 0x1fff + +#define NEXTHDR_TCP 6 + +#define TCPOPT_NOP 1 +#define TCPOPT_EOL 0 +#define TCPOPT_MSS 2 +#define TCPOPT_WINDOW 3 +#define TCPOPT_SACK_PERM 4 +#define TCPOPT_TIMESTAMP 8 + +#define TCPOLEN_MSS 4 +#define TCPOLEN_WINDOW 3 +#define TCPOLEN_SACK_PERM 2 +#define TCPOLEN_TIMESTAMP 10 + +#define TCP_TS_HZ 1000 +#define TS_OPT_WSCALE_MASK 0xf +#define TS_OPT_SACK (1 << 4) +#define TS_OPT_ECN (1 << 5) +#define TSBITS 6 +#define TSMASK (((__u32)1 << TSBITS) - 1) +#define TCP_MAX_WSCALE 14U + +#define IPV4_MAXLEN 60 +#define TCP_MAXLEN 60 + +#define DEFAULT_MSS4 1460 +#define DEFAULT_MSS6 1440 +#define DEFAULT_WSCALE 7 +#define DEFAULT_TTL 64 +#define MAX_ALLOWED_PORTS 8 + +#define swap(a, b) \ + do { typeof(a) __tmp = (a); (a) = (b); (b) = __tmp; } while (0) + +#define __get_unaligned_t(type, ptr) ({ \ + const struct { type x; } __attribute__((__packed__)) *__pptr = (typeof(__pptr))(ptr); \ + __pptr->x; \ +}) + +#define get_unaligned(ptr) __get_unaligned_t(typeof(*(ptr)), (ptr)) + +struct { + __uint(type, BPF_MAP_TYPE_ARRAY); + __type(key, __u32); + __type(value, __u64); + __uint(max_entries, 2); +} values SEC(".maps"); + +struct { + __uint(type, BPF_MAP_TYPE_ARRAY); + __type(key, __u32); + __type(value, __u16); + __uint(max_entries, MAX_ALLOWED_PORTS); +} allowed_ports SEC(".maps"); + +extern struct nf_conn *bpf_xdp_ct_lookup(struct xdp_md *xdp_ctx, + struct bpf_sock_tuple *bpf_tuple, + __u32 len_tuple, + struct bpf_ct_opts *opts, + __u32 len_opts) __ksym; + +extern void bpf_ct_release(struct nf_conn *ct) __ksym; + +static __always_inline void swap_eth_addr(__u8 *a, __u8 *b) +{ + __u8 tmp[ETH_ALEN]; + + __builtin_memcpy(tmp, a, ETH_ALEN); + __builtin_memcpy(a, b, ETH_ALEN); + __builtin_memcpy(b, tmp, ETH_ALEN); +} + +static __always_inline __u16 csum_fold(__u32 csum) +{ + csum = (csum & 0xffff) + (csum >> 16); + csum = (csum & 0xffff) + (csum >> 16); + return (__u16)~csum; +} + +static __always_inline __u16 csum_tcpudp_magic(__be32 saddr, __be32 daddr, + __u32 len, __u8 proto, + __u32 csum) +{ + __u64 s = csum; + + s += (__u32)saddr; + s += (__u32)daddr; +#if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__ + s += proto + len; +#elif __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__ + s += (proto + len) << 8; +#else +#error Unknown endian +#endif + s = (s & 0xffffffff) + (s >> 32); + s = (s & 0xffffffff) + (s >> 32); + + return csum_fold((__u32)s); +} + +static __always_inline __u16 csum_ipv6_magic(const struct in6_addr *saddr, + const struct in6_addr *daddr, + __u32 len, __u8 proto, __u32 csum) +{ + __u64 sum = csum; + int i; + +#pragma unroll + for (i = 0; i < 4; i++) + sum += (__u32)saddr->in6_u.u6_addr32[i]; + +#pragma unroll + for (i = 0; i < 4; i++) + sum += (__u32)daddr->in6_u.u6_addr32[i]; + + /* Don't combine additions to avoid 32-bit overflow. */ + sum += bpf_htonl(len); + sum += bpf_htonl(proto); + + sum = (sum & 0xffffffff) + (sum >> 32); + sum = (sum & 0xffffffff) + (sum >> 32); + + return csum_fold((__u32)sum); +} + +static __always_inline __u64 tcp_clock_ns(void) +{ + return bpf_ktime_get_ns(); +} + +static __always_inline __u32 tcp_ns_to_ts(__u64 ns) +{ + return ns / (NSEC_PER_SEC / TCP_TS_HZ); +} + +static __always_inline __u32 tcp_time_stamp_raw(void) +{ + return tcp_ns_to_ts(tcp_clock_ns()); +} + +struct tcpopt_context { + __u8 *ptr; + __u8 *end; + void *data_end; + __be32 *tsecr; + __u8 wscale; + bool option_timestamp; + bool option_sack; +}; + +static int tscookie_tcpopt_parse(struct tcpopt_context *ctx) +{ + __u8 opcode, opsize; + + if (ctx->ptr >= ctx->end) + return 1; + if (ctx->ptr >= ctx->data_end) + return 1; + + opcode = ctx->ptr[0]; + + if (opcode == TCPOPT_EOL) + return 1; + if (opcode == TCPOPT_NOP) { + ++ctx->ptr; + return 0; + } + + if (ctx->ptr + 1 >= ctx->end) + return 1; + if (ctx->ptr + 1 >= ctx->data_end) + return 1; + opsize = ctx->ptr[1]; + if (opsize < 2) + return 1; + + if (ctx->ptr + opsize > ctx->end) + return 1; + + switch (opcode) { + case TCPOPT_WINDOW: + if (opsize == TCPOLEN_WINDOW && ctx->ptr + TCPOLEN_WINDOW <= ctx->data_end) + ctx->wscale = ctx->ptr[2] < TCP_MAX_WSCALE ? ctx->ptr[2] : TCP_MAX_WSCALE; + break; + case TCPOPT_TIMESTAMP: + if (opsize == TCPOLEN_TIMESTAMP && ctx->ptr + TCPOLEN_TIMESTAMP <= ctx->data_end) { + ctx->option_timestamp = true; + /* Client's tsval becomes our tsecr. */ + *ctx->tsecr = get_unaligned((__be32 *)(ctx->ptr + 2)); + } + break; + case TCPOPT_SACK_PERM: + if (opsize == TCPOLEN_SACK_PERM) + ctx->option_sack = true; + break; + } + + ctx->ptr += opsize; + + return 0; +} + +static int tscookie_tcpopt_parse_batch(__u32 index, void *context) +{ + int i; + + for (i = 0; i < 7; i++) + if (tscookie_tcpopt_parse(context)) + return 1; + return 0; +} + +static __always_inline bool tscookie_init(struct tcphdr *tcp_header, + __u16 tcp_len, __be32 *tsval, + __be32 *tsecr, void *data_end) +{ + struct tcpopt_context loop_ctx = { + .ptr = (__u8 *)(tcp_header + 1), + .end = (__u8 *)tcp_header + tcp_len, + .data_end = data_end, + .tsecr = tsecr, + .wscale = TS_OPT_WSCALE_MASK, + .option_timestamp = false, + .option_sack = false, + }; + u32 cookie; + + bpf_loop(6, tscookie_tcpopt_parse_batch, &loop_ctx, 0); + + if (!loop_ctx.option_timestamp) + return false; + + cookie = tcp_time_stamp_raw() & ~TSMASK; + cookie |= loop_ctx.wscale & TS_OPT_WSCALE_MASK; + if (loop_ctx.option_sack) + cookie |= TS_OPT_SACK; + if (tcp_header->ece && tcp_header->cwr) + cookie |= TS_OPT_ECN; + *tsval = bpf_htonl(cookie); + + return true; +} + +static __always_inline void values_get_tcpipopts(__u16 *mss, __u8 *wscale, + __u8 *ttl, bool ipv6) +{ + __u32 key = 0; + __u64 *value; + + value = bpf_map_lookup_elem(&values, &key); + if (value && *value != 0) { + if (ipv6) + *mss = (*value >> 32) & 0xffff; + else + *mss = *value & 0xffff; + *wscale = (*value >> 16) & 0xf; + *ttl = (*value >> 24) & 0xff; + return; + } + + *mss = ipv6 ? DEFAULT_MSS6 : DEFAULT_MSS4; + *wscale = DEFAULT_WSCALE; + *ttl = DEFAULT_TTL; +} + +static __always_inline void values_inc_synacks(void) +{ + __u32 key = 1; + __u32 *value; + + value = bpf_map_lookup_elem(&values, &key); + if (value) + __sync_fetch_and_add(value, 1); +} + +static __always_inline bool check_port_allowed(__u16 port) +{ + __u32 i; + + for (i = 0; i < MAX_ALLOWED_PORTS; i++) { + __u32 key = i; + __u16 *value; + + value = bpf_map_lookup_elem(&allowed_ports, &key); + + if (!value) + break; + /* 0 is a terminator value. Check it first to avoid matching on + * a forbidden port == 0 and returning true. + */ + if (*value == 0) + break; + + if (*value == port) + return true; + } + + return false; +} + +struct header_pointers { + struct ethhdr *eth; + struct iphdr *ipv4; + struct ipv6hdr *ipv6; + struct tcphdr *tcp; + __u16 tcp_len; +}; + +static __always_inline int tcp_dissect(void *data, void *data_end, + struct header_pointers *hdr) +{ + hdr->eth = data; + if (hdr->eth + 1 > data_end) + return XDP_DROP; + + switch (bpf_ntohs(hdr->eth->h_proto)) { + case ETH_P_IP: + hdr->ipv6 = NULL; + + hdr->ipv4 = (void *)hdr->eth + sizeof(*hdr->eth); + if (hdr->ipv4 + 1 > data_end) + return XDP_DROP; + if (hdr->ipv4->ihl * 4 < sizeof(*hdr->ipv4)) + return XDP_DROP; + if (hdr->ipv4->version != 4) + return XDP_DROP; + + if (hdr->ipv4->protocol != IPPROTO_TCP) + return XDP_PASS; + + hdr->tcp = (void *)hdr->ipv4 + hdr->ipv4->ihl * 4; + break; + case ETH_P_IPV6: + hdr->ipv4 = NULL; + + hdr->ipv6 = (void *)hdr->eth + sizeof(*hdr->eth); + if (hdr->ipv6 + 1 > data_end) + return XDP_DROP; + if (hdr->ipv6->version != 6) + return XDP_DROP; + + /* XXX: Extension headers are not supported and could circumvent + * XDP SYN flood protection. + */ + if (hdr->ipv6->nexthdr != NEXTHDR_TCP) + return XDP_PASS; + + hdr->tcp = (void *)hdr->ipv6 + sizeof(*hdr->ipv6); + break; + default: + /* XXX: VLANs will circumvent XDP SYN flood protection. */ + return XDP_PASS; + } + + if (hdr->tcp + 1 > data_end) + return XDP_DROP; + hdr->tcp_len = hdr->tcp->doff * 4; + if (hdr->tcp_len < sizeof(*hdr->tcp)) + return XDP_DROP; + + return XDP_TX; +} + +static __always_inline int tcp_lookup(struct xdp_md *ctx, struct header_pointers *hdr) +{ + struct bpf_ct_opts ct_lookup_opts = { + .netns_id = BPF_F_CURRENT_NETNS, + .l4proto = IPPROTO_TCP, + }; + struct bpf_sock_tuple tup = {}; + struct nf_conn *ct; + __u32 tup_size; + + if (hdr->ipv4) { + /* TCP doesn't normally use fragments, and XDP can't reassemble + * them. + */ + if ((hdr->ipv4->frag_off & bpf_htons(IP_DF | IP_MF | IP_OFFSET)) != bpf_htons(IP_DF)) + return XDP_DROP; + + tup.ipv4.saddr = hdr->ipv4->saddr; + tup.ipv4.daddr = hdr->ipv4->daddr; + tup.ipv4.sport = hdr->tcp->source; + tup.ipv4.dport = hdr->tcp->dest; + tup_size = sizeof(tup.ipv4); + } else if (hdr->ipv6) { + __builtin_memcpy(tup.ipv6.saddr, &hdr->ipv6->saddr, sizeof(tup.ipv6.saddr)); + __builtin_memcpy(tup.ipv6.daddr, &hdr->ipv6->daddr, sizeof(tup.ipv6.daddr)); + tup.ipv6.sport = hdr->tcp->source; + tup.ipv6.dport = hdr->tcp->dest; + tup_size = sizeof(tup.ipv6); + } else { + /* The verifier can't track that either ipv4 or ipv6 is not + * NULL. + */ + return XDP_ABORTED; + } + ct = bpf_xdp_ct_lookup(ctx, &tup, tup_size, &ct_lookup_opts, sizeof(ct_lookup_opts)); + if (ct) { + unsigned long status = ct->status; + + bpf_ct_release(ct); + if (status & IPS_CONFIRMED_BIT) + return XDP_PASS; + } else if (ct_lookup_opts.error != -ENOENT) { + return XDP_ABORTED; + } + + /* error == -ENOENT || !(status & IPS_CONFIRMED_BIT) */ + return XDP_TX; +} + +static __always_inline __u8 tcp_mkoptions(__be32 *buf, __be32 *tsopt, __u16 mss, + __u8 wscale) +{ + __be32 *start = buf; + + *buf++ = bpf_htonl((TCPOPT_MSS << 24) | (TCPOLEN_MSS << 16) | mss); + + if (!tsopt) + return buf - start; + + if (tsopt[0] & bpf_htonl(1 << 4)) + *buf++ = bpf_htonl((TCPOPT_SACK_PERM << 24) | + (TCPOLEN_SACK_PERM << 16) | + (TCPOPT_TIMESTAMP << 8) | + TCPOLEN_TIMESTAMP); + else + *buf++ = bpf_htonl((TCPOPT_NOP << 24) | + (TCPOPT_NOP << 16) | + (TCPOPT_TIMESTAMP << 8) | + TCPOLEN_TIMESTAMP); + *buf++ = tsopt[0]; + *buf++ = tsopt[1]; + + if ((tsopt[0] & bpf_htonl(0xf)) != bpf_htonl(0xf)) + *buf++ = bpf_htonl((TCPOPT_NOP << 24) | + (TCPOPT_WINDOW << 16) | + (TCPOLEN_WINDOW << 8) | + wscale); + + return buf - start; +} + +static __always_inline void tcp_gen_synack(struct tcphdr *tcp_header, + __u32 cookie, __be32 *tsopt, + __u16 mss, __u8 wscale) +{ + void *tcp_options; + + tcp_flag_word(tcp_header) = TCP_FLAG_SYN | TCP_FLAG_ACK; + if (tsopt && (tsopt[0] & bpf_htonl(1 << 5))) + tcp_flag_word(tcp_header) |= TCP_FLAG_ECE; + tcp_header->doff = 5; /* doff is part of tcp_flag_word. */ + swap(tcp_header->source, tcp_header->dest); + tcp_header->ack_seq = bpf_htonl(bpf_ntohl(tcp_header->seq) + 1); + tcp_header->seq = bpf_htonl(cookie); + tcp_header->window = 0; + tcp_header->urg_ptr = 0; + tcp_header->check = 0; /* Calculate checksum later. */ + + tcp_options = (void *)(tcp_header + 1); + tcp_header->doff += tcp_mkoptions(tcp_options, tsopt, mss, wscale); +} + +static __always_inline void tcpv4_gen_synack(struct header_pointers *hdr, + __u32 cookie, __be32 *tsopt) +{ + __u8 wscale; + __u16 mss; + __u8 ttl; + + values_get_tcpipopts(&mss, &wscale, &ttl, false); + + swap_eth_addr(hdr->eth->h_source, hdr->eth->h_dest); + + swap(hdr->ipv4->saddr, hdr->ipv4->daddr); + hdr->ipv4->check = 0; /* Calculate checksum later. */ + hdr->ipv4->tos = 0; + hdr->ipv4->id = 0; + hdr->ipv4->ttl = ttl; + + tcp_gen_synack(hdr->tcp, cookie, tsopt, mss, wscale); + + hdr->tcp_len = hdr->tcp->doff * 4; + hdr->ipv4->tot_len = bpf_htons(sizeof(*hdr->ipv4) + hdr->tcp_len); +} + +static __always_inline void tcpv6_gen_synack(struct header_pointers *hdr, + __u32 cookie, __be32 *tsopt) +{ + __u8 wscale; + __u16 mss; + __u8 ttl; + + values_get_tcpipopts(&mss, &wscale, &ttl, true); + + swap_eth_addr(hdr->eth->h_source, hdr->eth->h_dest); + + swap(hdr->ipv6->saddr, hdr->ipv6->daddr); + *(__be32 *)hdr->ipv6 = bpf_htonl(0x60000000); + hdr->ipv6->hop_limit = ttl; + + tcp_gen_synack(hdr->tcp, cookie, tsopt, mss, wscale); + + hdr->tcp_len = hdr->tcp->doff * 4; + hdr->ipv6->payload_len = bpf_htons(hdr->tcp_len); +} + +static __always_inline int syncookie_handle_syn(struct header_pointers *hdr, + struct xdp_md *ctx, + void *data, void *data_end) +{ + __u32 old_pkt_size, new_pkt_size; + /* Unlike clang 10, clang 11 and 12 generate code that doesn't pass the + * BPF verifier if tsopt is not volatile. Volatile forces it to store + * the pointer value and use it directly, otherwise tcp_mkoptions is + * (mis)compiled like this: + * if (!tsopt) + * return buf - start; + * reg = stored_return_value_of_tscookie_init; + * if (reg) + * tsopt = tsopt_buf; + * else + * tsopt = NULL; + * ... + * *buf++ = tsopt[1]; + * It creates a dead branch where tsopt is assigned NULL, but the + * verifier can't prove it's dead and blocks the program. + */ + __be32 * volatile tsopt = NULL; + __be32 tsopt_buf[2] = {}; + __u16 ip_len; + __u32 cookie; + __s64 value; + + /* Checksum is not yet verified, but both checksum failure and TCP + * header checks return XDP_DROP, so the order doesn't matter. + */ + if (hdr->tcp->fin || hdr->tcp->rst) + return XDP_DROP; + + /* Issue SYN cookies on allowed ports, drop SYN packets on blocked + * ports. + */ + if (!check_port_allowed(bpf_ntohs(hdr->tcp->dest))) + return XDP_DROP; + + if (hdr->ipv4) { + /* Check the IPv4 and TCP checksums before creating a SYNACK. */ + value = bpf_csum_diff(0, 0, (void *)hdr->ipv4, hdr->ipv4->ihl * 4, 0); + if (value < 0) + return XDP_ABORTED; + if (csum_fold(value) != 0) + return XDP_DROP; /* Bad IPv4 checksum. */ + + value = bpf_csum_diff(0, 0, (void *)hdr->tcp, hdr->tcp_len, 0); + if (value < 0) + return XDP_ABORTED; + if (csum_tcpudp_magic(hdr->ipv4->saddr, hdr->ipv4->daddr, + hdr->tcp_len, IPPROTO_TCP, value) != 0) + return XDP_DROP; /* Bad TCP checksum. */ + + ip_len = sizeof(*hdr->ipv4); + + value = bpf_tcp_raw_gen_syncookie_ipv4(hdr->ipv4, hdr->tcp, + hdr->tcp_len); + } else if (hdr->ipv6) { + /* Check the TCP checksum before creating a SYNACK. */ + value = bpf_csum_diff(0, 0, (void *)hdr->tcp, hdr->tcp_len, 0); + if (value < 0) + return XDP_ABORTED; + if (csum_ipv6_magic(&hdr->ipv6->saddr, &hdr->ipv6->daddr, + hdr->tcp_len, IPPROTO_TCP, value) != 0) + return XDP_DROP; /* Bad TCP checksum. */ + + ip_len = sizeof(*hdr->ipv6); + + value = bpf_tcp_raw_gen_syncookie_ipv6(hdr->ipv6, hdr->tcp, + hdr->tcp_len); + } else { + return XDP_ABORTED; + } + + if (value < 0) + return XDP_ABORTED; + cookie = (__u32)value; + + if (tscookie_init((void *)hdr->tcp, hdr->tcp_len, + &tsopt_buf[0], &tsopt_buf[1], data_end)) + tsopt = tsopt_buf; + + /* Check that there is enough space for a SYNACK. It also covers + * the check that the destination of the __builtin_memmove below + * doesn't overflow. + */ + if (data + sizeof(*hdr->eth) + ip_len + TCP_MAXLEN > data_end) + return XDP_ABORTED; + + if (hdr->ipv4) { + if (hdr->ipv4->ihl * 4 > sizeof(*hdr->ipv4)) { + struct tcphdr *new_tcp_header; + + new_tcp_header = data + sizeof(*hdr->eth) + sizeof(*hdr->ipv4); + __builtin_memmove(new_tcp_header, hdr->tcp, sizeof(*hdr->tcp)); + hdr->tcp = new_tcp_header; + + hdr->ipv4->ihl = sizeof(*hdr->ipv4) / 4; + } + + tcpv4_gen_synack(hdr, cookie, tsopt); + } else if (hdr->ipv6) { + tcpv6_gen_synack(hdr, cookie, tsopt); + } else { + return XDP_ABORTED; + } + + /* Recalculate checksums. */ + hdr->tcp->check = 0; + value = bpf_csum_diff(0, 0, (void *)hdr->tcp, hdr->tcp_len, 0); + if (value < 0) + return XDP_ABORTED; + if (hdr->ipv4) { + hdr->tcp->check = csum_tcpudp_magic(hdr->ipv4->saddr, + hdr->ipv4->daddr, + hdr->tcp_len, + IPPROTO_TCP, + value); + + hdr->ipv4->check = 0; + value = bpf_csum_diff(0, 0, (void *)hdr->ipv4, sizeof(*hdr->ipv4), 0); + if (value < 0) + return XDP_ABORTED; + hdr->ipv4->check = csum_fold(value); + } else if (hdr->ipv6) { + hdr->tcp->check = csum_ipv6_magic(&hdr->ipv6->saddr, + &hdr->ipv6->daddr, + hdr->tcp_len, + IPPROTO_TCP, + value); + } else { + return XDP_ABORTED; + } + + /* Set the new packet size. */ + old_pkt_size = data_end - data; + new_pkt_size = sizeof(*hdr->eth) + ip_len + hdr->tcp->doff * 4; + if (bpf_xdp_adjust_tail(ctx, new_pkt_size - old_pkt_size)) + return XDP_ABORTED; + + values_inc_synacks(); + + return XDP_TX; +} + +static __always_inline int syncookie_handle_ack(struct header_pointers *hdr) +{ + int err; + + if (hdr->tcp->rst) + return XDP_DROP; + + if (hdr->ipv4) + err = bpf_tcp_raw_check_syncookie_ipv4(hdr->ipv4, hdr->tcp); + else if (hdr->ipv6) + err = bpf_tcp_raw_check_syncookie_ipv6(hdr->ipv6, hdr->tcp); + else + return XDP_ABORTED; + if (err) + return XDP_DROP; + + return XDP_PASS; +} + +SEC("xdp") +int syncookie_xdp(struct xdp_md *ctx) +{ + void *data_end = (void *)(long)ctx->data_end; + void *data = (void *)(long)ctx->data; + struct header_pointers hdr; + __s64 value; + int ret; + + struct bpf_ct_opts ct_lookup_opts = { + .netns_id = BPF_F_CURRENT_NETNS, + .l4proto = IPPROTO_TCP, + }; + + ret = tcp_dissect(data, data_end, &hdr); + if (ret != XDP_TX) + return ret; + + ret = tcp_lookup(ctx, &hdr); + if (ret != XDP_TX) + return ret; + + /* Packet is TCP and doesn't belong to an established connection. */ + + if ((hdr.tcp->syn ^ hdr.tcp->ack) != 1) + return XDP_DROP; + + /* Grow the TCP header to TCP_MAXLEN to be able to pass any hdr.tcp_len + * to bpf_tcp_raw_gen_syncookie_ipv{4,6} and pass the verifier. + */ + if (bpf_xdp_adjust_tail(ctx, TCP_MAXLEN - hdr.tcp_len)) + return XDP_ABORTED; + + data_end = (void *)(long)ctx->data_end; + data = (void *)(long)ctx->data; + + if (hdr.ipv4) { + hdr.eth = data; + hdr.ipv4 = (void *)hdr.eth + sizeof(*hdr.eth); + /* IPV4_MAXLEN is needed when calculating checksum. + * At least sizeof(struct iphdr) is needed here to access ihl. + */ + if ((void *)hdr.ipv4 + IPV4_MAXLEN > data_end) + return XDP_ABORTED; + hdr.tcp = (void *)hdr.ipv4 + hdr.ipv4->ihl * 4; + } else if (hdr.ipv6) { + hdr.eth = data; + hdr.ipv6 = (void *)hdr.eth + sizeof(*hdr.eth); + hdr.tcp = (void *)hdr.ipv6 + sizeof(*hdr.ipv6); + } else { + return XDP_ABORTED; + } + + if ((void *)hdr.tcp + TCP_MAXLEN > data_end) + return XDP_ABORTED; + + /* We run out of registers, tcp_len gets spilled to the stack, and the + * verifier forgets its min and max values checked above in tcp_dissect. + */ + hdr.tcp_len = hdr.tcp->doff * 4; + if (hdr.tcp_len < sizeof(*hdr.tcp)) + return XDP_ABORTED; + + return hdr.tcp->syn ? syncookie_handle_syn(&hdr, ctx, data, data_end) : + syncookie_handle_ack(&hdr); +} + +char _license[] SEC("license") = "GPL"; diff --git a/tools/testing/selftests/bpf/xdp_synproxy.c b/tools/testing/selftests/bpf/xdp_synproxy.c new file mode 100644 index 000000000000..4653d4655b5f --- /dev/null +++ b/tools/testing/selftests/bpf/xdp_synproxy.c @@ -0,0 +1,418 @@ +// SPDX-License-Identifier: LGPL-2.1 OR BSD-2-Clause +/* Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved. */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +static unsigned int ifindex; +static __u32 attached_prog_id; + +static void noreturn cleanup(int sig) +{ + DECLARE_LIBBPF_OPTS(bpf_xdp_attach_opts, opts); + int prog_fd; + int err; + + if (attached_prog_id == 0) + exit(0); + + prog_fd = bpf_prog_get_fd_by_id(attached_prog_id); + if (prog_fd < 0) { + fprintf(stderr, "Error: bpf_prog_get_fd_by_id: %s\n", strerror(-prog_fd)); + err = bpf_xdp_attach(ifindex, -1, 0, NULL); + if (err < 0) { + fprintf(stderr, "Error: bpf_set_link_xdp_fd: %s\n", strerror(-err)); + fprintf(stderr, "Failed to detach XDP program\n"); + exit(1); + } + } else { + opts.old_prog_fd = prog_fd; + err = bpf_xdp_attach(ifindex, -1, XDP_FLAGS_REPLACE, &opts); + close(prog_fd); + if (err < 0) { + fprintf(stderr, "Error: bpf_set_link_xdp_fd_opts: %s\n", strerror(-err)); + /* Not an error if already replaced by someone else. */ + if (err != -EEXIST) { + fprintf(stderr, "Failed to detach XDP program\n"); + exit(1); + } + } + } + exit(0); +} + +static noreturn void usage(const char *progname) +{ + fprintf(stderr, "Usage: %s [--iface |--prog ] [--mss4 --mss6 --wscale --ttl ] [--ports ,,...] [--single]\n", + progname); + exit(1); +} + +static unsigned long parse_arg_ul(const char *progname, const char *arg, unsigned long limit) +{ + unsigned long res; + char *endptr; + + errno = 0; + res = strtoul(arg, &endptr, 10); + if (errno != 0 || *endptr != '\0' || arg[0] == '\0' || res > limit) + usage(progname); + + return res; +} + +static void parse_options(int argc, char *argv[], unsigned int *ifindex, __u32 *prog_id, + __u64 *tcpipopts, char **ports, bool *single) +{ + static struct option long_options[] = { + { "help", no_argument, NULL, 'h' }, + { "iface", required_argument, NULL, 'i' }, + { "prog", required_argument, NULL, 'x' }, + { "mss4", required_argument, NULL, 4 }, + { "mss6", required_argument, NULL, 6 }, + { "wscale", required_argument, NULL, 'w' }, + { "ttl", required_argument, NULL, 't' }, + { "ports", required_argument, NULL, 'p' }, + { "single", no_argument, NULL, 's' }, + { NULL, 0, NULL, 0 }, + }; + unsigned long mss4, mss6, wscale, ttl; + unsigned int tcpipopts_mask = 0; + + if (argc < 2) + usage(argv[0]); + + *ifindex = 0; + *prog_id = 0; + *tcpipopts = 0; + *ports = NULL; + *single = false; + + while (true) { + int opt; + + opt = getopt_long(argc, argv, "", long_options, NULL); + if (opt == -1) + break; + + switch (opt) { + case 'h': + usage(argv[0]); + break; + case 'i': + *ifindex = if_nametoindex(optarg); + if (*ifindex == 0) + usage(argv[0]); + break; + case 'x': + *prog_id = parse_arg_ul(argv[0], optarg, UINT32_MAX); + if (*prog_id == 0) + usage(argv[0]); + break; + case 4: + mss4 = parse_arg_ul(argv[0], optarg, UINT16_MAX); + tcpipopts_mask |= 1 << 0; + break; + case 6: + mss6 = parse_arg_ul(argv[0], optarg, UINT16_MAX); + tcpipopts_mask |= 1 << 1; + break; + case 'w': + wscale = parse_arg_ul(argv[0], optarg, 14); + tcpipopts_mask |= 1 << 2; + break; + case 't': + ttl = parse_arg_ul(argv[0], optarg, UINT8_MAX); + tcpipopts_mask |= 1 << 3; + break; + case 'p': + *ports = optarg; + break; + case 's': + *single = true; + break; + default: + usage(argv[0]); + } + } + if (optind < argc) + usage(argv[0]); + + if (tcpipopts_mask == 0xf) { + if (mss4 == 0 || mss6 == 0 || wscale == 0 || ttl == 0) + usage(argv[0]); + *tcpipopts = (mss6 << 32) | (ttl << 24) | (wscale << 16) | mss4; + } else if (tcpipopts_mask != 0) { + usage(argv[0]); + } + + if (*ifindex != 0 && *prog_id != 0) + usage(argv[0]); + if (*ifindex == 0 && *prog_id == 0) + usage(argv[0]); +} + +static int syncookie_attach(const char *argv0, unsigned int ifindex) +{ + struct bpf_prog_info info = {}; + __u32 info_len = sizeof(info); + char xdp_filename[PATH_MAX]; + struct bpf_program *prog; + struct bpf_object *obj; + int prog_fd; + int err; + + snprintf(xdp_filename, sizeof(xdp_filename), "%s_kern.o", argv0); + obj = bpf_object__open_file(xdp_filename, NULL); + err = libbpf_get_error(obj); + if (err < 0) { + fprintf(stderr, "Error: bpf_object__open_file: %s\n", strerror(-err)); + return err; + } + + err = bpf_object__load(obj); + if (err < 0) { + fprintf(stderr, "Error: bpf_object__open_file: %s\n", strerror(-err)); + return err; + } + + prog = bpf_object__find_program_by_name(obj, "syncookie_xdp"); + if (!prog) { + fprintf(stderr, "Error: bpf_object__find_program_by_name: program syncookie_xdp was not found\n"); + return -ENOENT; + } + + prog_fd = bpf_program__fd(prog); + + err = bpf_obj_get_info_by_fd(prog_fd, &info, &info_len); + if (err < 0) { + fprintf(stderr, "Error: bpf_obj_get_info_by_fd: %s\n", strerror(-err)); + goto out; + } + attached_prog_id = info.id; + signal(SIGINT, cleanup); + signal(SIGTERM, cleanup); + err = bpf_xdp_attach(ifindex, prog_fd, XDP_FLAGS_UPDATE_IF_NOEXIST, NULL); + if (err < 0) { + fprintf(stderr, "Error: bpf_set_link_xdp_fd: %s\n", strerror(-err)); + signal(SIGINT, SIG_DFL); + signal(SIGTERM, SIG_DFL); + attached_prog_id = 0; + goto out; + } + err = 0; +out: + bpf_object__close(obj); + return err; +} + +static int syncookie_open_bpf_maps(__u32 prog_id, int *values_map_fd, int *ports_map_fd) +{ + struct bpf_prog_info prog_info; + __u32 map_ids[8]; + __u32 info_len; + int prog_fd; + int err; + int i; + + *values_map_fd = -1; + *ports_map_fd = -1; + + prog_fd = bpf_prog_get_fd_by_id(prog_id); + if (prog_fd < 0) { + fprintf(stderr, "Error: bpf_prog_get_fd_by_id: %s\n", strerror(-prog_fd)); + return prog_fd; + } + + prog_info = (struct bpf_prog_info) { + .nr_map_ids = 8, + .map_ids = (__u64)map_ids, + }; + info_len = sizeof(prog_info); + + err = bpf_obj_get_info_by_fd(prog_fd, &prog_info, &info_len); + if (err != 0) { + fprintf(stderr, "Error: bpf_obj_get_info_by_fd: %s\n", strerror(-err)); + goto out; + } + + if (prog_info.type != BPF_PROG_TYPE_XDP) { + fprintf(stderr, "Error: BPF prog type is not BPF_PROG_TYPE_XDP\n"); + err = -ENOENT; + goto out; + } + if (prog_info.nr_map_ids < 2) { + fprintf(stderr, "Error: Found %u BPF maps, expected at least 2\n", + prog_info.nr_map_ids); + err = -ENOENT; + goto out; + } + + for (i = 0; i < prog_info.nr_map_ids; i++) { + struct bpf_map_info map_info = {}; + int map_fd; + + err = bpf_map_get_fd_by_id(map_ids[i]); + if (err < 0) { + fprintf(stderr, "Error: bpf_map_get_fd_by_id: %s\n", strerror(-err)); + goto err_close_map_fds; + } + map_fd = err; + + info_len = sizeof(map_info); + err = bpf_obj_get_info_by_fd(map_fd, &map_info, &info_len); + if (err != 0) { + fprintf(stderr, "Error: bpf_obj_get_info_by_fd: %s\n", strerror(-err)); + close(map_fd); + goto err_close_map_fds; + } + if (strcmp(map_info.name, "values") == 0) { + *values_map_fd = map_fd; + continue; + } + if (strcmp(map_info.name, "allowed_ports") == 0) { + *ports_map_fd = map_fd; + continue; + } + close(map_fd); + } + + if (*values_map_fd != -1 && *ports_map_fd != -1) { + err = 0; + goto out; + } + + err = -ENOENT; + +err_close_map_fds: + if (*values_map_fd != -1) + close(*values_map_fd); + if (*ports_map_fd != -1) + close(*ports_map_fd); + *values_map_fd = -1; + *ports_map_fd = -1; + +out: + close(prog_fd); + return err; +} + +int main(int argc, char *argv[]) +{ + int values_map_fd, ports_map_fd; + __u64 tcpipopts; + bool firstiter; + __u64 prevcnt; + __u32 prog_id; + char *ports; + bool single; + int err = 0; + + parse_options(argc, argv, &ifindex, &prog_id, &tcpipopts, &ports, &single); + + if (prog_id == 0) { + err = bpf_xdp_query_id(ifindex, 0, &prog_id); + if (err < 0) { + fprintf(stderr, "Error: bpf_get_link_xdp_id: %s\n", strerror(-err)); + goto out; + } + if (prog_id == 0) { + err = syncookie_attach(argv[0], ifindex); + if (err < 0) + goto out; + prog_id = attached_prog_id; + } + } + + err = syncookie_open_bpf_maps(prog_id, &values_map_fd, &ports_map_fd); + if (err < 0) + goto out; + + if (ports) { + __u16 port_last = 0; + __u32 port_idx = 0; + char *p = ports; + + fprintf(stderr, "Replacing allowed ports\n"); + + while (p && *p != '\0') { + char *token = strsep(&p, ","); + __u16 port; + + port = parse_arg_ul(argv[0], token, UINT16_MAX); + err = bpf_map_update_elem(ports_map_fd, &port_idx, &port, BPF_ANY); + if (err != 0) { + fprintf(stderr, "Error: bpf_map_update_elem: %s\n", strerror(-err)); + fprintf(stderr, "Failed to add port %u (index %u)\n", + port, port_idx); + goto out_close_maps; + } + fprintf(stderr, "Added port %u\n", port); + port_idx++; + } + err = bpf_map_update_elem(ports_map_fd, &port_idx, &port_last, BPF_ANY); + if (err != 0) { + fprintf(stderr, "Error: bpf_map_update_elem: %s\n", strerror(-err)); + fprintf(stderr, "Failed to add the terminator value 0 (index %u)\n", + port_idx); + goto out_close_maps; + } + } + + if (tcpipopts) { + __u32 key = 0; + + fprintf(stderr, "Replacing TCP/IP options\n"); + + err = bpf_map_update_elem(values_map_fd, &key, &tcpipopts, BPF_ANY); + if (err != 0) { + fprintf(stderr, "Error: bpf_map_update_elem: %s\n", strerror(-err)); + goto out_close_maps; + } + } + + if ((ports || tcpipopts) && attached_prog_id == 0 && !single) + goto out_close_maps; + + prevcnt = 0; + firstiter = true; + while (true) { + __u32 key = 1; + __u64 value; + + err = bpf_map_lookup_elem(values_map_fd, &key, &value); + if (err != 0) { + fprintf(stderr, "Error: bpf_map_lookup_elem: %s\n", strerror(-err)); + goto out_close_maps; + } + if (firstiter) { + prevcnt = value; + firstiter = false; + } + if (single) { + printf("Total SYNACKs generated: %llu\n", value); + break; + } + printf("SYNACKs generated: %llu (total %llu)\n", value - prevcnt, value); + prevcnt = value; + sleep(1); + } + +out_close_maps: + close(values_map_fd); + close(ports_map_fd); +out: + return err == 0 ? 0 : 1; +} From patchwork Wed Jun 15 13:48:47 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Maxim Mikityanskiy X-Patchwork-Id: 582065 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 8D46BC43334 for ; Wed, 15 Jun 2022 13:49:54 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1349308AbiFONtw (ORCPT ); Wed, 15 Jun 2022 09:49:52 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:59974 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1353431AbiFONts (ORCPT ); Wed, 15 Jun 2022 09:49:48 -0400 Received: from NAM11-CO1-obe.outbound.protection.outlook.com (mail-co1nam11on2083.outbound.protection.outlook.com [40.107.220.83]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id DF20B3EF3E; Wed, 15 Jun 2022 06:49:43 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=LeMHTXXQxeyU1R7Ua96vk6pZM4OPcBenQiJWUYygiOLiUqZ5duiARCDclzPm0Qrx4r1JYGBOnFJ4RJVDbwmH7yxRO/A9mxWOZ6MHK4AjIQBWakMUxDGJ3/nrndkMcr+aVHk8b7TlU7gsrji9xjteECKm0MyTLCuF0edXUvZCGMMkVBq0DiqTniRcfVJas4KaDH3mdawCyTC9IVSejZonJwh+Q0JioNhSaMDKCIxzgcdnwD+C21Pe8h/h5Fge53xz6kwHJfGxlK/MtE29gm1HY4fZErkpUosMgQDWcxYaHztxO4O+OHaKghRIk+Xo0pp6zgao8n/jKayuLF0YimeosA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=BEMr/HKgFtzZGKvvxJi8s4nGXUBeoHW/QqXDtkseprg=; b=dPpUA9HFz2e7pVpMQBzTmmaLGsZe4TL3PWd/mziwsPz6J96yPFqQDbMP/IclUCPk2Jm/pIw8prGyh6weEQAAmZNyVsaiqEkksCv+5KiC1Ms6HdCWZyb/mHHZiRFdVgF6WVv4IlFXwu7LUp02l7JxvoT5ITUqfBhuBAYnzebtwUe2Or0Hp1m3o3DAPxrsTMUF0f9W4YdeamdVbAV0KRa/ovmhSk4Gj3ffdRUPLN23XQAHiem4GXUoNlKONFQ2sUQihhJH9UzA+IQbUK3lZb0fB8lEzj3UG/2naJUlrW5fd9PEwg9xU06huYaSiTEYGVrPen6Tjy66c/EOksqyjb9qtw== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass (sender ip is 12.22.5.234) smtp.rcpttodomain=gmail.com smtp.mailfrom=nvidia.com; dmarc=pass (p=reject sp=reject pct=100) action=none header.from=nvidia.com; dkim=none (message not signed); arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=Nvidia.com; s=selector2; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=BEMr/HKgFtzZGKvvxJi8s4nGXUBeoHW/QqXDtkseprg=; b=LAjQhsSgJqYaVtriL2x6em5UsPgj4ySmsY5+XYDdZQq6YttNzKWS5CIPETAz/3PvqGkVtmkUFfQ3XhxIrNFWPBbdo5rL8/BmBizZ7CP3R+9thodH+M2Q7777sAs3ftoch2w8HqJAlJKJUsiUUpwh722ZY21KHZshyBzvgC5PZUK6Me7DZ6o+SBazKaLOnTwTs43VrFRBw9UnzNZNDvEg6i22EvfHk18YUbbyLWe1K5K2fej2laKb2Vg2xoCKrvOplXwTRJXxvFpqCk++74NES5w7JgVahuQ3hx9EyXDONaHHLQOdr65Xx/kKDqVguZK81zR98ZqoybAswfyvhQvUNg== Received: from MW4PR03CA0050.namprd03.prod.outlook.com (2603:10b6:303:8e::25) by CH2PR12MB5563.namprd12.prod.outlook.com (2603:10b6:610:6a::17) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.5332.13; Wed, 15 Jun 2022 13:49:41 +0000 Received: from CO1NAM11FT033.eop-nam11.prod.protection.outlook.com (2603:10b6:303:8e:cafe::2d) by MW4PR03CA0050.outlook.office365.com (2603:10b6:303:8e::25) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.5353.14 via Frontend Transport; Wed, 15 Jun 2022 13:49:41 +0000 X-MS-Exchange-Authentication-Results: spf=pass (sender IP is 12.22.5.234) smtp.mailfrom=nvidia.com; dkim=none (message not signed) header.d=none;dmarc=pass action=none header.from=nvidia.com; Received-SPF: Pass (protection.outlook.com: domain of nvidia.com designates 12.22.5.234 as permitted sender) receiver=protection.outlook.com; client-ip=12.22.5.234; helo=mail.nvidia.com; pr=C Received: from mail.nvidia.com (12.22.5.234) by CO1NAM11FT033.mail.protection.outlook.com (10.13.174.247) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA384) id 15.20.5353.14 via Frontend Transport; Wed, 15 Jun 2022 13:49:41 +0000 Received: from drhqmail201.nvidia.com (10.126.190.180) by DRHQMAIL101.nvidia.com (10.27.9.10) with Microsoft SMTP Server (TLS) id 15.0.1497.32; Wed, 15 Jun 2022 13:49:40 +0000 Received: from drhqmail203.nvidia.com (10.126.190.182) by drhqmail201.nvidia.com (10.126.190.180) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.986.22; Wed, 15 Jun 2022 06:49:40 -0700 Received: from vdi.nvidia.com (10.127.8.12) by mail.nvidia.com (10.126.190.182) with Microsoft SMTP Server id 15.2.986.22 via Frontend Transport; Wed, 15 Jun 2022 06:49:34 -0700 From: Maxim Mikityanskiy To: , Alexei Starovoitov , "Daniel Borkmann" , Andrii Nakryiko , CC: Tariq Toukan , Martin KaFai Lau , "Song Liu" , Yonghong Song , John Fastabend , KP Singh , "David S. Miller" , Jakub Kicinski , Eric Dumazet , Hideaki YOSHIFUJI , "David Ahern" , Shuah Khan , "Jesper Dangaard Brouer" , Nathan Chancellor , "Nick Desaulniers" , Joe Stringer , "Florent Revest" , , =?utf-8?q?Toke_H=C3=B8iland-J=C3=B8rgensen?= , "Kumar Kartikeya Dwivedi" , Florian Westphal , , Maxim Mikityanskiy Subject: [PATCH bpf-next v10 6/6] selftests/bpf: Add selftests for raw syncookie helpers in TC mode Date: Wed, 15 Jun 2022 16:48:47 +0300 Message-ID: <20220615134847.3753567-7-maximmi@nvidia.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20220615134847.3753567-1-maximmi@nvidia.com> References: <20220615134847.3753567-1-maximmi@nvidia.com> MIME-Version: 1.0 X-EOPAttributedMessage: 0 X-MS-PublicTrafficType: Email X-MS-Office365-Filtering-Correlation-Id: 0b709d30-1afa-4408-635d-08da4ed5e59b X-MS-TrafficTypeDiagnostic: CH2PR12MB5563:EE_ X-Microsoft-Antispam-PRVS: X-MS-Exchange-SenderADCheck: 1 X-MS-Exchange-AntiSpam-Relay: 0 X-Microsoft-Antispam: BCL:0; X-Microsoft-Antispam-Message-Info: bBU9wHZV/tOkRRAhj4QbOiGbSrOphWmgBNh/c1QJUaHSHIAZ4RB696/jdPprchPcR2ylE51+bNNp/PfSfOMd0YPHQKcRvwkR4s9DvvX6XHtjNImg+0WDwNKsWeYmZGQKBlpetFlaFao6YlR4Hjxi9cqC1C8ui6wDbo3txcWqa6/In4sij4aNsH87/Fsu2O9qg1Wv6Cj1Mw180YqOcStIYEzBoieVcW5JsnG4ivQ17ymnfKAPT3nuHOzI71pu2YRgs294QoV41OvWBGSrynNTkStosL5p9tBRTXaSkSqDfMryE5DCx7O/ItVVqTCtivhs5ySb16ENWxrPovw8iMPRpL0spPrtA20wX8JU5PtwyVUfePDL1awNWKPN2VhUFEkj2aACwoBErg9XntDRXuLMkgB71br4+SvgZjQCd13vF4wFQRyZ8oLpwc1/eWPJPfELonNYQx9XXiWKnJtYissm+4pA7oPaDNllfSixMTRYEjbKx4rptlIAAyape7YEpPlHpE00fR51AlkGx/WuEuk3zSK3HAxlZ5MxinSgs0bKDATAMpdutrYTN6+/oCENNtq0tsVrNFSUHOTeg3BTF25Ngl89WIOHRn2ijZgo5NNa6y60FdZjJBeHbanNJ2e9Qzz4jTUO20bJxOVGH+0cP6eNSXVsoi4Rd6Jn2IKBX7cnqFebINLZV/GOXroItnPUIATjzuIHnyx/LNQD/INsx/1VnQ== X-Forefront-Antispam-Report: CIP:12.22.5.234; CTRY:US; LANG:en; SCL:1; SRV:; IPV:NLI; SFV:NSPM; H:mail.nvidia.com; PTR:InfoNoRecords; CAT:NONE; SFS:(13230016)(4636009)(40470700004)(46966006)(36840700001)(8676002)(36860700001)(82310400005)(107886003)(36756003)(86362001)(1076003)(83380400001)(7416002)(8936002)(2616005)(30864003)(47076005)(70206006)(54906003)(2906002)(316002)(4326008)(5660300002)(110136005)(6666004)(336012)(70586007)(356005)(7696005)(426003)(186003)(40460700003)(81166007)(26005)(508600001)(36900700001); DIR:OUT; SFP:1101; X-OriginatorOrg: Nvidia.com X-MS-Exchange-CrossTenant-OriginalArrivalTime: 15 Jun 2022 13:49:41.2263 (UTC) X-MS-Exchange-CrossTenant-Network-Message-Id: 0b709d30-1afa-4408-635d-08da4ed5e59b X-MS-Exchange-CrossTenant-Id: 43083d15-7273-40c1-b7db-39efd9ccc17a X-MS-Exchange-CrossTenant-OriginalAttributedTenantConnectingIp: TenantId=43083d15-7273-40c1-b7db-39efd9ccc17a; Ip=[12.22.5.234]; Helo=[mail.nvidia.com] X-MS-Exchange-CrossTenant-AuthSource: CO1NAM11FT033.eop-nam11.prod.protection.outlook.com X-MS-Exchange-CrossTenant-AuthAs: Anonymous X-MS-Exchange-CrossTenant-FromEntityHeader: HybridOnPrem X-MS-Exchange-Transport-CrossTenantHeadersStamped: CH2PR12MB5563 Precedence: bulk List-ID: X-Mailing-List: linux-kselftest@vger.kernel.org This commit extends selftests for the new BPF helpers bpf_tcp_raw_{gen,check}_syncookie_ipv{4,6} to also test the TC BPF functionality added in the previous commit. Signed-off-by: Maxim Mikityanskiy Reviewed-by: Tariq Toukan --- .../selftests/bpf/prog_tests/xdp_synproxy.c | 55 +++++-- .../selftests/bpf/progs/xdp_synproxy_kern.c | 142 +++++++++++++----- tools/testing/selftests/bpf/xdp_synproxy.c | 96 +++++++++--- 3 files changed, 224 insertions(+), 69 deletions(-) diff --git a/tools/testing/selftests/bpf/prog_tests/xdp_synproxy.c b/tools/testing/selftests/bpf/prog_tests/xdp_synproxy.c index d9ee884c2a2b..fb77a123fe89 100644 --- a/tools/testing/selftests/bpf/prog_tests/xdp_synproxy.c +++ b/tools/testing/selftests/bpf/prog_tests/xdp_synproxy.c @@ -1,6 +1,7 @@ // SPDX-License-Identifier: LGPL-2.1 OR BSD-2-Clause /* Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved. */ +#define _GNU_SOURCE #include #include #include @@ -12,9 +13,11 @@ goto out; \ }) -#define SYS_OUT(cmd) ({ \ - FILE *f = popen((cmd), "r"); \ - if (!ASSERT_OK_PTR(f, (cmd))) \ +#define SYS_OUT(cmd, ...) ({ \ + char buf[1024]; \ + snprintf(buf, sizeof(buf), (cmd), ##__VA_ARGS__); \ + FILE *f = popen(buf, "r"); \ + if (!ASSERT_OK_PTR(f, buf)) \ goto out; \ f; \ }) @@ -57,9 +60,10 @@ static bool expect_str(char *buf, size_t size, const char *str, const char *name return ok; } -void test_xdp_synproxy(void) +static void test_synproxy(bool xdp) { int server_fd = -1, client_fd = -1, accept_fd = -1; + char *prog_id, *prog_id_end; struct nstoken *ns = NULL; FILE *ctrl_file = NULL; char buf[CMD_OUT_BUF_SIZE]; @@ -76,8 +80,9 @@ void test_xdp_synproxy(void) * checksums and drops packets. */ SYS("ethtool -K tmp0 tx off"); - /* Workaround required for veth. */ - SYS("ip link set tmp0 xdp object xdp_dummy.o section xdp 2> /dev/null"); + if (xdp) + /* Workaround required for veth. */ + SYS("ip link set tmp0 xdp object xdp_dummy.o section xdp 2> /dev/null"); ns = open_netns("synproxy"); if (!ASSERT_OK_PTR(ns, "setns")) @@ -97,14 +102,34 @@ void test_xdp_synproxy(void) SYS("iptables -t filter -A INPUT \ -i tmp1 -m state --state INVALID -j DROP"); - ctrl_file = SYS_OUT("./xdp_synproxy --iface tmp1 --ports 8080 --single \ - --mss4 1460 --mss6 1440 --wscale 7 --ttl 64"); + ctrl_file = SYS_OUT("./xdp_synproxy --iface tmp1 --ports 8080 \ + --single --mss4 1460 --mss6 1440 \ + --wscale 7 --ttl 64%s", xdp ? "" : " --tc"); size = fread(buf, 1, sizeof(buf), ctrl_file); pclose(ctrl_file); if (!expect_str(buf, size, "Total SYNACKs generated: 0\n", "initial SYNACKs")) goto out; + if (!xdp) { + ctrl_file = SYS_OUT("tc filter show dev tmp1 ingress"); + size = fread(buf, 1, sizeof(buf), ctrl_file); + pclose(ctrl_file); + prog_id = memmem(buf, size, " id ", 4); + if (!ASSERT_OK_PTR(prog_id, "find prog id")) + goto out; + prog_id += 4; + if (!ASSERT_LT(prog_id, buf + size, "find prog id begin")) + goto out; + prog_id_end = prog_id; + while (prog_id_end < buf + size && *prog_id_end >= '0' && + *prog_id_end <= '9') + prog_id_end++; + if (!ASSERT_LT(prog_id_end, buf + size, "find prog id end")) + goto out; + *prog_id_end = '\0'; + } + server_fd = start_server(AF_INET, SOCK_STREAM, "198.18.0.2", 8080, 0); if (!ASSERT_GE(server_fd, 0, "start_server")) goto out; @@ -124,7 +149,11 @@ void test_xdp_synproxy(void) if (!ASSERT_OK_PTR(ns, "setns")) goto out; - ctrl_file = SYS_OUT("./xdp_synproxy --iface tmp1 --single"); + if (xdp) + ctrl_file = SYS_OUT("./xdp_synproxy --iface tmp1 --single"); + else + ctrl_file = SYS_OUT("./xdp_synproxy --prog %s --single", + prog_id); size = fread(buf, 1, sizeof(buf), ctrl_file); pclose(ctrl_file); if (!expect_str(buf, size, "Total SYNACKs generated: 1\n", @@ -144,3 +173,11 @@ void test_xdp_synproxy(void) system("ip link del tmp0"); system("ip netns del synproxy"); } + +void test_xdp_synproxy(void) +{ + if (test__start_subtest("xdp")) + test_synproxy(true); + if (test__start_subtest("tc")) + test_synproxy(false); +} diff --git a/tools/testing/selftests/bpf/progs/xdp_synproxy_kern.c b/tools/testing/selftests/bpf/progs/xdp_synproxy_kern.c index 53b9865276a4..9fd62e94b5e6 100644 --- a/tools/testing/selftests/bpf/progs/xdp_synproxy_kern.c +++ b/tools/testing/selftests/bpf/progs/xdp_synproxy_kern.c @@ -7,6 +7,9 @@ #include #include +#define TC_ACT_OK 0 +#define TC_ACT_SHOT 2 + #define NSEC_PER_SEC 1000000000L #define ETH_ALEN 6 @@ -80,6 +83,12 @@ extern struct nf_conn *bpf_xdp_ct_lookup(struct xdp_md *xdp_ctx, struct bpf_ct_opts *opts, __u32 len_opts) __ksym; +extern struct nf_conn *bpf_skb_ct_lookup(struct __sk_buff *skb_ctx, + struct bpf_sock_tuple *bpf_tuple, + u32 len_tuple, + struct bpf_ct_opts *opts, + u32 len_opts) __ksym; + extern void bpf_ct_release(struct nf_conn *ct) __ksym; static __always_inline void swap_eth_addr(__u8 *a, __u8 *b) @@ -382,7 +391,7 @@ static __always_inline int tcp_dissect(void *data, void *data_end, return XDP_TX; } -static __always_inline int tcp_lookup(struct xdp_md *ctx, struct header_pointers *hdr) +static __always_inline int tcp_lookup(void *ctx, struct header_pointers *hdr, bool xdp) { struct bpf_ct_opts ct_lookup_opts = { .netns_id = BPF_F_CURRENT_NETNS, @@ -416,7 +425,10 @@ static __always_inline int tcp_lookup(struct xdp_md *ctx, struct header_pointers */ return XDP_ABORTED; } - ct = bpf_xdp_ct_lookup(ctx, &tup, tup_size, &ct_lookup_opts, sizeof(ct_lookup_opts)); + if (xdp) + ct = bpf_xdp_ct_lookup(ctx, &tup, tup_size, &ct_lookup_opts, sizeof(ct_lookup_opts)); + else + ct = bpf_skb_ct_lookup(ctx, &tup, tup_size, &ct_lookup_opts, sizeof(ct_lookup_opts)); if (ct) { unsigned long status = ct->status; @@ -529,8 +541,9 @@ static __always_inline void tcpv6_gen_synack(struct header_pointers *hdr, } static __always_inline int syncookie_handle_syn(struct header_pointers *hdr, - struct xdp_md *ctx, - void *data, void *data_end) + void *ctx, + void *data, void *data_end, + bool xdp) { __u32 old_pkt_size, new_pkt_size; /* Unlike clang 10, clang 11 and 12 generate code that doesn't pass the @@ -666,8 +679,13 @@ static __always_inline int syncookie_handle_syn(struct header_pointers *hdr, /* Set the new packet size. */ old_pkt_size = data_end - data; new_pkt_size = sizeof(*hdr->eth) + ip_len + hdr->tcp->doff * 4; - if (bpf_xdp_adjust_tail(ctx, new_pkt_size - old_pkt_size)) - return XDP_ABORTED; + if (xdp) { + if (bpf_xdp_adjust_tail(ctx, new_pkt_size - old_pkt_size)) + return XDP_ABORTED; + } else { + if (bpf_skb_change_tail(ctx, new_pkt_size, 0)) + return XDP_ABORTED; + } values_inc_synacks(); @@ -693,71 +711,123 @@ static __always_inline int syncookie_handle_ack(struct header_pointers *hdr) return XDP_PASS; } -SEC("xdp") -int syncookie_xdp(struct xdp_md *ctx) +static __always_inline int syncookie_part1(void *ctx, void *data, void *data_end, + struct header_pointers *hdr, bool xdp) { - void *data_end = (void *)(long)ctx->data_end; - void *data = (void *)(long)ctx->data; - struct header_pointers hdr; - __s64 value; - int ret; - struct bpf_ct_opts ct_lookup_opts = { .netns_id = BPF_F_CURRENT_NETNS, .l4proto = IPPROTO_TCP, }; + int ret; - ret = tcp_dissect(data, data_end, &hdr); + ret = tcp_dissect(data, data_end, hdr); if (ret != XDP_TX) return ret; - ret = tcp_lookup(ctx, &hdr); + ret = tcp_lookup(ctx, hdr, xdp); if (ret != XDP_TX) return ret; /* Packet is TCP and doesn't belong to an established connection. */ - if ((hdr.tcp->syn ^ hdr.tcp->ack) != 1) + if ((hdr->tcp->syn ^ hdr->tcp->ack) != 1) return XDP_DROP; - /* Grow the TCP header to TCP_MAXLEN to be able to pass any hdr.tcp_len + /* Grow the TCP header to TCP_MAXLEN to be able to pass any hdr->tcp_len * to bpf_tcp_raw_gen_syncookie_ipv{4,6} and pass the verifier. */ - if (bpf_xdp_adjust_tail(ctx, TCP_MAXLEN - hdr.tcp_len)) - return XDP_ABORTED; + if (xdp) { + if (bpf_xdp_adjust_tail(ctx, TCP_MAXLEN - hdr->tcp_len)) + return XDP_ABORTED; + } else { + /* Without volatile the verifier throws this error: + * R9 32-bit pointer arithmetic prohibited + */ + volatile u64 old_len = data_end - data; - data_end = (void *)(long)ctx->data_end; - data = (void *)(long)ctx->data; + if (bpf_skb_change_tail(ctx, old_len + TCP_MAXLEN - hdr->tcp_len, 0)) + return XDP_ABORTED; + } + + return XDP_TX; +} - if (hdr.ipv4) { - hdr.eth = data; - hdr.ipv4 = (void *)hdr.eth + sizeof(*hdr.eth); +static __always_inline int syncookie_part2(void *ctx, void *data, void *data_end, + struct header_pointers *hdr, bool xdp) +{ + if (hdr->ipv4) { + hdr->eth = data; + hdr->ipv4 = (void *)hdr->eth + sizeof(*hdr->eth); /* IPV4_MAXLEN is needed when calculating checksum. * At least sizeof(struct iphdr) is needed here to access ihl. */ - if ((void *)hdr.ipv4 + IPV4_MAXLEN > data_end) + if ((void *)hdr->ipv4 + IPV4_MAXLEN > data_end) return XDP_ABORTED; - hdr.tcp = (void *)hdr.ipv4 + hdr.ipv4->ihl * 4; - } else if (hdr.ipv6) { - hdr.eth = data; - hdr.ipv6 = (void *)hdr.eth + sizeof(*hdr.eth); - hdr.tcp = (void *)hdr.ipv6 + sizeof(*hdr.ipv6); + hdr->tcp = (void *)hdr->ipv4 + hdr->ipv4->ihl * 4; + } else if (hdr->ipv6) { + hdr->eth = data; + hdr->ipv6 = (void *)hdr->eth + sizeof(*hdr->eth); + hdr->tcp = (void *)hdr->ipv6 + sizeof(*hdr->ipv6); } else { return XDP_ABORTED; } - if ((void *)hdr.tcp + TCP_MAXLEN > data_end) + if ((void *)hdr->tcp + TCP_MAXLEN > data_end) return XDP_ABORTED; /* We run out of registers, tcp_len gets spilled to the stack, and the * verifier forgets its min and max values checked above in tcp_dissect. */ - hdr.tcp_len = hdr.tcp->doff * 4; - if (hdr.tcp_len < sizeof(*hdr.tcp)) + hdr->tcp_len = hdr->tcp->doff * 4; + if (hdr->tcp_len < sizeof(*hdr->tcp)) return XDP_ABORTED; - return hdr.tcp->syn ? syncookie_handle_syn(&hdr, ctx, data, data_end) : - syncookie_handle_ack(&hdr); + return hdr->tcp->syn ? syncookie_handle_syn(hdr, ctx, data, data_end, xdp) : + syncookie_handle_ack(hdr); +} + +SEC("xdp") +int syncookie_xdp(struct xdp_md *ctx) +{ + void *data_end = (void *)(long)ctx->data_end; + void *data = (void *)(long)ctx->data; + struct header_pointers hdr; + int ret; + + ret = syncookie_part1(ctx, data, data_end, &hdr, true); + if (ret != XDP_TX) + return ret; + + data_end = (void *)(long)ctx->data_end; + data = (void *)(long)ctx->data; + + return syncookie_part2(ctx, data, data_end, &hdr, true); +} + +SEC("tc") +int syncookie_tc(struct __sk_buff *skb) +{ + void *data_end = (void *)(long)skb->data_end; + void *data = (void *)(long)skb->data; + struct header_pointers hdr; + int ret; + + ret = syncookie_part1(skb, data, data_end, &hdr, false); + if (ret != XDP_TX) + return ret == XDP_PASS ? TC_ACT_OK : TC_ACT_SHOT; + + data_end = (void *)(long)skb->data_end; + data = (void *)(long)skb->data; + + ret = syncookie_part2(skb, data, data_end, &hdr, false); + switch (ret) { + case XDP_PASS: + return TC_ACT_OK; + case XDP_TX: + return bpf_redirect(skb->ifindex, 0); + default: + return TC_ACT_SHOT; + } } char _license[] SEC("license") = "GPL"; diff --git a/tools/testing/selftests/bpf/xdp_synproxy.c b/tools/testing/selftests/bpf/xdp_synproxy.c index 4653d4655b5f..d874ddfb39c4 100644 --- a/tools/testing/selftests/bpf/xdp_synproxy.c +++ b/tools/testing/selftests/bpf/xdp_synproxy.c @@ -18,16 +18,31 @@ static unsigned int ifindex; static __u32 attached_prog_id; +static bool attached_tc; static void noreturn cleanup(int sig) { - DECLARE_LIBBPF_OPTS(bpf_xdp_attach_opts, opts); + LIBBPF_OPTS(bpf_xdp_attach_opts, opts); int prog_fd; int err; if (attached_prog_id == 0) exit(0); + if (attached_tc) { + LIBBPF_OPTS(bpf_tc_hook, hook, + .ifindex = ifindex, + .attach_point = BPF_TC_INGRESS); + + err = bpf_tc_hook_destroy(&hook); + if (err < 0) { + fprintf(stderr, "Error: bpf_tc_hook_destroy: %s\n", strerror(-err)); + fprintf(stderr, "Failed to destroy the TC hook\n"); + exit(1); + } + exit(0); + } + prog_fd = bpf_prog_get_fd_by_id(attached_prog_id); if (prog_fd < 0) { fprintf(stderr, "Error: bpf_prog_get_fd_by_id: %s\n", strerror(-prog_fd)); @@ -55,7 +70,7 @@ static void noreturn cleanup(int sig) static noreturn void usage(const char *progname) { - fprintf(stderr, "Usage: %s [--iface |--prog ] [--mss4 --mss6 --wscale --ttl ] [--ports ,,...] [--single]\n", + fprintf(stderr, "Usage: %s [--iface |--prog ] [--mss4 --mss6 --wscale --ttl ] [--ports ,,...] [--single] [--tc]\n", progname); exit(1); } @@ -74,7 +89,7 @@ static unsigned long parse_arg_ul(const char *progname, const char *arg, unsigne } static void parse_options(int argc, char *argv[], unsigned int *ifindex, __u32 *prog_id, - __u64 *tcpipopts, char **ports, bool *single) + __u64 *tcpipopts, char **ports, bool *single, bool *tc) { static struct option long_options[] = { { "help", no_argument, NULL, 'h' }, @@ -86,6 +101,7 @@ static void parse_options(int argc, char *argv[], unsigned int *ifindex, __u32 * { "ttl", required_argument, NULL, 't' }, { "ports", required_argument, NULL, 'p' }, { "single", no_argument, NULL, 's' }, + { "tc", no_argument, NULL, 'c' }, { NULL, 0, NULL, 0 }, }; unsigned long mss4, mss6, wscale, ttl; @@ -143,6 +159,9 @@ static void parse_options(int argc, char *argv[], unsigned int *ifindex, __u32 * case 's': *single = true; break; + case 'c': + *tc = true; + break; default: usage(argv[0]); } @@ -164,7 +183,7 @@ static void parse_options(int argc, char *argv[], unsigned int *ifindex, __u32 * usage(argv[0]); } -static int syncookie_attach(const char *argv0, unsigned int ifindex) +static int syncookie_attach(const char *argv0, unsigned int ifindex, bool tc) { struct bpf_prog_info info = {}; __u32 info_len = sizeof(info); @@ -188,9 +207,9 @@ static int syncookie_attach(const char *argv0, unsigned int ifindex) return err; } - prog = bpf_object__find_program_by_name(obj, "syncookie_xdp"); + prog = bpf_object__find_program_by_name(obj, tc ? "syncookie_tc" : "syncookie_xdp"); if (!prog) { - fprintf(stderr, "Error: bpf_object__find_program_by_name: program syncookie_xdp was not found\n"); + fprintf(stderr, "Error: bpf_object__find_program_by_name: program was not found\n"); return -ENOENT; } @@ -201,21 +220,50 @@ static int syncookie_attach(const char *argv0, unsigned int ifindex) fprintf(stderr, "Error: bpf_obj_get_info_by_fd: %s\n", strerror(-err)); goto out; } + attached_tc = tc; attached_prog_id = info.id; signal(SIGINT, cleanup); signal(SIGTERM, cleanup); - err = bpf_xdp_attach(ifindex, prog_fd, XDP_FLAGS_UPDATE_IF_NOEXIST, NULL); - if (err < 0) { - fprintf(stderr, "Error: bpf_set_link_xdp_fd: %s\n", strerror(-err)); - signal(SIGINT, SIG_DFL); - signal(SIGTERM, SIG_DFL); - attached_prog_id = 0; - goto out; + if (tc) { + LIBBPF_OPTS(bpf_tc_hook, hook, + .ifindex = ifindex, + .attach_point = BPF_TC_INGRESS); + LIBBPF_OPTS(bpf_tc_opts, opts, + .handle = 1, + .priority = 1, + .prog_fd = prog_fd); + + err = bpf_tc_hook_create(&hook); + if (err < 0) { + fprintf(stderr, "Error: bpf_tc_hook_create: %s\n", + strerror(-err)); + goto fail; + } + err = bpf_tc_attach(&hook, &opts); + if (err < 0) { + fprintf(stderr, "Error: bpf_tc_attach: %s\n", + strerror(-err)); + goto fail; + } + + } else { + err = bpf_xdp_attach(ifindex, prog_fd, + XDP_FLAGS_UPDATE_IF_NOEXIST, NULL); + if (err < 0) { + fprintf(stderr, "Error: bpf_set_link_xdp_fd: %s\n", + strerror(-err)); + goto fail; + } } err = 0; out: bpf_object__close(obj); return err; +fail: + signal(SIGINT, SIG_DFL); + signal(SIGTERM, SIG_DFL); + attached_prog_id = 0; + goto out; } static int syncookie_open_bpf_maps(__u32 prog_id, int *values_map_fd, int *ports_map_fd) @@ -248,11 +296,6 @@ static int syncookie_open_bpf_maps(__u32 prog_id, int *values_map_fd, int *ports goto out; } - if (prog_info.type != BPF_PROG_TYPE_XDP) { - fprintf(stderr, "Error: BPF prog type is not BPF_PROG_TYPE_XDP\n"); - err = -ENOENT; - goto out; - } if (prog_info.nr_map_ids < 2) { fprintf(stderr, "Error: Found %u BPF maps, expected at least 2\n", prog_info.nr_map_ids); @@ -319,17 +362,22 @@ int main(int argc, char *argv[]) char *ports; bool single; int err = 0; + bool tc; - parse_options(argc, argv, &ifindex, &prog_id, &tcpipopts, &ports, &single); + parse_options(argc, argv, &ifindex, &prog_id, &tcpipopts, &ports, + &single, &tc); if (prog_id == 0) { - err = bpf_xdp_query_id(ifindex, 0, &prog_id); - if (err < 0) { - fprintf(stderr, "Error: bpf_get_link_xdp_id: %s\n", strerror(-err)); - goto out; + if (!tc) { + err = bpf_xdp_query_id(ifindex, 0, &prog_id); + if (err < 0) { + fprintf(stderr, "Error: bpf_get_link_xdp_id: %s\n", + strerror(-err)); + goto out; + } } if (prog_id == 0) { - err = syncookie_attach(argv[0], ifindex); + err = syncookie_attach(argv[0], ifindex, tc); if (err < 0) goto out; prog_id = attached_prog_id;