From patchwork Thu Sep 24 18:21:22 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Daniel Borkmann X-Patchwork-Id: 260259 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH, MAILING_LIST_MULTI, SIGNED_OFF_BY, SPF_HELO_NONE, SPF_PASS, URIBL_BLOCKED, USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 05C34C4727D for ; Thu, 24 Sep 2020 18:21:40 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id BC5F22399A for ; Thu, 24 Sep 2020 18:21:39 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728756AbgIXSVj (ORCPT ); Thu, 24 Sep 2020 14:21:39 -0400 Received: from www62.your-server.de ([213.133.104.62]:39316 "EHLO www62.your-server.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728691AbgIXSVi (ORCPT ); Thu, 24 Sep 2020 14:21:38 -0400 Received: from 75.57.196.178.dynamic.wline.res.cust.swisscom.ch ([178.196.57.75] helo=localhost) by www62.your-server.de with esmtpsa (TLSv1.2:DHE-RSA-AES256-GCM-SHA384:256) (Exim 4.89_1) (envelope-from ) id 1kLVs8-0003zv-Ej; Thu, 24 Sep 2020 20:21:36 +0200 From: Daniel Borkmann To: ast@kernel.org Cc: daniel@iogearbox.net, john.fastabend@gmail.com, netdev@vger.kernel.org, bpf@vger.kernel.org Subject: [PATCH bpf-next 1/6] bpf: add classid helper only based on skb->sk Date: Thu, 24 Sep 2020 20:21:22 +0200 Message-Id: <2e761d23d591a9536eaa3ecd4be8d78c99f00964.1600967205.git.daniel@iogearbox.net> X-Mailer: git-send-email 2.21.0 In-Reply-To: References: MIME-Version: 1.0 X-Authenticated-Sender: daniel@iogearbox.net X-Virus-Scanned: Clear (ClamAV 0.102.4/25937/Thu Sep 24 15:53:11 2020) Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org Similarly to 5a52ae4e32a6 ("bpf: Allow to retrieve cgroup v1 classid from v2 hooks"), add a helper to retrieve cgroup v1 classid solely based on the skb->sk, so it can be used as key as part of BPF map lookups out of tc from host ns, in particular given the skb->sk is retained these days when crossing net ns thanks to 9c4c325252c5 ("skbuff: preserve sock reference when scrubbing the skb."). This is similar to bpf_skb_cgroup_id() which implements the same for v2. Kubernetes ecosystem is still operating on v1 however, hence net_cls needs to be used there until this can be dropped in with the v2 helper of bpf_skb_cgroup_id(). Signed-off-by: Daniel Borkmann --- include/uapi/linux/bpf.h | 10 ++++++++++ net/core/filter.c | 21 +++++++++++++++++++++ tools/include/uapi/linux/bpf.h | 10 ++++++++++ 3 files changed, 41 insertions(+) diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index a22812561064..48ecf246d047 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -3586,6 +3586,15 @@ union bpf_attr { * the data in *dst*. This is a wrapper of **copy_from_user**\ (). * Return * 0 on success, or a negative error in case of failure. + * + * u64 bpf_skb_cgroup_classid(struct sk_buff *skb) + * Description + * See **bpf_get_cgroup_classid**\ () for the main description. + * This helper differs from **bpf_get_cgroup_classid**\ () in that + * the cgroup v1 net_cls class is retrieved only from the *skb*'s + * associated socket instead of the current process. + * Return + * The id is returned or 0 in case the id could not be retrieved. */ #define __BPF_FUNC_MAPPER(FN) \ FN(unspec), \ @@ -3737,6 +3746,7 @@ union bpf_attr { FN(inode_storage_delete), \ FN(d_path), \ FN(copy_from_user), \ + FN(skb_cgroup_classid), \ /* */ /* integer value in 'imm' field of BPF_CALL instruction selects which helper diff --git a/net/core/filter.c b/net/core/filter.c index 6014e5f40c58..0f913755bcba 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -2704,6 +2704,23 @@ static const struct bpf_func_proto bpf_get_cgroup_classid_curr_proto = { .gpl_only = false, .ret_type = RET_INTEGER, }; + +BPF_CALL_1(bpf_skb_cgroup_classid, const struct sk_buff *, skb) +{ + struct sock *sk = skb_to_full_sk(skb); + + if (!sk || !sk_fullsock(sk)) + return 0; + + return sock_cgroup_classid(&sk->sk_cgrp_data); +} + +static const struct bpf_func_proto bpf_skb_cgroup_classid_proto = { + .func = bpf_skb_cgroup_classid, + .gpl_only = false, + .ret_type = RET_INTEGER, + .arg1_type = ARG_PTR_TO_CTX, +}; #endif BPF_CALL_1(bpf_get_cgroup_classid, const struct sk_buff *, skb) @@ -6770,6 +6787,10 @@ tc_cls_act_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog) case BPF_FUNC_skb_get_xfrm_state: return &bpf_skb_get_xfrm_state_proto; #endif +#ifdef CONFIG_CGROUP_NET_CLASSID + case BPF_FUNC_skb_cgroup_classid: + return &bpf_skb_cgroup_classid_proto; +#endif #ifdef CONFIG_SOCK_CGROUP_DATA case BPF_FUNC_skb_cgroup_id: return &bpf_skb_cgroup_id_proto; diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h index a22812561064..48ecf246d047 100644 --- a/tools/include/uapi/linux/bpf.h +++ b/tools/include/uapi/linux/bpf.h @@ -3586,6 +3586,15 @@ union bpf_attr { * the data in *dst*. This is a wrapper of **copy_from_user**\ (). * Return * 0 on success, or a negative error in case of failure. + * + * u64 bpf_skb_cgroup_classid(struct sk_buff *skb) + * Description + * See **bpf_get_cgroup_classid**\ () for the main description. + * This helper differs from **bpf_get_cgroup_classid**\ () in that + * the cgroup v1 net_cls class is retrieved only from the *skb*'s + * associated socket instead of the current process. + * Return + * The id is returned or 0 in case the id could not be retrieved. */ #define __BPF_FUNC_MAPPER(FN) \ FN(unspec), \ @@ -3737,6 +3746,7 @@ union bpf_attr { FN(inode_storage_delete), \ FN(d_path), \ FN(copy_from_user), \ + FN(skb_cgroup_classid), \ /* */ /* integer value in 'imm' field of BPF_CALL instruction selects which helper From patchwork Thu Sep 24 18:21:23 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Daniel Borkmann X-Patchwork-Id: 260258 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-10.0 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH, MAILING_LIST_MULTI, SIGNED_OFF_BY, SPF_HELO_NONE, SPF_PASS, UNWANTED_LANGUAGE_BODY, URIBL_BLOCKED, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id ED84CC4727D for ; Thu, 24 Sep 2020 18:21:46 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id BD6B32311B for ; Thu, 24 Sep 2020 18:21:46 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728778AbgIXSVq (ORCPT ); Thu, 24 Sep 2020 14:21:46 -0400 Received: from www62.your-server.de ([213.133.104.62]:39324 "EHLO www62.your-server.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728704AbgIXSVj (ORCPT ); Thu, 24 Sep 2020 14:21:39 -0400 Received: from 75.57.196.178.dynamic.wline.res.cust.swisscom.ch ([178.196.57.75] helo=localhost) by www62.your-server.de with esmtpsa (TLSv1.2:DHE-RSA-AES256-GCM-SHA384:256) (Exim 4.89_1) (envelope-from ) id 1kLVs8-000402-RH; Thu, 24 Sep 2020 20:21:36 +0200 From: Daniel Borkmann To: ast@kernel.org Cc: daniel@iogearbox.net, john.fastabend@gmail.com, netdev@vger.kernel.org, bpf@vger.kernel.org Subject: [PATCH bpf-next 2/6] bpf, net: rework cookie generator as per-cpu one Date: Thu, 24 Sep 2020 20:21:23 +0200 Message-Id: X-Mailer: git-send-email 2.21.0 In-Reply-To: References: MIME-Version: 1.0 X-Authenticated-Sender: daniel@iogearbox.net X-Virus-Scanned: Clear (ClamAV 0.102.4/25937/Thu Sep 24 15:53:11 2020) Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org With its use in BPF the cookie generator can be called very frequently in particular when used out of cgroup v2 hooks (e.g. connect / sendmsg) and attached to the root cgroup, for example, when used in v1/v2 mixed environments. In particular when there's a high churn on sockets in the system there can be many parallel requests to the bpf_get_socket_cookie() and bpf_get_netns_cookie() helpers which then cause contention on the shared atomic counter. As similarly done in f991bd2e1421 ("fs: introduce a per-cpu last_ino allocator"), add a small helper library that both can then use for the 64 bit counters. Signed-off-by: Daniel Borkmann --- include/linux/cookie.h | 41 ++++++++++++++++++++++++++++++++++++++++ net/core/net_namespace.c | 5 +++-- net/core/sock_diag.c | 7 ++++--- 3 files changed, 48 insertions(+), 5 deletions(-) create mode 100644 include/linux/cookie.h diff --git a/include/linux/cookie.h b/include/linux/cookie.h new file mode 100644 index 000000000000..2488203dc004 --- /dev/null +++ b/include/linux/cookie.h @@ -0,0 +1,41 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef __LINUX_COOKIE_H +#define __LINUX_COOKIE_H + +#include +#include + +struct gen_cookie { + u64 __percpu *local_last; + atomic64_t shared_last ____cacheline_aligned_in_smp; +}; + +#define COOKIE_LOCAL_BATCH 4096 + +#define DEFINE_COOKIE(name) \ + static DEFINE_PER_CPU(u64, __##name); \ + static struct gen_cookie name = { \ + .local_last = &__##name, \ + .shared_last = ATOMIC64_INIT(0), \ + } + +static inline u64 gen_cookie_next(struct gen_cookie *gc) +{ + u64 *local_last = &get_cpu_var(*gc->local_last); + u64 val = *local_last; + + if (__is_defined(CONFIG_SMP) && + unlikely((val & (COOKIE_LOCAL_BATCH - 1)) == 0)) { + s64 next = atomic64_add_return(COOKIE_LOCAL_BATCH, + &gc->shared_last); + val = next - COOKIE_LOCAL_BATCH; + } + val++; + if (unlikely(!val)) + val++; + *local_last = val; + put_cpu_var(local_last); + return val; +} + +#endif /* __LINUX_COOKIE_H */ diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c index dcd61aca343e..cf29ed8950d1 100644 --- a/net/core/net_namespace.c +++ b/net/core/net_namespace.c @@ -19,6 +19,7 @@ #include #include #include +#include #include #include @@ -69,7 +70,7 @@ EXPORT_SYMBOL_GPL(pernet_ops_rwsem); static unsigned int max_gen_ptrs = INITIAL_NET_GEN_PTRS; -static atomic64_t cookie_gen; +DEFINE_COOKIE(net_cookie); u64 net_gen_cookie(struct net *net) { @@ -78,7 +79,7 @@ u64 net_gen_cookie(struct net *net) if (res) return res; - res = atomic64_inc_return(&cookie_gen); + res = gen_cookie_next(&net_cookie); atomic64_cmpxchg(&net->net_cookie, 0, res); } } diff --git a/net/core/sock_diag.c b/net/core/sock_diag.c index c13ffbd33d8d..4a4180e8dd35 100644 --- a/net/core/sock_diag.c +++ b/net/core/sock_diag.c @@ -11,7 +11,7 @@ #include #include #include - +#include #include #include @@ -19,7 +19,8 @@ static const struct sock_diag_handler *sock_diag_handlers[AF_MAX]; static int (*inet_rcv_compat)(struct sk_buff *skb, struct nlmsghdr *nlh); static DEFINE_MUTEX(sock_diag_table_mutex); static struct workqueue_struct *broadcast_wq; -static atomic64_t cookie_gen; + +DEFINE_COOKIE(sock_cookie); u64 sock_gen_cookie(struct sock *sk) { @@ -28,7 +29,7 @@ u64 sock_gen_cookie(struct sock *sk) if (res) return res; - res = atomic64_inc_return(&cookie_gen); + res = gen_cookie_next(&sock_cookie); atomic64_cmpxchg(&sk->sk_cookie, 0, res); } } From patchwork Thu Sep 24 18:21:27 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Daniel Borkmann X-Patchwork-Id: 260257 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH, MAILING_LIST_MULTI, SIGNED_OFF_BY, SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id BA1E9C4727F for ; Thu, 24 Sep 2020 18:21:50 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 8C30023741 for ; Thu, 24 Sep 2020 18:21:50 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728769AbgIXSVo (ORCPT ); Thu, 24 Sep 2020 14:21:44 -0400 Received: from www62.your-server.de ([213.133.104.62]:39356 "EHLO www62.your-server.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728691AbgIXSVm (ORCPT ); Thu, 24 Sep 2020 14:21:42 -0400 Received: from 75.57.196.178.dynamic.wline.res.cust.swisscom.ch ([178.196.57.75] helo=localhost) by www62.your-server.de with esmtpsa (TLSv1.2:DHE-RSA-AES256-GCM-SHA384:256) (Exim 4.89_1) (envelope-from ) id 1kLVsA-00040z-Ec; Thu, 24 Sep 2020 20:21:38 +0200 From: Daniel Borkmann To: ast@kernel.org Cc: daniel@iogearbox.net, john.fastabend@gmail.com, netdev@vger.kernel.org, bpf@vger.kernel.org Subject: [PATCH bpf-next 6/6] bpf, selftests: add redirect_neigh selftest Date: Thu, 24 Sep 2020 20:21:27 +0200 Message-Id: <4bcd4235ff9174a381d598939209b0967155ba35.1600967205.git.daniel@iogearbox.net> X-Mailer: git-send-email 2.21.0 In-Reply-To: References: MIME-Version: 1.0 X-Authenticated-Sender: daniel@iogearbox.net X-Virus-Scanned: Clear (ClamAV 0.102.4/25937/Thu Sep 24 15:53:11 2020) Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org Add a small test that excercises the new redirect_neigh() helper for the IPv4 and IPv6 case. Signed-off-by: Daniel Borkmann --- .../selftests/bpf/progs/test_tc_neigh.c | 144 +++++++++++++++ tools/testing/selftests/bpf/test_tc_neigh.sh | 168 ++++++++++++++++++ 2 files changed, 312 insertions(+) create mode 100644 tools/testing/selftests/bpf/progs/test_tc_neigh.c create mode 100755 tools/testing/selftests/bpf/test_tc_neigh.sh diff --git a/tools/testing/selftests/bpf/progs/test_tc_neigh.c b/tools/testing/selftests/bpf/progs/test_tc_neigh.c new file mode 100644 index 000000000000..889a72c3024f --- /dev/null +++ b/tools/testing/selftests/bpf/progs/test_tc_neigh.c @@ -0,0 +1,144 @@ +// SPDX-License-Identifier: GPL-2.0 +#include +#include + +#include +#include +#include +#include +#include +#include +#include + +#include +#include + +#ifndef barrier_data +# define barrier_data(ptr) asm volatile("": :"r"(ptr) :"memory") +#endif + +#ifndef ctx_ptr +# define ctx_ptr(field) (void *)(long)(field) +#endif + +#define dst_to_src_tmp 0xeeddddeeU +#define src_to_dst_tmp 0xeeffffeeU + +#define ip4_src 0xac100164 /* 172.16.1.100 */ +#define ip4_dst 0xac100264 /* 172.16.2.100 */ + +#define ip6_src { 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, \ + 0x00, 0x01, 0xde, 0xad, 0xbe, 0xef, 0xca, 0xfe } +#define ip6_dst { 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, \ + 0x00, 0x02, 0xde, 0xad, 0xbe, 0xef, 0xca, 0xfe } + +#ifndef v6_equal +# define v6_equal(a, b) (a.s6_addr32[0] == b.s6_addr32[0] && \ + a.s6_addr32[1] == b.s6_addr32[1] && \ + a.s6_addr32[2] == b.s6_addr32[2] && \ + a.s6_addr32[3] == b.s6_addr32[3]) +#endif + +static __always_inline bool is_remote_ep_v4(struct __sk_buff *skb, + __be32 addr) +{ + void *data_end = ctx_ptr(skb->data_end); + void *data = ctx_ptr(skb->data); + struct iphdr *ip4h; + + if (data + sizeof(struct ethhdr) > data_end) + return false; + + ip4h = (struct iphdr *)(data + sizeof(struct ethhdr)); + if ((void *)(ip4h + 1) > data_end) + return false; + + return ip4h->daddr == addr; +} + +static __always_inline bool is_remote_ep_v6(struct __sk_buff *skb, + struct in6_addr addr) +{ + void *data_end = ctx_ptr(skb->data_end); + void *data = ctx_ptr(skb->data); + struct ipv6hdr *ip6h; + + if (data + sizeof(struct ethhdr) > data_end) + return false; + + ip6h = (struct ipv6hdr *)(data + sizeof(struct ethhdr)); + if ((void *)(ip6h + 1) > data_end) + return false; + + return v6_equal(ip6h->daddr, addr); +} + +SEC("chk_neigh") int tc_chk(struct __sk_buff *skb) +{ + void *data_end = ctx_ptr(skb->data_end); + void *data = ctx_ptr(skb->data); + __u32 *raw = data; + + if (data + sizeof(struct ethhdr) > data_end) + return TC_ACT_SHOT; + + return !raw[0] && !raw[1] && !raw[2] ? TC_ACT_SHOT : TC_ACT_OK; +} + +SEC("dst_ingress") int tc_dst(struct __sk_buff *skb) +{ + int idx = dst_to_src_tmp; + __u8 zero[ETH_ALEN * 2]; + bool redirect = false; + + switch (skb->protocol) { + case __bpf_constant_htons(ETH_P_IP): + redirect = is_remote_ep_v4(skb, __bpf_constant_htonl(ip4_src)); + break; + case __bpf_constant_htons(ETH_P_IPV6): + redirect = is_remote_ep_v6(skb, (struct in6_addr)ip6_src); + break; + } + + if (!redirect) + return TC_ACT_OK; + + barrier_data(&idx); + idx = bpf_ntohl(idx); + + __builtin_memset(&zero, 0, sizeof(zero)); + if (bpf_skb_store_bytes(skb, 0, &zero, sizeof(zero), 0) < 0) + return TC_ACT_SHOT; + + return bpf_redirect_neigh(idx, 0); +} + +SEC("src_ingress") int tc_src(struct __sk_buff *skb) +{ + int idx = src_to_dst_tmp; + __u8 zero[ETH_ALEN * 2]; + bool redirect = false; + + switch (skb->protocol) { + case __bpf_constant_htons(ETH_P_IP): + redirect = is_remote_ep_v4(skb, __bpf_constant_htonl(ip4_dst)); + break; + case __bpf_constant_htons(ETH_P_IPV6): + redirect = is_remote_ep_v6(skb, (struct in6_addr)ip6_dst); + break; + } + + if (!redirect) + return TC_ACT_OK; + + barrier_data(&idx); + idx = bpf_ntohl(idx); + + __builtin_memset(&zero, 0, sizeof(zero)); + if (bpf_skb_store_bytes(skb, 0, &zero, sizeof(zero), 0) < 0) + return TC_ACT_SHOT; + + return bpf_redirect_neigh(idx, 0); +} + +char __license[] SEC("license") = "GPL"; diff --git a/tools/testing/selftests/bpf/test_tc_neigh.sh b/tools/testing/selftests/bpf/test_tc_neigh.sh new file mode 100755 index 000000000000..31d8c3df8b24 --- /dev/null +++ b/tools/testing/selftests/bpf/test_tc_neigh.sh @@ -0,0 +1,168 @@ +#!/bin/bash +# SPDX-License-Identifier: GPL-2.0 +# +# This test sets up 3 netns (src <-> fwd <-> dst). There is no direct veth link +# between src and dst. The netns fwd has veth links to each src and dst. The +# client is in src and server in dst. The test installs a TC BPF program to each +# host facing veth in fwd which calls into bpf_redirect_peer() to perform the +# neigh addr population and redirect; it also installs a dropper prog on the +# egress side to drop skbs if neigh addrs were not populated. + +if [[ $EUID -ne 0 ]]; then + echo "This script must be run as root" + echo "FAIL" + exit 1 +fi + +# check that nc, dd, ping, ping6 and timeout are present +command -v nc >/dev/null 2>&1 || \ + { echo >&2 "nc is not available"; exit 1; } +command -v dd >/dev/null 2>&1 || \ + { echo >&2 "dd is not available"; exit 1; } +command -v timeout >/dev/null 2>&1 || \ + { echo >&2 "timeout is not available"; exit 1; } +command -v ping >/dev/null 2>&1 || \ + { echo >&2 "ping is not available"; exit 1; } +command -v ping6 >/dev/null 2>&1 || \ + { echo >&2 "ping6 is not available"; exit 1; } + +readonly GREEN='\033[0;92m' +readonly RED='\033[0;31m' +readonly NC='\033[0m' # No Color + +readonly PING_ARG="-c 3 -w 10 -q" + +readonly TIMEOUT=10 + +readonly NS_SRC="ns-src-$(mktemp -u XXXXXX)" +readonly NS_FWD="ns-fwd-$(mktemp -u XXXXXX)" +readonly NS_DST="ns-dst-$(mktemp -u XXXXXX)" + +readonly IP4_SRC="172.16.1.100" +readonly IP4_DST="172.16.2.100" + +readonly IP6_SRC="::1:dead:beef:cafe" +readonly IP6_DST="::2:dead:beef:cafe" + +readonly IP4_SLL="169.254.0.1" +readonly IP4_DLL="169.254.0.2" +readonly IP4_NET="169.254.0.0" + +cleanup() +{ + ip netns del ${NS_SRC} + ip netns del ${NS_FWD} + ip netns del ${NS_DST} +} + +trap cleanup EXIT + +set -e + +ip netns add "${NS_SRC}" +ip netns add "${NS_FWD}" +ip netns add "${NS_DST}" + +ip link add veth_src type veth peer name veth_src_fwd +ip link add veth_dst type veth peer name veth_dst_fwd + +ip link set veth_src netns ${NS_SRC} +ip link set veth_src_fwd netns ${NS_FWD} + +ip link set veth_dst netns ${NS_DST} +ip link set veth_dst_fwd netns ${NS_FWD} + +ip -netns ${NS_SRC} addr add ${IP4_SRC}/32 dev veth_src +ip -netns ${NS_DST} addr add ${IP4_DST}/32 dev veth_dst + +# The fwd netns automatically get a v6 LL address / routes, but also needs v4 +# one in order to start ARP probing. IP4_NET route is added to the endpoints +# so that the ARP processing will reply. + +ip -netns ${NS_FWD} addr add ${IP4_SLL}/32 dev veth_src_fwd +ip -netns ${NS_FWD} addr add ${IP4_DLL}/32 dev veth_dst_fwd + +ip -netns ${NS_SRC} addr add ${IP6_SRC}/128 dev veth_src nodad +ip -netns ${NS_DST} addr add ${IP6_DST}/128 dev veth_dst nodad + +ip -netns ${NS_SRC} link set dev veth_src up +ip -netns ${NS_FWD} link set dev veth_src_fwd up + +ip -netns ${NS_DST} link set dev veth_dst up +ip -netns ${NS_FWD} link set dev veth_dst_fwd up + +ip -netns ${NS_SRC} route add ${IP4_DST}/32 dev veth_src scope global +ip -netns ${NS_SRC} route add ${IP4_NET}/16 dev veth_src scope global +ip -netns ${NS_FWD} route add ${IP4_SRC}/32 dev veth_src_fwd scope global + +ip -netns ${NS_SRC} route add ${IP6_DST}/128 dev veth_src scope global +ip -netns ${NS_FWD} route add ${IP6_SRC}/128 dev veth_src_fwd scope global + +ip -netns ${NS_DST} route add ${IP4_SRC}/32 dev veth_dst scope global +ip -netns ${NS_DST} route add ${IP4_NET}/16 dev veth_dst scope global +ip -netns ${NS_FWD} route add ${IP4_DST}/32 dev veth_dst_fwd scope global + +ip -netns ${NS_DST} route add ${IP6_SRC}/128 dev veth_dst scope global +ip -netns ${NS_FWD} route add ${IP6_DST}/128 dev veth_dst_fwd scope global + +fmac_src=$(ip netns exec ${NS_FWD} cat /sys/class/net/veth_src_fwd/address) +fmac_dst=$(ip netns exec ${NS_FWD} cat /sys/class/net/veth_dst_fwd/address) + +ip -netns ${NS_SRC} neigh add ${IP4_DST} dev veth_src lladdr $fmac_src +ip -netns ${NS_DST} neigh add ${IP4_SRC} dev veth_dst lladdr $fmac_dst + +ip -netns ${NS_SRC} neigh add ${IP6_DST} dev veth_src lladdr $fmac_src +ip -netns ${NS_DST} neigh add ${IP6_SRC} dev veth_dst lladdr $fmac_dst + +veth_dst=$(ip netns exec ${NS_FWD} cat /sys/class/net/veth_dst_fwd/ifindex | awk '{printf "%08x\n", $1}') +veth_src=$(ip netns exec ${NS_FWD} cat /sys/class/net/veth_src_fwd/ifindex | awk '{printf "%08x\n", $1}') + +xxd -p < test_tc_neigh.o | sed "s/eeddddee/$veth_src/g" | xxd -r -p > test_tc_neigh.x.o +xxd -p < test_tc_neigh.x.o | sed "s/eeffffee/$veth_dst/g" | xxd -r -p > test_tc_neigh.y.o + +ip netns exec ${NS_FWD} tc qdisc add dev veth_src_fwd clsact +ip netns exec ${NS_FWD} tc filter add dev veth_src_fwd ingress bpf da obj test_tc_neigh.y.o sec src_ingress +ip netns exec ${NS_FWD} tc filter add dev veth_src_fwd egress bpf da obj test_tc_neigh.y.o sec chk_neigh + +ip netns exec ${NS_FWD} tc qdisc add dev veth_dst_fwd clsact +ip netns exec ${NS_FWD} tc filter add dev veth_dst_fwd ingress bpf da obj test_tc_neigh.y.o sec dst_ingress +ip netns exec ${NS_FWD} tc filter add dev veth_dst_fwd egress bpf da obj test_tc_neigh.y.o sec chk_neigh + +rm -f test_tc_neigh.x.o test_tc_neigh.y.o + +ip netns exec ${NS_DST} bash -c "nc -4 -l -p 9004 &" +ip netns exec ${NS_DST} bash -c "nc -6 -l -p 9006 &" + +set +e + +TEST="TCPv4 connectivity test" +ip netns exec ${NS_SRC} bash -c "timeout ${TIMEOUT} dd if=/dev/zero bs=1000 count=100 > /dev/tcp/${IP4_DST}/9004" +if [ $? -ne 0 ]; then + echo -e "${TEST}: ${RED}FAIL${NC}" + exit 1 +fi +echo -e "${TEST}: ${GREEN}PASS${NC}" + +TEST="TCPv6 connectivity test" +ip netns exec ${NS_SRC} bash -c "timeout ${TIMEOUT} dd if=/dev/zero bs=1000 count=100 > /dev/tcp/${IP6_DST}/9006" +if [ $? -ne 0 ]; then + echo -e "${TEST}: ${RED}FAIL${NC}" + exit 1 +fi +echo -e "${TEST}: ${GREEN}PASS${NC}" + +TEST="ICMPv4 connectivity test" +ip netns exec ${NS_SRC} ping $PING_ARG ${IP4_DST} +if [ $? -ne 0 ]; then + echo -e "${TEST}: ${RED}FAIL${NC}" + exit 1 +fi +echo -e "${TEST}: ${GREEN}PASS${NC}" + +TEST="ICMPv6 connectivity test" +ip netns exec ${NS_SRC} ping6 $PING_ARG ${IP6_DST} +if [ $? -ne 0 ]; then + echo -e "${TEST}: ${RED}FAIL${NC}" + exit 1 +fi +echo -e "${TEST}: ${GREEN}PASS${NC}"