From patchwork Fri Feb 5 06:40:35 2021
X-Patchwork-Submitter: Saeed Mahameed
X-Patchwork-Id: 377467
From: Saeed Mahameed
To: Jakub Kicinski
Cc: netdev@vger.kernel.org, "David S. Miller", Mark Bloch, Saeed Mahameed
Subject: [net-next 01/17] net/mlx5: E-Switch, Refactor setting source port
Date: Thu, 4 Feb 2021 22:40:35 -0800
Message-Id: <20210205064051.89592-2-saeed@kernel.org>
X-Mailing-List: netdev@vger.kernel.org

From: Mark Bloch

Setting the source port requires only the E-Switch and the vport number.
Refactor the function to get those parameters instead of passing the full attribute.
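For illustration, the call-site change amounts to the following sketch, drawn from the hunks below (mlx5_eswitch_add_offloaded_rule/mlx5_eswitch_add_fwd_rule); it is not an extra change on top of the patch:

    /* Before: the helper extracted the source eswitch and vport from the attr. */
    mlx5_eswitch_set_rule_source_port(esw, spec, esw_attr);

    /* After: callers pass the source eswitch and vport explicitly. */
    mlx5_eswitch_set_rule_source_port(esw, spec,
                                      esw_attr->in_mdev->priv.eswitch,
                                      esw_attr->in_rep->vport);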
Signed-off-by: Mark Bloch Reviewed-by: Saeed Mahameed Signed-off-by: Saeed Mahameed --- .../mellanox/mlx5/core/eswitch_offloads.c | 19 ++++++++++++------- 1 file changed, 12 insertions(+), 7 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c index 7f09f2bbf7c1..416ede2fe5d7 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c @@ -257,7 +257,8 @@ mlx5_eswitch_set_rule_flow_source(struct mlx5_eswitch *esw, static void mlx5_eswitch_set_rule_source_port(struct mlx5_eswitch *esw, struct mlx5_flow_spec *spec, - struct mlx5_esw_flow_attr *attr) + struct mlx5_eswitch *src_esw, + u16 vport) { void *misc2; void *misc; @@ -268,8 +269,8 @@ mlx5_eswitch_set_rule_source_port(struct mlx5_eswitch *esw, if (mlx5_eswitch_vport_match_metadata_enabled(esw)) { misc2 = MLX5_ADDR_OF(fte_match_param, spec->match_value, misc_parameters_2); MLX5_SET(fte_match_set_misc2, misc2, metadata_reg_c_0, - mlx5_eswitch_get_vport_metadata_for_match(attr->in_mdev->priv.eswitch, - attr->in_rep->vport)); + mlx5_eswitch_get_vport_metadata_for_match(src_esw, + vport)); misc2 = MLX5_ADDR_OF(fte_match_param, spec->match_criteria, misc_parameters_2); MLX5_SET(fte_match_set_misc2, misc2, metadata_reg_c_0, @@ -278,12 +279,12 @@ mlx5_eswitch_set_rule_source_port(struct mlx5_eswitch *esw, spec->match_criteria_enable |= MLX5_MATCH_MISC_PARAMETERS_2; } else { misc = MLX5_ADDR_OF(fte_match_param, spec->match_value, misc_parameters); - MLX5_SET(fte_match_set_misc, misc, source_port, attr->in_rep->vport); + MLX5_SET(fte_match_set_misc, misc, source_port, vport); if (MLX5_CAP_ESW(esw->dev, merged_eswitch)) MLX5_SET(fte_match_set_misc, misc, source_eswitch_owner_vhca_id, - MLX5_CAP_GEN(attr->in_mdev, vhca_id)); + MLX5_CAP_GEN(src_esw->dev, vhca_id)); misc = MLX5_ADDR_OF(fte_match_param, spec->match_criteria, misc_parameters); MLX5_SET_TO_ONES(fte_match_set_misc, misc, source_port); @@ -407,7 +408,9 @@ mlx5_eswitch_add_offloaded_rule(struct mlx5_eswitch *esw, fdb = attr->ft; if (!(attr->flags & MLX5_ESW_ATTR_FLAG_NO_IN_PORT)) - mlx5_eswitch_set_rule_source_port(esw, spec, esw_attr); + mlx5_eswitch_set_rule_source_port(esw, spec, + esw_attr->in_mdev->priv.eswitch, + esw_attr->in_rep->vport); } if (IS_ERR(fdb)) { rule = ERR_CAST(fdb); @@ -487,7 +490,9 @@ mlx5_eswitch_add_fwd_rule(struct mlx5_eswitch *esw, dest[i].ft = fwd_fdb; i++; - mlx5_eswitch_set_rule_source_port(esw, spec, esw_attr); + mlx5_eswitch_set_rule_source_port(esw, spec, + esw_attr->in_mdev->priv.eswitch, + esw_attr->in_rep->vport); if (attr->outer_match_level != MLX5_MATCH_NONE) spec->match_criteria_enable |= MLX5_MATCH_OUTER_HEADERS; From patchwork Fri Feb 5 06:40:37 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Saeed Mahameed X-Patchwork-Id: 377466 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-19.4 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, INCLUDES_CR_TRAILER, INCLUDES_PATCH, MAILING_LIST_MULTI, SPF_HELO_NONE, SPF_PASS, URIBL_BLOCKED, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3FEFBC433E6 for ; Fri, 5 Feb 2021 06:45:23 +0000 (UTC) Received: from 
vger.kernel.org

From: Saeed Mahameed
To: Jakub Kicinski
Cc: netdev@vger.kernel.org, "David S. Miller", Vlad Buslov, Dmytro Linkin, Roi Dayan, Saeed Mahameed
Subject: [net-next 03/17] net/mlx5e: Always set attr mdev pointer
Date: Thu, 4 Feb 2021 22:40:37 -0800
Message-Id: <20210205064051.89592-4-saeed@kernel.org>
X-Mailing-List: netdev@vger.kernel.org

From: Vlad Buslov

Eswitch offloads extensions in the following patches in the series require the attr->esw_attr->in_mdev pointer to always be set. This is already the case for all code paths except the mlx5_tc_ct_entry_add_rule() function. Fix that function to assign the mdev pointer from priv->mdev.
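For context, the pointer is consumed by infrastructure added later in the series; for example, the indirect table code in patch 07 derives the source eswitch from it (excerpt for reference only):

    /* Match on the uplink source port metadata of the ingress eswitch,
     * taken from attr->esw_attr->in_mdev (patch 07, indir_table.c).
     */
    MLX5_SET(fte_match_param, rule_spec->match_value, misc_parameters_2.metadata_reg_c_0,
             mlx5_eswitch_get_vport_metadata_for_match(esw_attr->in_mdev->priv.eswitch,
                                                       MLX5_VPORT_UPLINK));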
Signed-off-by: Vlad Buslov
Signed-off-by: Dmytro Linkin
Reviewed-by: Roi Dayan
Signed-off-by: Saeed Mahameed
---
 drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.c b/drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.c index 40aaa105b2fc..3fb75dcdc68d 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.c @@ -711,6 +711,8 @@ mlx5_tc_ct_entry_add_rule(struct mlx5_tc_ct_priv *ct_priv, attr->outer_match_level = MLX5_MATCH_L4; attr->counter = entry->counter->counter; attr->flags |= MLX5_ESW_ATTR_FLAG_NO_IN_PORT; + if (ct_priv->ns_type == MLX5_FLOW_NAMESPACE_FDB) + attr->esw_attr->in_mdev = priv->mdev; mlx5_tc_ct_set_tuple_match(netdev_priv(ct_priv->netdev), spec, flow_rule); mlx5e_tc_match_to_reg_match(spec, ZONE_TO_REG, entry->tuple.zone, MLX5_CT_ZONE_MASK);

From patchwork Fri Feb 5 06:40:40 2021
X-Patchwork-Submitter: Saeed Mahameed
X-Patchwork-Id: 377465
From: Saeed Mahameed
To: Jakub Kicinski
Cc: netdev@vger.kernel.org, "David S. Miller", Vlad Buslov, Dmytro Linkin, Roi Dayan, Saeed Mahameed
Subject: [net-next 06/17] net/mlx5e: Refactor tun routing helpers
Date: Thu, 4 Feb 2021 22:40:40 -0800
Message-Id: <20210205064051.89592-7-saeed@kernel.org>
X-Mailing-List: netdev@vger.kernel.org

From: Vlad Buslov

Refactor tun routing helpers to use a dedicated struct mlx5e_tc_tun_route_attr instead of multiple output arguments.
This simplifies the callers (no need to keep track of bunch of output param pointers) and allows to unify struct release code in new mlx5e_tc_tun_route_attr_cleanup() helper instead of requiring callers to manually release some of the output parameters that require it. Simplify code by unifying error handling at the end of the function and rearranging code. Remove redundant empty line. Signed-off-by: Vlad Buslov Signed-off-by: Dmytro Linkin Reviewed-by: Roi Dayan Signed-off-by: Saeed Mahameed --- .../ethernet/mellanox/mlx5/core/en/tc_tun.c | 235 ++++++++++-------- 1 file changed, 126 insertions(+), 109 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun.c b/drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun.c index 90930e54b6f2..3e18ca200c86 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun.c @@ -10,6 +10,27 @@ #include "rep/tc.h" #include "rep/neigh.h" +struct mlx5e_tc_tun_route_attr { + struct net_device *out_dev; + struct net_device *route_dev; + union { + struct flowi4 fl4; + struct flowi6 fl6; + } fl; + struct neighbour *n; + u8 ttl; +}; + +#define TC_TUN_ROUTE_ATTR_INIT(name) struct mlx5e_tc_tun_route_attr name = {} + +static void mlx5e_tc_tun_route_attr_cleanup(struct mlx5e_tc_tun_route_attr *attr) +{ + if (attr->n) + neigh_release(attr->n); + if (attr->route_dev) + dev_put(attr->route_dev); +} + struct mlx5e_tc_tunnel *mlx5e_get_tc_tun(struct net_device *tunnel_dev) { if (netif_is_vxlan(tunnel_dev)) @@ -79,12 +100,10 @@ static int get_route_and_out_devs(struct mlx5e_priv *priv, static int mlx5e_route_lookup_ipv4_get(struct mlx5e_priv *priv, struct net_device *mirred_dev, - struct net_device **out_dev, - struct net_device **route_dev, - struct flowi4 *fl4, - struct neighbour **out_n, - u8 *out_ttl) + struct mlx5e_tc_tun_route_attr *attr) { + struct net_device *route_dev; + struct net_device *out_dev; struct neighbour *n; struct rtable *rt; @@ -97,46 +116,50 @@ static int mlx5e_route_lookup_ipv4_get(struct mlx5e_priv *priv, struct mlx5_eswitch *esw = mdev->priv.eswitch; uplink_dev = mlx5_eswitch_uplink_get_proto_dev(esw, REP_ETH); - fl4->flowi4_oif = uplink_dev->ifindex; + attr->fl.fl4.flowi4_oif = uplink_dev->ifindex; } - rt = ip_route_output_key(dev_net(mirred_dev), fl4); + rt = ip_route_output_key(dev_net(mirred_dev), &attr->fl.fl4); if (IS_ERR(rt)) return PTR_ERR(rt); if (mlx5_lag_is_multipath(mdev) && rt->rt_gw_family != AF_INET) { - ip_rt_put(rt); - return -ENETUNREACH; + ret = -ENETUNREACH; + goto err_rt_release; } #else return -EOPNOTSUPP; #endif - ret = get_route_and_out_devs(priv, rt->dst.dev, route_dev, out_dev); - if (ret < 0) { - ip_rt_put(rt); - return ret; - } - dev_hold(*route_dev); + ret = get_route_and_out_devs(priv, rt->dst.dev, &route_dev, &out_dev); + if (ret < 0) + goto err_rt_release; + dev_hold(route_dev); - if (!(*out_ttl)) - *out_ttl = ip4_dst_hoplimit(&rt->dst); - n = dst_neigh_lookup(&rt->dst, &fl4->daddr); - ip_rt_put(rt); + if (!attr->ttl) + attr->ttl = ip4_dst_hoplimit(&rt->dst); + n = dst_neigh_lookup(&rt->dst, &attr->fl.fl4.daddr); if (!n) { - dev_put(*route_dev); - return -ENOMEM; + ret = -ENOMEM; + goto err_dev_release; } - *out_n = n; + ip_rt_put(rt); + attr->route_dev = route_dev; + attr->out_dev = out_dev; + attr->n = n; return 0; + +err_dev_release: + dev_put(route_dev); +err_rt_release: + ip_rt_put(rt); + return ret; } -static void mlx5e_route_lookup_ipv4_put(struct net_device *route_dev, - struct neighbour *n) +static void 
mlx5e_route_lookup_ipv4_put(struct mlx5e_tc_tun_route_attr *attr) { - neigh_release(n); - dev_put(route_dev); + mlx5e_tc_tun_route_attr_cleanup(attr); } static const char *mlx5e_netdev_kind(struct net_device *dev) @@ -188,28 +211,25 @@ int mlx5e_tc_tun_create_header_ipv4(struct mlx5e_priv *priv, { int max_encap_size = MLX5_CAP_ESW(priv->mdev, max_encap_header_size); const struct ip_tunnel_key *tun_key = &e->tun_info->key; - struct net_device *out_dev, *route_dev; - struct flowi4 fl4 = {}; - struct neighbour *n; + TC_TUN_ROUTE_ATTR_INIT(attr); int ipv4_encap_size; char *encap_header; - u8 nud_state, ttl; struct iphdr *ip; + u8 nud_state; int err; /* add the IP fields */ - fl4.flowi4_tos = tun_key->tos; - fl4.daddr = tun_key->u.ipv4.dst; - fl4.saddr = tun_key->u.ipv4.src; - ttl = tun_key->ttl; + attr.fl.fl4.flowi4_tos = tun_key->tos; + attr.fl.fl4.daddr = tun_key->u.ipv4.dst; + attr.fl.fl4.saddr = tun_key->u.ipv4.src; + attr.ttl = tun_key->ttl; - err = mlx5e_route_lookup_ipv4_get(priv, mirred_dev, &out_dev, &route_dev, - &fl4, &n, &ttl); + err = mlx5e_route_lookup_ipv4_get(priv, mirred_dev, &attr); if (err) return err; ipv4_encap_size = - (is_vlan_dev(route_dev) ? VLAN_ETH_HLEN : ETH_HLEN) + + (is_vlan_dev(attr.route_dev) ? VLAN_ETH_HLEN : ETH_HLEN) + sizeof(struct iphdr) + e->tunnel->calc_hlen(e); @@ -229,37 +249,37 @@ int mlx5e_tc_tun_create_header_ipv4(struct mlx5e_priv *priv, /* used by mlx5e_detach_encap to lookup a neigh hash table * entry in the neigh hash table when a user deletes a rule */ - e->m_neigh.dev = n->dev; - e->m_neigh.family = n->ops->family; - memcpy(&e->m_neigh.dst_ip, n->primary_key, n->tbl->key_len); - e->out_dev = out_dev; - e->route_dev_ifindex = route_dev->ifindex; + e->m_neigh.dev = attr.n->dev; + e->m_neigh.family = attr.n->ops->family; + memcpy(&e->m_neigh.dst_ip, attr.n->primary_key, attr.n->tbl->key_len); + e->out_dev = attr.out_dev; + e->route_dev_ifindex = attr.route_dev->ifindex; /* It's important to add the neigh to the hash table before checking * the neigh validity state. So if we'll get a notification, in case the * neigh changes it's validity state, we would find the relevant neigh * in the hash. */ - err = mlx5e_rep_encap_entry_attach(netdev_priv(out_dev), e); + err = mlx5e_rep_encap_entry_attach(netdev_priv(attr.out_dev), e); if (err) goto free_encap; - read_lock_bh(&n->lock); - nud_state = n->nud_state; - ether_addr_copy(e->h_dest, n->ha); - read_unlock_bh(&n->lock); + read_lock_bh(&attr.n->lock); + nud_state = attr.n->nud_state; + ether_addr_copy(e->h_dest, attr.n->ha); + read_unlock_bh(&attr.n->lock); /* add ethernet header */ - ip = (struct iphdr *)gen_eth_tnl_hdr(encap_header, route_dev, e, + ip = (struct iphdr *)gen_eth_tnl_hdr(encap_header, attr.route_dev, e, ETH_P_IP); /* add ip header */ ip->tos = tun_key->tos; ip->version = 0x4; ip->ihl = 0x5; - ip->ttl = ttl; - ip->daddr = fl4.daddr; - ip->saddr = fl4.saddr; + ip->ttl = attr.ttl; + ip->daddr = attr.fl.fl4.daddr; + ip->saddr = attr.fl.fl4.saddr; /* add tunneling protocol header */ err = mlx5e_gen_ip_tunnel_header((char *)ip + sizeof(struct iphdr), @@ -271,7 +291,7 @@ int mlx5e_tc_tun_create_header_ipv4(struct mlx5e_priv *priv, e->encap_header = encap_header; if (!(nud_state & NUD_VALID)) { - neigh_event_send(n, NULL); + neigh_event_send(attr.n, NULL); /* the encap entry will be made valid on neigh update event * and not used before that. 
*/ @@ -287,8 +307,8 @@ int mlx5e_tc_tun_create_header_ipv4(struct mlx5e_priv *priv, } e->flags |= MLX5_ENCAP_ENTRY_VALID; - mlx5e_rep_queue_neigh_stats_work(netdev_priv(out_dev)); - mlx5e_route_lookup_ipv4_put(route_dev, n); + mlx5e_rep_queue_neigh_stats_work(netdev_priv(attr.out_dev)); + mlx5e_route_lookup_ipv4_put(&attr); return err; destroy_neigh_entry: @@ -296,55 +316,56 @@ int mlx5e_tc_tun_create_header_ipv4(struct mlx5e_priv *priv, free_encap: kfree(encap_header); release_neigh: - mlx5e_route_lookup_ipv4_put(route_dev, n); + mlx5e_route_lookup_ipv4_put(&attr); return err; } #if IS_ENABLED(CONFIG_INET) && IS_ENABLED(CONFIG_IPV6) static int mlx5e_route_lookup_ipv6_get(struct mlx5e_priv *priv, struct net_device *mirred_dev, - struct net_device **out_dev, - struct net_device **route_dev, - struct flowi6 *fl6, - struct neighbour **out_n, - u8 *out_ttl) + struct mlx5e_tc_tun_route_attr *attr) { + struct net_device *route_dev; + struct net_device *out_dev; struct dst_entry *dst; struct neighbour *n; - int ret; - dst = ipv6_stub->ipv6_dst_lookup_flow(dev_net(mirred_dev), NULL, fl6, + dst = ipv6_stub->ipv6_dst_lookup_flow(dev_net(mirred_dev), NULL, &attr->fl.fl6, NULL); if (IS_ERR(dst)) return PTR_ERR(dst); - if (!(*out_ttl)) - *out_ttl = ip6_dst_hoplimit(dst); + if (!attr->ttl) + attr->ttl = ip6_dst_hoplimit(dst); - ret = get_route_and_out_devs(priv, dst->dev, route_dev, out_dev); - if (ret < 0) { - dst_release(dst); - return ret; - } + ret = get_route_and_out_devs(priv, dst->dev, &route_dev, &out_dev); + if (ret < 0) + goto err_dst_release; - dev_hold(*route_dev); - n = dst_neigh_lookup(dst, &fl6->daddr); - dst_release(dst); + dev_hold(route_dev); + n = dst_neigh_lookup(dst, &attr->fl.fl6.daddr); if (!n) { - dev_put(*route_dev); - return -ENOMEM; + ret = -ENOMEM; + goto err_dev_release; } - *out_n = n; + dst_release(dst); + attr->out_dev = out_dev; + attr->route_dev = route_dev; + attr->n = n; return 0; + +err_dev_release: + dev_put(route_dev); +err_dst_release: + dst_release(dst); + return ret; } -static void mlx5e_route_lookup_ipv6_put(struct net_device *route_dev, - struct neighbour *n) +static void mlx5e_route_lookup_ipv6_put(struct mlx5e_tc_tun_route_attr *attr) { - neigh_release(n); - dev_put(route_dev); + mlx5e_tc_tun_route_attr_cleanup(attr); } int mlx5e_tc_tun_create_header_ipv6(struct mlx5e_priv *priv, @@ -353,28 +374,24 @@ int mlx5e_tc_tun_create_header_ipv6(struct mlx5e_priv *priv, { int max_encap_size = MLX5_CAP_ESW(priv->mdev, max_encap_header_size); const struct ip_tunnel_key *tun_key = &e->tun_info->key; - struct net_device *out_dev, *route_dev; - struct flowi6 fl6 = {}; + TC_TUN_ROUTE_ATTR_INIT(attr); struct ipv6hdr *ip6h; - struct neighbour *n = NULL; int ipv6_encap_size; char *encap_header; - u8 nud_state, ttl; + u8 nud_state; int err; - ttl = tun_key->ttl; - - fl6.flowlabel = ip6_make_flowinfo(RT_TOS(tun_key->tos), tun_key->label); - fl6.daddr = tun_key->u.ipv6.dst; - fl6.saddr = tun_key->u.ipv6.src; + attr.ttl = tun_key->ttl; + attr.fl.fl6.flowlabel = ip6_make_flowinfo(RT_TOS(tun_key->tos), tun_key->label); + attr.fl.fl6.daddr = tun_key->u.ipv6.dst; + attr.fl.fl6.saddr = tun_key->u.ipv6.src; - err = mlx5e_route_lookup_ipv6_get(priv, mirred_dev, &out_dev, &route_dev, - &fl6, &n, &ttl); + err = mlx5e_route_lookup_ipv6_get(priv, mirred_dev, &attr); if (err) return err; ipv6_encap_size = - (is_vlan_dev(route_dev) ? VLAN_ETH_HLEN : ETH_HLEN) + + (is_vlan_dev(attr.route_dev) ? 
VLAN_ETH_HLEN : ETH_HLEN) + sizeof(struct ipv6hdr) + e->tunnel->calc_hlen(e); @@ -394,36 +411,36 @@ int mlx5e_tc_tun_create_header_ipv6(struct mlx5e_priv *priv, /* used by mlx5e_detach_encap to lookup a neigh hash table * entry in the neigh hash table when a user deletes a rule */ - e->m_neigh.dev = n->dev; - e->m_neigh.family = n->ops->family; - memcpy(&e->m_neigh.dst_ip, n->primary_key, n->tbl->key_len); - e->out_dev = out_dev; - e->route_dev_ifindex = route_dev->ifindex; + e->m_neigh.dev = attr.n->dev; + e->m_neigh.family = attr.n->ops->family; + memcpy(&e->m_neigh.dst_ip, attr.n->primary_key, attr.n->tbl->key_len); + e->out_dev = attr.out_dev; + e->route_dev_ifindex = attr.route_dev->ifindex; /* It's importent to add the neigh to the hash table before checking * the neigh validity state. So if we'll get a notification, in case the * neigh changes it's validity state, we would find the relevant neigh * in the hash. */ - err = mlx5e_rep_encap_entry_attach(netdev_priv(out_dev), e); + err = mlx5e_rep_encap_entry_attach(netdev_priv(attr.out_dev), e); if (err) goto free_encap; - read_lock_bh(&n->lock); - nud_state = n->nud_state; - ether_addr_copy(e->h_dest, n->ha); - read_unlock_bh(&n->lock); + read_lock_bh(&attr.n->lock); + nud_state = attr.n->nud_state; + ether_addr_copy(e->h_dest, attr.n->ha); + read_unlock_bh(&attr.n->lock); /* add ethernet header */ - ip6h = (struct ipv6hdr *)gen_eth_tnl_hdr(encap_header, route_dev, e, + ip6h = (struct ipv6hdr *)gen_eth_tnl_hdr(encap_header, attr.route_dev, e, ETH_P_IPV6); /* add ip header */ ip6_flow_hdr(ip6h, tun_key->tos, 0); /* the HW fills up ipv6 payload len */ - ip6h->hop_limit = ttl; - ip6h->daddr = fl6.daddr; - ip6h->saddr = fl6.saddr; + ip6h->hop_limit = attr.ttl; + ip6h->daddr = attr.fl.fl6.daddr; + ip6h->saddr = attr.fl.fl6.saddr; /* add tunneling protocol header */ err = mlx5e_gen_ip_tunnel_header((char *)ip6h + sizeof(struct ipv6hdr), @@ -435,7 +452,7 @@ int mlx5e_tc_tun_create_header_ipv6(struct mlx5e_priv *priv, e->encap_header = encap_header; if (!(nud_state & NUD_VALID)) { - neigh_event_send(n, NULL); + neigh_event_send(attr.n, NULL); /* the encap entry will be made valid on neigh update event * and not used before that. 
*/ @@ -452,8 +469,8 @@ int mlx5e_tc_tun_create_header_ipv6(struct mlx5e_priv *priv, } e->flags |= MLX5_ENCAP_ENTRY_VALID; - mlx5e_rep_queue_neigh_stats_work(netdev_priv(out_dev)); - mlx5e_route_lookup_ipv6_put(route_dev, n); + mlx5e_rep_queue_neigh_stats_work(netdev_priv(attr.out_dev)); + mlx5e_route_lookup_ipv6_put(&attr); return err; destroy_neigh_entry: @@ -461,7 +478,7 @@ int mlx5e_tc_tun_create_header_ipv6(struct mlx5e_priv *priv, free_encap: kfree(encap_header); release_neigh: - mlx5e_route_lookup_ipv6_put(route_dev, n); + mlx5e_route_lookup_ipv6_put(&attr); return err; } #endif

From patchwork Fri Feb 5 06:40:41 2021
X-Patchwork-Submitter: Saeed Mahameed
X-Patchwork-Id: 377461
From: Saeed Mahameed
To: Jakub Kicinski
Cc: netdev@vger.kernel.org, "David S. Miller", Vlad Buslov, Dmytro Linkin, Roi Dayan, Saeed Mahameed
Subject: [net-next 07/17] net/mlx5: E-Switch, Indirect table infrastructure
Date: Thu, 4 Feb 2021 22:40:41 -0800
Message-Id: <20210205064051.89592-8-saeed@kernel.org>
X-Mailing-List: netdev@vger.kernel.org

From: Vlad Buslov

The indirect table infrastructure allows VF tunnel traffic to be fully processed in hardware. The kernel software model uses two TC rules for such traffic: UL rep to tunnel device, then tunnel VF rep to destination VF rep. To implement this pipeline, after matching on the UL rule the driver needs to program the hardware to overwrite the source vport from UL to the tunnel VF and recirculate the packet to the root table, so that the rule installed on the tunnel VF can be matched.
For this indirect table matches all encapsulated traffic by tunnel parameters and all other IP traffic is sent to tunnel VF by the miss rule. Indirect table API overview: - mlx5_esw_indir_table_{init|destroy}() - init and destroy opaque indirect table object. - mlx5_esw_indir_table_get() - get or create new table according to vport id and IP version. Table has following pre-created groups: recirculation group with match on ethertype and VNI (rules that match encapsulated packets are installed to this group) and forward group with default/miss rule that forwards to vport of tunnel endpoint VF (rule for regular non-encapsulated packets). - mlx5_esw_indir_table_put() - decrease reference to the indirect table and matching rule (for encapsulated traffic). - mlx5_esw_indir_table_needed() - check that in_port is an uplink port and out_port is VF on the same eswitch, verify that the rule is for IP traffic and source port rewrite functionality can be used. - mlx5_esw_indir_table_decap_vport() - function returns decap vport of flow attribute. Co-developed-by: Dmytro Linkin Signed-off-by: Dmytro Linkin Signed-off-by: Vlad Buslov Reviewed-by: Roi Dayan Signed-off-by: Saeed Mahameed --- .../net/ethernet/mellanox/mlx5/core/Makefile | 1 + .../net/ethernet/mellanox/mlx5/core/en_tc.h | 14 + .../mellanox/mlx5/core/esw/indir_table.c | 508 ++++++++++++++++++ .../mellanox/mlx5/core/esw/indir_table.h | 76 +++ .../net/ethernet/mellanox/mlx5/core/eswitch.h | 5 + .../mellanox/mlx5/core/eswitch_offloads.c | 12 + 6 files changed, 616 insertions(+) create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/esw/indir_table.c create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/esw/indir_table.h diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Makefile b/drivers/net/ethernet/mellanox/mlx5/core/Makefile index 8809dd4de57e..f1ccfba60068 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/Makefile +++ b/drivers/net/ethernet/mellanox/mlx5/core/Makefile @@ -40,6 +40,7 @@ mlx5_core-$(CONFIG_MLX5_ESWITCH) += lag_mp.o lib/geneve.o lib/port_tun.o \ en_rep.o en/rep/bond.o en/mod_hdr.o mlx5_core-$(CONFIG_MLX5_CLS_ACT) += en_tc.o en/rep/tc.o en/rep/neigh.o \ en/mapping.o lib/fs_chains.o en/tc_tun.o \ + esw/indir_table.o \ en/tc_tun_vxlan.o en/tc_tun_gre.o en/tc_tun_geneve.o \ en/tc_tun_mplsoudp.o diag/en_tc_tracepoint.o mlx5_core-$(CONFIG_MLX5_TC_CT) += en/tc_ct.o diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.h b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.h index 56d809904ea7..852e0981343d 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.h +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.h @@ -76,6 +76,7 @@ struct mlx5_flow_attr { struct mlx5_flow_table *dest_ft; u8 inner_match_level; u8 outer_match_level; + u8 ip_version; u32 flags; union { struct mlx5_esw_flow_attr esw_attr[0]; @@ -83,6 +84,19 @@ struct mlx5_flow_attr { }; }; +struct mlx5_rx_tun_attr { + u16 decap_vport; + union { + __be32 v4; + struct in6_addr v6; + } src_ip; /* Valid if decap_vport is not zero */ + union { + __be32 v4; + struct in6_addr v6; + } dst_ip; /* Valid if decap_vport is not zero */ + u32 vni; +}; + #define MLX5E_TC_TABLE_CHAIN_TAG_BITS 16 #define MLX5E_TC_TABLE_CHAIN_TAG_MASK GENMASK(MLX5E_TC_TABLE_CHAIN_TAG_BITS - 1, 0) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/esw/indir_table.c b/drivers/net/ethernet/mellanox/mlx5/core/esw/indir_table.c new file mode 100644 index 000000000000..a0ebf40c9907 --- /dev/null +++ b/drivers/net/ethernet/mellanox/mlx5/core/esw/indir_table.c @@ -0,0 +1,508 @@ +// 
SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB +/* Copyright (c) 2021 Mellanox Technologies. */ + +#include +#include +#include +#include +#include +#include +#include "mlx5_core.h" +#include "eswitch.h" +#include "en.h" +#include "en_tc.h" +#include "fs_core.h" +#include "esw/indir_table.h" +#include "lib/fs_chains.h" + +#define MLX5_ESW_INDIR_TABLE_SIZE 128 +#define MLX5_ESW_INDIR_TABLE_RECIRC_IDX_MAX (MLX5_ESW_INDIR_TABLE_SIZE - 2) +#define MLX5_ESW_INDIR_TABLE_FWD_IDX (MLX5_ESW_INDIR_TABLE_SIZE - 1) + +struct mlx5_esw_indir_table_rule { + struct list_head list; + struct mlx5_flow_handle *handle; + union { + __be32 v4; + struct in6_addr v6; + } dst_ip; + u32 vni; + struct mlx5_modify_hdr *mh; + refcount_t refcnt; +}; + +struct mlx5_esw_indir_table_entry { + struct hlist_node hlist; + struct mlx5_flow_table *ft; + struct mlx5_flow_group *recirc_grp; + struct mlx5_flow_group *fwd_grp; + struct mlx5_flow_handle *fwd_rule; + struct list_head recirc_rules; + int recirc_cnt; + int fwd_ref; + + u16 vport; + u8 ip_version; +}; + +struct mlx5_esw_indir_table { + struct mutex lock; /* protects table */ + DECLARE_HASHTABLE(table, 8); +}; + +struct mlx5_esw_indir_table * +mlx5_esw_indir_table_init(void) +{ + struct mlx5_esw_indir_table *indir = kvzalloc(sizeof(*indir), GFP_KERNEL); + + if (!indir) + return ERR_PTR(-ENOMEM); + + mutex_init(&indir->lock); + hash_init(indir->table); + return indir; +} + +void +mlx5_esw_indir_table_destroy(struct mlx5_esw_indir_table *indir) +{ + mutex_destroy(&indir->lock); + kvfree(indir); +} + +bool +mlx5_esw_indir_table_needed(struct mlx5_eswitch *esw, + struct mlx5_flow_attr *attr, + u16 vport_num, + struct mlx5_core_dev *dest_mdev) +{ + struct mlx5_esw_flow_attr *esw_attr = attr->esw_attr; + + /* Use indirect table for all IP traffic from UL to VF with vport + * destination when source rewrite flag is set. + */ + return esw_attr->in_rep->vport == MLX5_VPORT_UPLINK && + mlx5_eswitch_is_vf_vport(esw, vport_num) && + esw->dev == dest_mdev && + attr->ip_version && + attr->flags & MLX5_ESW_ATTR_FLAG_SRC_REWRITE; +} + +u16 +mlx5_esw_indir_table_decap_vport(struct mlx5_flow_attr *attr) +{ + struct mlx5_esw_flow_attr *esw_attr = attr->esw_attr; + + return esw_attr->rx_tun_attr ? 
esw_attr->rx_tun_attr->decap_vport : 0; +} + +static struct mlx5_esw_indir_table_rule * +mlx5_esw_indir_table_rule_lookup(struct mlx5_esw_indir_table_entry *e, + struct mlx5_esw_flow_attr *attr) +{ + struct mlx5_esw_indir_table_rule *rule; + + list_for_each_entry(rule, &e->recirc_rules, list) + if (rule->vni == attr->rx_tun_attr->vni && + !memcmp(&rule->dst_ip, &attr->rx_tun_attr->dst_ip, + sizeof(attr->rx_tun_attr->dst_ip))) + goto found; + return NULL; + +found: + refcount_inc(&rule->refcnt); + return rule; +} + +static int mlx5_esw_indir_table_rule_get(struct mlx5_eswitch *esw, + struct mlx5_flow_attr *attr, + struct mlx5_flow_spec *spec, + struct mlx5_esw_indir_table_entry *e) +{ + struct mlx5_esw_flow_attr *esw_attr = attr->esw_attr; + struct mlx5_fs_chains *chains = esw_chains(esw); + struct mlx5e_tc_mod_hdr_acts mod_acts = {}; + struct mlx5_flow_destination dest = {}; + struct mlx5_esw_indir_table_rule *rule; + struct mlx5_flow_act flow_act = {}; + struct mlx5_flow_spec *rule_spec; + struct mlx5_flow_handle *handle; + int err = 0; + u32 data; + + rule = mlx5_esw_indir_table_rule_lookup(e, esw_attr); + if (rule) + return 0; + + if (e->recirc_cnt == MLX5_ESW_INDIR_TABLE_RECIRC_IDX_MAX) + return -EINVAL; + + rule_spec = kvzalloc(sizeof(*rule_spec), GFP_KERNEL); + if (!rule_spec) + return -ENOMEM; + + rule = kzalloc(sizeof(*rule), GFP_KERNEL); + if (!rule) { + err = -ENOMEM; + goto out; + } + + rule_spec->match_criteria_enable = MLX5_MATCH_OUTER_HEADERS | + MLX5_MATCH_MISC_PARAMETERS | + MLX5_MATCH_MISC_PARAMETERS_2; + if (MLX5_CAP_FLOWTABLE_NIC_RX(esw->dev, ft_field_support.outer_ip_version)) { + MLX5_SET(fte_match_param, rule_spec->match_criteria, + outer_headers.ip_version, 0xf); + MLX5_SET(fte_match_param, rule_spec->match_value, outer_headers.ip_version, + attr->ip_version); + } else if (attr->ip_version) { + MLX5_SET_TO_ONES(fte_match_param, rule_spec->match_criteria, + outer_headers.ethertype); + MLX5_SET(fte_match_param, rule_spec->match_value, outer_headers.ethertype, + (attr->ip_version == 4 ? 
ETH_P_IP : ETH_P_IPV6)); + } else { + err = -EOPNOTSUPP; + goto err_mod_hdr; + } + + if (attr->ip_version == 4) { + MLX5_SET_TO_ONES(fte_match_param, rule_spec->match_criteria, + outer_headers.dst_ipv4_dst_ipv6.ipv4_layout.ipv4); + MLX5_SET(fte_match_param, rule_spec->match_value, + outer_headers.dst_ipv4_dst_ipv6.ipv4_layout.ipv4, + ntohl(esw_attr->rx_tun_attr->dst_ip.v4)); + } else if (attr->ip_version == 6) { + int len = sizeof(struct in6_addr); + + memset(MLX5_ADDR_OF(fte_match_param, rule_spec->match_criteria, + outer_headers.dst_ipv4_dst_ipv6.ipv6_layout.ipv6), + 0xff, len); + memcpy(MLX5_ADDR_OF(fte_match_param, rule_spec->match_value, + outer_headers.dst_ipv4_dst_ipv6.ipv6_layout.ipv6), + &esw_attr->rx_tun_attr->dst_ip.v6, len); + } + + MLX5_SET_TO_ONES(fte_match_param, rule_spec->match_criteria, + misc_parameters.vxlan_vni); + MLX5_SET(fte_match_param, rule_spec->match_value, misc_parameters.vxlan_vni, + MLX5_GET(fte_match_param, spec->match_value, misc_parameters.vxlan_vni)); + + MLX5_SET(fte_match_param, rule_spec->match_criteria, + misc_parameters_2.metadata_reg_c_0, mlx5_eswitch_get_vport_metadata_mask()); + MLX5_SET(fte_match_param, rule_spec->match_value, misc_parameters_2.metadata_reg_c_0, + mlx5_eswitch_get_vport_metadata_for_match(esw_attr->in_mdev->priv.eswitch, + MLX5_VPORT_UPLINK)); + + /* Modify flow source to recirculate packet */ + data = mlx5_eswitch_get_vport_metadata_for_set(esw, esw_attr->rx_tun_attr->decap_vport); + err = mlx5e_tc_match_to_reg_set(esw->dev, &mod_acts, MLX5_FLOW_NAMESPACE_FDB, + VPORT_TO_REG, data); + if (err) + goto err_mod_hdr; + + flow_act.modify_hdr = mlx5_modify_header_alloc(esw->dev, MLX5_FLOW_NAMESPACE_FDB, + mod_acts.num_actions, mod_acts.actions); + if (IS_ERR(flow_act.modify_hdr)) { + err = PTR_ERR(flow_act.modify_hdr); + goto err_mod_hdr; + } + + flow_act.action = MLX5_FLOW_CONTEXT_ACTION_FWD_DEST | MLX5_FLOW_CONTEXT_ACTION_MOD_HDR; + flow_act.flags = FLOW_ACT_IGNORE_FLOW_LEVEL | FLOW_ACT_NO_APPEND; + dest.type = MLX5_FLOW_DESTINATION_TYPE_FLOW_TABLE; + dest.ft = mlx5_chains_get_table(chains, 0, 1, 0); + if (!dest.ft) { + err = PTR_ERR(dest.ft); + goto err_table; + } + handle = mlx5_add_flow_rules(e->ft, rule_spec, &flow_act, &dest, 1); + if (IS_ERR(handle)) { + err = PTR_ERR(handle); + goto err_handle; + } + + dealloc_mod_hdr_actions(&mod_acts); + rule->handle = handle; + rule->vni = esw_attr->rx_tun_attr->vni; + rule->mh = flow_act.modify_hdr; + memcpy(&rule->dst_ip, &esw_attr->rx_tun_attr->dst_ip, + sizeof(esw_attr->rx_tun_attr->dst_ip)); + refcount_set(&rule->refcnt, 1); + list_add(&rule->list, &e->recirc_rules); + e->recirc_cnt++; + goto out; + +err_handle: + mlx5_chains_put_table(chains, 0, 1, 0); +err_table: + mlx5_modify_header_dealloc(esw->dev, flow_act.modify_hdr); +err_mod_hdr: + kfree(rule); +out: + kfree(rule_spec); + return err; +} + +static void mlx5_esw_indir_table_rule_put(struct mlx5_eswitch *esw, + struct mlx5_flow_attr *attr, + struct mlx5_esw_indir_table_entry *e) +{ + struct mlx5_esw_flow_attr *esw_attr = attr->esw_attr; + struct mlx5_fs_chains *chains = esw_chains(esw); + struct mlx5_esw_indir_table_rule *rule; + + list_for_each_entry(rule, &e->recirc_rules, list) + if (rule->vni == esw_attr->rx_tun_attr->vni && + !memcmp(&rule->dst_ip, &esw_attr->rx_tun_attr->dst_ip, + sizeof(esw_attr->rx_tun_attr->dst_ip))) + goto found; + + return; + +found: + if (!refcount_dec_and_test(&rule->refcnt)) + return; + + mlx5_del_flow_rules(rule->handle); + mlx5_chains_put_table(chains, 0, 1, 0); + 
mlx5_modify_header_dealloc(esw->dev, rule->mh); + list_del(&rule->list); + kfree(rule); + e->recirc_cnt--; +} + +static int mlx5_create_indir_recirc_group(struct mlx5_eswitch *esw, + struct mlx5_flow_attr *attr, + struct mlx5_flow_spec *spec, + struct mlx5_esw_indir_table_entry *e) +{ + int err = 0, inlen = MLX5_ST_SZ_BYTES(create_flow_group_in); + u32 *in, *match; + + in = kvzalloc(inlen, GFP_KERNEL); + if (!in) + return -ENOMEM; + + MLX5_SET(create_flow_group_in, in, match_criteria_enable, MLX5_MATCH_OUTER_HEADERS | + MLX5_MATCH_MISC_PARAMETERS | MLX5_MATCH_MISC_PARAMETERS_2); + match = MLX5_ADDR_OF(create_flow_group_in, in, match_criteria); + + if (MLX5_CAP_FLOWTABLE_NIC_RX(esw->dev, ft_field_support.outer_ip_version)) + MLX5_SET(fte_match_param, match, outer_headers.ip_version, 0xf); + else + MLX5_SET_TO_ONES(fte_match_param, match, outer_headers.ethertype); + + if (attr->ip_version == 4) { + MLX5_SET_TO_ONES(fte_match_param, match, + outer_headers.dst_ipv4_dst_ipv6.ipv4_layout.ipv4); + } else if (attr->ip_version == 6) { + memset(MLX5_ADDR_OF(fte_match_param, match, + outer_headers.dst_ipv4_dst_ipv6.ipv6_layout.ipv6), + 0xff, sizeof(struct in6_addr)); + } else { + err = -EOPNOTSUPP; + goto out; + } + + MLX5_SET_TO_ONES(fte_match_param, match, misc_parameters.vxlan_vni); + MLX5_SET(fte_match_param, match, misc_parameters_2.metadata_reg_c_0, + mlx5_eswitch_get_vport_metadata_mask()); + MLX5_SET(create_flow_group_in, in, start_flow_index, 0); + MLX5_SET(create_flow_group_in, in, end_flow_index, MLX5_ESW_INDIR_TABLE_RECIRC_IDX_MAX); + e->recirc_grp = mlx5_create_flow_group(e->ft, in); + if (IS_ERR(e->recirc_grp)) { + err = PTR_ERR(e->recirc_grp); + goto out; + } + + INIT_LIST_HEAD(&e->recirc_rules); + e->recirc_cnt = 0; + +out: + kfree(in); + return err; +} + +static int mlx5_create_indir_fwd_group(struct mlx5_eswitch *esw, + struct mlx5_esw_indir_table_entry *e) +{ + int err = 0, inlen = MLX5_ST_SZ_BYTES(create_flow_group_in); + struct mlx5_flow_destination dest = {}; + struct mlx5_flow_act flow_act = {}; + struct mlx5_flow_spec *spec; + u32 *in; + + in = kvzalloc(inlen, GFP_KERNEL); + if (!in) + return -ENOMEM; + + spec = kvzalloc(sizeof(*spec), GFP_KERNEL); + if (!spec) { + kfree(in); + return -ENOMEM; + } + + /* Hold one entry */ + MLX5_SET(create_flow_group_in, in, start_flow_index, MLX5_ESW_INDIR_TABLE_FWD_IDX); + MLX5_SET(create_flow_group_in, in, end_flow_index, MLX5_ESW_INDIR_TABLE_FWD_IDX); + e->fwd_grp = mlx5_create_flow_group(e->ft, in); + if (IS_ERR(e->fwd_grp)) { + err = PTR_ERR(e->fwd_grp); + goto err_out; + } + + flow_act.action = MLX5_FLOW_CONTEXT_ACTION_FWD_DEST; + dest.type = MLX5_FLOW_DESTINATION_TYPE_VPORT; + dest.vport.num = e->vport; + dest.vport.vhca_id = MLX5_CAP_GEN(esw->dev, vhca_id); + e->fwd_rule = mlx5_add_flow_rules(e->ft, spec, &flow_act, &dest, 1); + if (IS_ERR(e->fwd_rule)) { + mlx5_destroy_flow_group(e->fwd_grp); + err = PTR_ERR(e->fwd_rule); + } + +err_out: + kfree(spec); + kfree(in); + return err; +} + +static struct mlx5_esw_indir_table_entry * +mlx5_esw_indir_table_entry_create(struct mlx5_eswitch *esw, struct mlx5_flow_attr *attr, + struct mlx5_flow_spec *spec, u16 vport, bool decap) +{ + struct mlx5_flow_table_attr ft_attr = {}; + struct mlx5_flow_namespace *root_ns; + struct mlx5_esw_indir_table_entry *e; + struct mlx5_flow_table *ft; + int err = 0; + + root_ns = mlx5_get_flow_namespace(esw->dev, MLX5_FLOW_NAMESPACE_FDB); + if (!root_ns) + return ERR_PTR(-ENOENT); + + e = kzalloc(sizeof(*e), GFP_KERNEL); + if (!e) + return ERR_PTR(-ENOMEM); + + 
ft_attr.prio = FDB_TC_OFFLOAD; + ft_attr.max_fte = MLX5_ESW_INDIR_TABLE_SIZE; + ft_attr.flags = MLX5_FLOW_TABLE_UNMANAGED; + ft_attr.level = 1; + + ft = mlx5_create_flow_table(root_ns, &ft_attr); + if (IS_ERR(ft)) { + err = PTR_ERR(ft); + goto tbl_err; + } + e->ft = ft; + e->vport = vport; + e->ip_version = attr->ip_version; + e->fwd_ref = !decap; + + err = mlx5_create_indir_recirc_group(esw, attr, spec, e); + if (err) + goto recirc_grp_err; + + if (decap) { + err = mlx5_esw_indir_table_rule_get(esw, attr, spec, e); + if (err) + goto recirc_rule_err; + } + + err = mlx5_create_indir_fwd_group(esw, e); + if (err) + goto fwd_grp_err; + + hash_add(esw->fdb_table.offloads.indir->table, &e->hlist, + vport << 16 | attr->ip_version); + + return e; + +fwd_grp_err: + if (decap) + mlx5_esw_indir_table_rule_put(esw, attr, e); +recirc_rule_err: + mlx5_destroy_flow_group(e->recirc_grp); +recirc_grp_err: + mlx5_destroy_flow_table(e->ft); +tbl_err: + kfree(e); + return ERR_PTR(err); +} + +static struct mlx5_esw_indir_table_entry * +mlx5_esw_indir_table_entry_lookup(struct mlx5_eswitch *esw, u16 vport, u8 ip_version) +{ + struct mlx5_esw_indir_table_entry *e; + u32 key = vport << 16 | ip_version; + + hash_for_each_possible(esw->fdb_table.offloads.indir->table, e, hlist, key) + if (e->vport == vport && e->ip_version == ip_version) + return e; + + return NULL; +} + +struct mlx5_flow_table *mlx5_esw_indir_table_get(struct mlx5_eswitch *esw, + struct mlx5_flow_attr *attr, + struct mlx5_flow_spec *spec, + u16 vport, bool decap) +{ + struct mlx5_esw_indir_table_entry *e; + int err; + + mutex_lock(&esw->fdb_table.offloads.indir->lock); + e = mlx5_esw_indir_table_entry_lookup(esw, vport, attr->ip_version); + if (e) { + if (!decap) { + e->fwd_ref++; + } else { + err = mlx5_esw_indir_table_rule_get(esw, attr, spec, e); + if (err) + goto out_err; + } + } else { + e = mlx5_esw_indir_table_entry_create(esw, attr, spec, vport, decap); + if (IS_ERR(e)) { + err = PTR_ERR(e); + esw_warn(esw->dev, "Failed to create indirection table, err %d.\n", err); + goto out_err; + } + } + mutex_unlock(&esw->fdb_table.offloads.indir->lock); + return e->ft; + +out_err: + mutex_unlock(&esw->fdb_table.offloads.indir->lock); + return ERR_PTR(err); +} + +void mlx5_esw_indir_table_put(struct mlx5_eswitch *esw, + struct mlx5_flow_attr *attr, + u16 vport, bool decap) +{ + struct mlx5_esw_indir_table_entry *e; + + mutex_lock(&esw->fdb_table.offloads.indir->lock); + e = mlx5_esw_indir_table_entry_lookup(esw, vport, attr->ip_version); + if (!e) + goto out; + + if (!decap) + e->fwd_ref--; + else + mlx5_esw_indir_table_rule_put(esw, attr, e); + + if (e->fwd_ref || e->recirc_cnt) + goto out; + + hash_del(&e->hlist); + mlx5_destroy_flow_group(e->recirc_grp); + mlx5_del_flow_rules(e->fwd_rule); + mlx5_destroy_flow_group(e->fwd_grp); + mlx5_destroy_flow_table(e->ft); + kfree(e); +out: + mutex_unlock(&esw->fdb_table.offloads.indir->lock); +} diff --git a/drivers/net/ethernet/mellanox/mlx5/core/esw/indir_table.h b/drivers/net/ethernet/mellanox/mlx5/core/esw/indir_table.h new file mode 100644 index 000000000000..cb9eafd1b4ee --- /dev/null +++ b/drivers/net/ethernet/mellanox/mlx5/core/esw/indir_table.h @@ -0,0 +1,76 @@ +/* SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB */ +/* Copyright (c) 2021 Mellanox Technologies. 
*/ + +#ifndef __MLX5_ESW_FT_H__ +#define __MLX5_ESW_FT_H__ + +#ifdef CONFIG_MLX5_CLS_ACT + +struct mlx5_esw_indir_table * +mlx5_esw_indir_table_init(void); +void +mlx5_esw_indir_table_destroy(struct mlx5_esw_indir_table *indir); + +struct mlx5_flow_table *mlx5_esw_indir_table_get(struct mlx5_eswitch *esw, + struct mlx5_flow_attr *attr, + struct mlx5_flow_spec *spec, + u16 vport, bool decap); +void mlx5_esw_indir_table_put(struct mlx5_eswitch *esw, + struct mlx5_flow_attr *attr, + u16 vport, bool decap); + +bool +mlx5_esw_indir_table_needed(struct mlx5_eswitch *esw, + struct mlx5_flow_attr *attr, + u16 vport_num, + struct mlx5_core_dev *dest_mdev); + +u16 +mlx5_esw_indir_table_decap_vport(struct mlx5_flow_attr *attr); + +#else +/* indir API stubs */ +struct mlx5_esw_indir_table * +mlx5_esw_indir_table_init(void) +{ + return NULL; +} + +void +mlx5_esw_indir_table_destroy(struct mlx5_esw_indir_table *indir) +{ +} + +static inline struct mlx5_flow_table * +mlx5_esw_indir_table_get(struct mlx5_eswitch *esw, + struct mlx5_flow_attr *attr, + struct mlx5_flow_spec *spec, + u16 vport, bool decap) +{ + return ERR_PTR(-EOPNOTSUPP); +} + +static inline void +mlx5_esw_indir_table_put(struct mlx5_eswitch *esw, + struct mlx5_flow_attr *attr, + u16 vport, bool decap) +{ +} + +bool +mlx5_esw_indir_table_needed(struct mlx5_eswitch *esw, + struct mlx5_flow_attr *attr, + u16 vport_num, + struct mlx5_core_dev *dest_mdev) +{ + return false; +} + +static inline u16 +mlx5_esw_indir_table_decap_vport(struct mlx5_flow_attr *attr) +{ + return 0; +} +#endif + +#endif /* __MLX5_ESW_FT_H__ */ diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h index 1ab34751329e..c2361c5b824c 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h +++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h @@ -161,6 +161,8 @@ struct mlx5_vport { struct devlink_port *dl_port; }; +struct mlx5_esw_indir_table; + struct mlx5_eswitch_fdb { union { struct legacy_fdb { @@ -191,6 +193,8 @@ struct mlx5_eswitch_fdb { struct mutex lock; } vports; + struct mlx5_esw_indir_table *indir; + } offloads; }; u32 flags; @@ -418,6 +422,7 @@ struct mlx5_esw_flow_attr { struct mlx5_core_dev *mdev; struct mlx5_termtbl_handle *termtbl; } dests[MLX5_MAX_FLOW_FWD_VPORTS]; + struct mlx5_rx_tun_attr *rx_tun_attr; struct mlx5_pkt_reformat *decap_pkt_reformat; }; diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c index 1b18f624e04a..da843eab5c07 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c @@ -38,6 +38,7 @@ #include #include "mlx5_core.h" #include "eswitch.h" +#include "esw/indir_table.h" #include "esw/acl/ofld.h" #include "rdma.h" #include "en.h" @@ -2342,12 +2343,20 @@ static void esw_destroy_uplink_offloads_acl_tables(struct mlx5_eswitch *esw) static int esw_offloads_steering_init(struct mlx5_eswitch *esw) { + struct mlx5_esw_indir_table *indir; int err; memset(&esw->fdb_table.offloads, 0, sizeof(struct offloads_fdb)); mutex_init(&esw->fdb_table.offloads.vports.lock); hash_init(esw->fdb_table.offloads.vports.table); + indir = mlx5_esw_indir_table_init(); + if (IS_ERR(indir)) { + err = PTR_ERR(indir); + goto create_indir_err; + } + esw->fdb_table.offloads.indir = indir; + err = esw_create_uplink_offloads_acl_tables(esw); if (err) goto create_acl_err; @@ -2379,6 +2388,8 @@ static int esw_offloads_steering_init(struct mlx5_eswitch 
*esw) create_offloads_err: esw_destroy_uplink_offloads_acl_tables(esw); create_acl_err: + mlx5_esw_indir_table_destroy(esw->fdb_table.offloads.indir); +create_indir_err: mutex_destroy(&esw->fdb_table.offloads.vports.lock); return err; } @@ -2390,6 +2401,7 @@ static void esw_offloads_steering_cleanup(struct mlx5_eswitch *esw) esw_destroy_restore_table(esw); esw_destroy_offloads_table(esw); esw_destroy_uplink_offloads_acl_tables(esw); + mlx5_esw_indir_table_destroy(esw->fdb_table.offloads.indir); mutex_destroy(&esw->fdb_table.offloads.vports.lock); }

From patchwork Fri Feb 5 06:40:42 2021
X-Patchwork-Submitter: Saeed Mahameed
X-Patchwork-Id: 377464
From: Saeed Mahameed
To: Jakub Kicinski
Cc: netdev@vger.kernel.org, "David S. Miller", Vlad Buslov, Dmytro Linkin, Roi Dayan, Saeed Mahameed
Subject: [net-next 08/17] net/mlx5e: Remove redundant match on tunnel destination mac
Date: Thu, 4 Feb 2021 22:40:42 -0800
Message-Id: <20210205064051.89592-9-saeed@kernel.org>
X-Mailing-List: netdev@vger.kernel.org

From: Vlad Buslov

Remove the hardcoded match on the tunnel destination MAC address. This match is no longer required, and it would be wrong for stacked-device topologies where the encapsulation destination MAC address is that of the tunnel VF and can change dynamically on a route change (implemented in the following patches in the series).
Signed-off-by: Vlad Buslov Signed-off-by: Dmytro Linkin Reviewed-by: Roi Dayan Signed-off-by: Saeed Mahameed --- drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun.c | 8 -------- 1 file changed, 8 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun.c b/drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun.c index 3e18ca200c86..13aa98b82576 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun.c @@ -642,14 +642,6 @@ int mlx5e_tc_tun_parse(struct net_device *filter_dev, } } - /* Enforce DMAC when offloading incoming tunneled flows. - * Flow counters require a match on the DMAC. - */ - MLX5_SET_TO_ONES(fte_match_set_lyr_2_4, headers_c, dmac_47_16); - MLX5_SET_TO_ONES(fte_match_set_lyr_2_4, headers_c, dmac_15_0); - ether_addr_copy(MLX5_ADDR_OF(fte_match_set_lyr_2_4, headers_v, - dmac_47_16), priv->netdev->dev_addr); - /* let software handle IP fragments */ MLX5_SET(fte_match_set_lyr_2_4, headers_c, frag, 1); MLX5_SET(fte_match_set_lyr_2_4, headers_v, frag, 0); From patchwork Fri Feb 5 06:40:43 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Saeed Mahameed X-Patchwork-Id: 377462 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-19.4 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, INCLUDES_CR_TRAILER, INCLUDES_PATCH, MAILING_LIST_MULTI, SPF_HELO_NONE, SPF_PASS, URIBL_BLOCKED, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 309FDC433E0 for ; Fri, 5 Feb 2021 06:47:10 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id EDB6964F9D for ; Fri, 5 Feb 2021 06:47:09 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231345AbhBEGrA (ORCPT ); Fri, 5 Feb 2021 01:47:00 -0500 Received: from mail.kernel.org ([198.145.29.99]:55596 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231260AbhBEGpb (ORCPT ); Fri, 5 Feb 2021 01:45:31 -0500 Received: by mail.kernel.org (Postfix) with ESMTPSA id 39CED64FC3; Fri, 5 Feb 2021 06:44:12 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1612507452; bh=R32dLjTSp4nInLiZj4Sqx2WqromI3mnh7EGtkcv+Bn8=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=YCp+g93eVt5DSLZYviIxbZroV6v9+QI1I8gobOGnPQYg5jDmlOQUI3LtWSQO+8+OJ XgDC1JO7L/BkRQFYMzwiyG6DPszBMnss2uVO/LVIUHtNOrIjD+hMQrW7yZsVYucnSy gQV2olHzxCfFs7254KtvsjThBdQSJBWbXiAErJqr9osjSaBZ2bzlp2roIlE2W3nTeO /xsOudXfRTL6oulM8aY5peiliR22CRvphT8JhfoirM9zerjwNA/gv/KEdE2nlhrDYl T+q7h/5BK/yc2XKW8bL7fi+3fW8yPw2v9d9HUDiLRqgQ/+MBcEZAzw4sO875yk/4v1 D6JPvb7IVzE5Q== From: Saeed Mahameed To: Jakub Kicinski Cc: netdev@vger.kernel.org, "David S. 
Miller" , Vlad Buslov , Dmytro Linkin , Roi Dayan , Saeed Mahameed Subject: [net-next 09/17] net/mlx5e: VF tunnel RX traffic offloading Date: Thu, 4 Feb 2021 22:40:43 -0800 Message-Id: <20210205064051.89592-10-saeed@kernel.org> X-Mailer: git-send-email 2.29.2 In-Reply-To: <20210205064051.89592-1-saeed@kernel.org> References: <20210205064051.89592-1-saeed@kernel.org> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org From: Vlad Buslov When tunnel endpoint is on VF the encapsulated RX traffic is exposed on the representor of the VF without any further processing of rules installed on the VF. Detect such case by checking if the device returned by route lookup in decap rule handling code is a mlx5 VF and handle it with new redirection tables API. Example TC rules for VF tunnel traffic: 1. Rule that encapsulates the tunneled flow and redirects packets from source VF rep to tunnel device: $ tc -s filter show dev enp8s0f0_1 ingress filter protocol ip pref 4 flower chain 0 filter protocol ip pref 4 flower chain 0 handle 0x1 dst_mac 0a:40:bd:30:89:99 src_mac ca:2e:a7:3f:f5:0f eth_type ipv4 ip_tos 0/0x3 ip_flags nofrag in_hw in_hw_count 1 action order 1: tunnel_key set src_ip 7.7.7.5 dst_ip 7.7.7.1 key_id 98 dst_port 4789 nocsum ttl 64 pipe index 1 ref 1 bind 1 installed 411 sec used 411 sec Action statistics: Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) backlog 0b 0p requeues 0 no_percpu used_hw_stats delayed action order 2: mirred (Egress Redirect to device vxlan_sys_4789) stolen index 1 ref 1 bind 1 installed 411 sec used 0 sec Action statistics: Sent 5615833 bytes 4028 pkt (dropped 0, overlimits 0 requeues 0) Sent software 0 bytes 0 pkt Sent hardware 5615833 bytes 4028 pkt backlog 0b 0p requeues 0 cookie bb406d45d343bf7ade9690ae80c7cba4 no_percpu used_hw_stats delayed 2. 
Rule that redirects from tunnel device to UL rep: $ tc -s filter show dev vxlan_sys_4789 ingress filter protocol ip pref 4 flower chain 0 filter protocol ip pref 4 flower chain 0 handle 0x1 dst_mac ca:2e:a7:3f:f5:0f src_mac 0a:40:bd:30:89:99 eth_type ipv4 enc_dst_ip 7.7.7.5 enc_src_ip 7.7.7.1 enc_key_id 98 enc_dst_port 4789 enc_tos 0 ip_flags nofrag in_hw in_hw_count 1 action order 1: tunnel_key unset pipe index 2 ref 1 bind 1 installed 434 sec used 434 sec Action statistics: Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) backlog 0b 0p requeues 0 used_hw_stats delayed action order 2: mirred (Egress Redirect to device enp8s0f0_1) stolen index 4 ref 1 bind 1 installed 434 sec used 0 sec Action statistics: Sent 129936 bytes 1082 pkt (dropped 0, overlimits 0 requeues 0) Sent software 0 bytes 0 pkt Sent hardware 129936 bytes 1082 pkt backlog 0b 0p requeues 0 cookie ac17cf398c4c69e4a5b2f7aabd1b88ff no_percpu used_hw_stats delayed Co-developed-by: Dmytro Linkin Signed-off-by: Dmytro Linkin Signed-off-by: Vlad Buslov Reviewed-by: Roi Dayan Signed-off-by: Saeed Mahameed --- .../ethernet/mellanox/mlx5/core/en/tc_tun.c | 51 ++++++++ .../ethernet/mellanox/mlx5/core/en/tc_tun.h | 3 + .../net/ethernet/mellanox/mlx5/core/en_tc.c | 102 ++++++++++++++- .../net/ethernet/mellanox/mlx5/core/en_tc.h | 4 + .../mellanox/mlx5/core/eswitch_offloads.c | 119 +++++++++++++++++- 5 files changed, 271 insertions(+), 8 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun.c b/drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun.c index 13aa98b82576..73deafe4e693 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun.c @@ -483,6 +483,57 @@ int mlx5e_tc_tun_create_header_ipv6(struct mlx5e_priv *priv, } #endif +int mlx5e_tc_tun_route_lookup(struct mlx5e_priv *priv, + struct mlx5_flow_spec *spec, + struct mlx5_flow_attr *flow_attr) +{ + struct mlx5_esw_flow_attr *esw_attr = flow_attr->esw_attr; + TC_TUN_ROUTE_ATTR_INIT(attr); + u16 vport_num; + int err = 0; + + if (flow_attr->ip_version == 4) { + /* Addresses are swapped for decap */ + attr.fl.fl4.saddr = esw_attr->rx_tun_attr->dst_ip.v4; + attr.fl.fl4.daddr = esw_attr->rx_tun_attr->src_ip.v4; + err = mlx5e_route_lookup_ipv4_get(priv, priv->netdev, &attr); + } +#if IS_ENABLED(CONFIG_INET) && IS_ENABLED(CONFIG_IPV6) + else if (flow_attr->ip_version == 6) { + /* Addresses are swapped for decap */ + attr.fl.fl6.saddr = esw_attr->rx_tun_attr->dst_ip.v6; + attr.fl.fl6.daddr = esw_attr->rx_tun_attr->src_ip.v6; + err = mlx5e_route_lookup_ipv6_get(priv, priv->netdev, &attr); + } +#endif + else + return 0; + + if (err) + return err; + + if (attr.route_dev->netdev_ops != &mlx5e_netdev_ops || + !mlx5e_tc_is_vf_tunnel(attr.out_dev, attr.route_dev)) + goto out; + + err = mlx5e_tc_query_route_vport(attr.out_dev, attr.route_dev, &vport_num); + if (err) + goto out; + + esw_attr->rx_tun_attr->vni = MLX5_GET(fte_match_param, spec->match_value, + misc_parameters.vxlan_vni); + esw_attr->rx_tun_attr->decap_vport = vport_num; + +out: + if (flow_attr->ip_version == 4) + mlx5e_route_lookup_ipv4_put(&attr); +#if IS_ENABLED(CONFIG_INET) && IS_ENABLED(CONFIG_IPV6) + else if (flow_attr->ip_version == 6) + mlx5e_route_lookup_ipv6_put(&attr); +#endif + return err; +} + bool mlx5e_tc_tun_device_to_offload(struct mlx5e_priv *priv, struct net_device *netdev) { diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun.h b/drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun.h index 704359df6095..9d6ee9405eaf 
100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun.h +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun.h @@ -70,6 +70,9 @@ mlx5e_tc_tun_create_header_ipv6(struct mlx5e_priv *priv, struct net_device *mirred_dev, struct mlx5e_encap_entry *e) { return -EOPNOTSUPP; } #endif +int mlx5e_tc_tun_route_lookup(struct mlx5e_priv *priv, + struct mlx5_flow_spec *spec, + struct mlx5_flow_attr *attr); bool mlx5e_tc_tun_device_to_offload(struct mlx5e_priv *priv, struct net_device *netdev); diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c index 43f1508a05b5..098f3efa5d4d 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c @@ -1322,7 +1322,7 @@ static void remove_unready_flow(struct mlx5e_tc_flow *flow) static bool same_hw_devs(struct mlx5e_priv *priv, struct mlx5e_priv *peer_priv); -static bool mlx5e_tc_is_vf_tunnel(struct net_device *out_dev, struct net_device *route_dev) +bool mlx5e_tc_is_vf_tunnel(struct net_device *out_dev, struct net_device *route_dev) { struct mlx5_core_dev *out_mdev, *route_mdev; struct mlx5e_priv *out_priv, *route_priv; @@ -1339,8 +1339,7 @@ static bool mlx5e_tc_is_vf_tunnel(struct net_device *out_dev, struct net_device return same_hw_devs(out_priv, route_priv); } -static int mlx5e_tc_query_route_vport(struct net_device *out_dev, struct net_device *route_dev, - u16 *vport) +int mlx5e_tc_query_route_vport(struct net_device *out_dev, struct net_device *route_dev, u16 *vport) { struct mlx5e_priv *out_priv, *route_priv; struct mlx5_core_dev *route_mdev; @@ -1504,6 +1503,7 @@ static void mlx5e_tc_del_fdb_flow(struct mlx5e_priv *priv, kfree(attr->parse_attr->tun_info[out_index]); } kvfree(attr->parse_attr); + kvfree(attr->esw_attr->rx_tun_attr); mlx5_tc_ct_match_del(get_ct_priv(priv), &flow->attr->ct_attr); @@ -2134,6 +2134,67 @@ void mlx5e_tc_set_ethertype(struct mlx5_core_dev *mdev, } } +static u8 mlx5e_tc_get_ip_version(struct mlx5_flow_spec *spec, bool outer) +{ + void *headers_v; + u16 ethertype; + u8 ip_version; + + if (outer) + headers_v = MLX5_ADDR_OF(fte_match_param, spec->match_value, outer_headers); + else + headers_v = MLX5_ADDR_OF(fte_match_param, spec->match_value, inner_headers); + + ip_version = MLX5_GET(fte_match_set_lyr_2_4, headers_v, ip_version); + /* Return ip_version converted from ethertype anyway */ + if (!ip_version) { + ethertype = MLX5_GET(fte_match_set_lyr_2_4, headers_v, ethertype); + if (ethertype == ETH_P_IP || ethertype == ETH_P_ARP) + ip_version = 4; + else if (ethertype == ETH_P_IPV6) + ip_version = 6; + } + return ip_version; +} + +static int mlx5e_tc_set_attr_rx_tun(struct mlx5e_tc_flow *flow, + struct mlx5_flow_spec *spec) +{ + struct mlx5_esw_flow_attr *esw_attr = flow->attr->esw_attr; + struct mlx5_rx_tun_attr *tun_attr; + void *daddr, *saddr; + u8 ip_version; + + tun_attr = kvzalloc(sizeof(*tun_attr), GFP_KERNEL); + if (!tun_attr) + return -ENOMEM; + + esw_attr->rx_tun_attr = tun_attr; + ip_version = mlx5e_tc_get_ip_version(spec, true); + + if (ip_version == 4) { + daddr = MLX5_ADDR_OF(fte_match_param, spec->match_value, + outer_headers.dst_ipv4_dst_ipv6.ipv4_layout.ipv4); + saddr = MLX5_ADDR_OF(fte_match_param, spec->match_value, + outer_headers.src_ipv4_src_ipv6.ipv4_layout.ipv4); + tun_attr->dst_ip.v4 = *(__be32 *)daddr; + tun_attr->src_ip.v4 = *(__be32 *)saddr; + } +#if IS_ENABLED(CONFIG_INET) && IS_ENABLED(CONFIG_IPV6) + else if (ip_version == 6) { + int ipv6_size = MLX5_FLD_SZ_BYTES(ipv6_layout, ipv6); 
+ + daddr = MLX5_ADDR_OF(fte_match_param, spec->match_value, + outer_headers.dst_ipv4_dst_ipv6.ipv6_layout.ipv6); + saddr = MLX5_ADDR_OF(fte_match_param, spec->match_value, + outer_headers.src_ipv4_src_ipv6.ipv6_layout.ipv6); + memcpy(&tun_attr->dst_ip.v6, daddr, ipv6_size); + memcpy(&tun_attr->src_ip.v6, saddr, ipv6_size); + } +#endif + return 0; +} + static int parse_tunnel_attr(struct mlx5e_priv *priv, struct mlx5e_tc_flow *flow, struct mlx5_flow_spec *spec, @@ -2142,6 +2203,7 @@ static int parse_tunnel_attr(struct mlx5e_priv *priv, u8 *match_level, bool *match_inner) { + struct mlx5e_tc_tunnel *tunnel = mlx5e_get_tc_tun(filter_dev); struct mlx5_eswitch *esw = priv->mdev->priv.eswitch; struct netlink_ext_ack *extack = f->common.extack; bool needs_mapping, sets_mapping; @@ -2179,6 +2241,31 @@ static int parse_tunnel_attr(struct mlx5e_priv *priv, */ if (!netif_is_bareudp(filter_dev)) flow->attr->action |= MLX5_FLOW_CONTEXT_ACTION_DECAP; + err = mlx5e_tc_set_attr_rx_tun(flow, spec); + if (err) + return err; + } else if (tunnel && tunnel->tunnel_type == MLX5E_TC_TUNNEL_TYPE_VXLAN) { + struct mlx5_flow_spec *tmp_spec; + + tmp_spec = kvzalloc(sizeof(*tmp_spec), GFP_KERNEL); + if (!tmp_spec) { + NL_SET_ERR_MSG_MOD(extack, "Failed to allocate memory for vxlan tmp spec"); + netdev_warn(priv->netdev, "Failed to allocate memory for vxlan tmp spec"); + return -ENOMEM; + } + memcpy(tmp_spec, spec, sizeof(*tmp_spec)); + + err = mlx5e_tc_tun_parse(filter_dev, priv, tmp_spec, f, match_level); + if (err) { + kvfree(tmp_spec); + NL_SET_ERR_MSG_MOD(extack, "Failed to parse tunnel attributes"); + netdev_warn(priv->netdev, "Failed to parse tunnel attributes"); + return err; + } + err = mlx5e_tc_set_attr_rx_tun(flow, tmp_spec); + kvfree(tmp_spec); + if (err) + return err; } if (!needs_mapping && !sets_mapping) @@ -4473,6 +4560,15 @@ static int parse_tc_fdb_actions(struct mlx5e_priv *priv, } } + if (decap && esw_attr->rx_tun_attr) { + err = mlx5e_tc_tun_route_lookup(priv, &parse_attr->spec, attr); + if (err) + return err; + } + + /* always set IP version for indirect table handling */ + attr->ip_version = mlx5e_tc_get_ip_version(&parse_attr->spec, true); + if (MLX5_CAP_GEN(esw->dev, prio_tag_required) && action & MLX5_FLOW_CONTEXT_ACTION_VLAN_POP) { /* For prio tag mode, replace vlan pop with rewrite vlan prio diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.h b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.h index 852e0981343d..ee0029192504 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.h +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.h @@ -257,6 +257,10 @@ mlx5_tc_rule_delete(struct mlx5e_priv *priv, struct mlx5_flow_handle *rule, struct mlx5_flow_attr *attr); +bool mlx5e_tc_is_vf_tunnel(struct net_device *out_dev, struct net_device *route_dev); +int mlx5e_tc_query_route_vport(struct net_device *out_dev, struct net_device *route_dev, + u16 *vport); + #else /* CONFIG_MLX5_CLS_ACT */ static inline int mlx5e_tc_nic_init(struct mlx5e_priv *priv) { return 0; } static inline void mlx5e_tc_nic_cleanup(struct mlx5e_priv *priv) {} diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c index da843eab5c07..a44728595420 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c @@ -40,6 +40,7 @@ #include "eswitch.h" #include "esw/indir_table.h" #include "esw/acl/ofld.h" +#include "esw/indir_table.h" #include "rdma.h" #include "en.h" 
#include "fs_core.h" @@ -258,6 +259,7 @@ mlx5_eswitch_set_rule_flow_source(struct mlx5_eswitch *esw, static void mlx5_eswitch_set_rule_source_port(struct mlx5_eswitch *esw, struct mlx5_flow_spec *spec, + struct mlx5_flow_attr *attr, struct mlx5_eswitch *src_esw, u16 vport) { @@ -268,6 +270,8 @@ mlx5_eswitch_set_rule_source_port(struct mlx5_eswitch *esw, * VHCA in dual-port RoCE mode, and matching on source vport may fail. */ if (mlx5_eswitch_vport_match_metadata_enabled(esw)) { + if (mlx5_esw_indir_table_decap_vport(attr)) + vport = mlx5_esw_indir_table_decap_vport(attr); misc2 = MLX5_ADDR_OF(fte_match_param, spec->match_value, misc_parameters_2); MLX5_SET(fte_match_set_misc2, misc2, metadata_reg_c_0, mlx5_eswitch_get_vport_metadata_for_match(src_esw, @@ -297,15 +301,46 @@ mlx5_eswitch_set_rule_source_port(struct mlx5_eswitch *esw, } } +static int +esw_setup_decap_indir(struct mlx5_eswitch *esw, + struct mlx5_flow_attr *attr, + struct mlx5_flow_spec *spec) +{ + struct mlx5_flow_table *ft; + + if (!(attr->flags & MLX5_ESW_ATTR_FLAG_SRC_REWRITE)) + return -EOPNOTSUPP; + + ft = mlx5_esw_indir_table_get(esw, attr, spec, + mlx5_esw_indir_table_decap_vport(attr), true); + return PTR_ERR_OR_ZERO(ft); +} + static void +esw_cleanup_decap_indir(struct mlx5_eswitch *esw, + struct mlx5_flow_attr *attr) +{ + if (mlx5_esw_indir_table_decap_vport(attr)) + mlx5_esw_indir_table_put(esw, attr, + mlx5_esw_indir_table_decap_vport(attr), + true); +} + +static int esw_setup_ft_dest(struct mlx5_flow_destination *dest, struct mlx5_flow_act *flow_act, + struct mlx5_eswitch *esw, struct mlx5_flow_attr *attr, + struct mlx5_flow_spec *spec, int i) { flow_act->flags |= FLOW_ACT_IGNORE_FLOW_LEVEL; dest[i].type = MLX5_FLOW_DESTINATION_TYPE_FLOW_TABLE; dest[i].ft = attr->dest_ft; + + if (mlx5_esw_indir_table_decap_vport(attr)) + return esw_setup_decap_indir(esw, attr, spec); + return 0; } static void @@ -348,6 +383,10 @@ static void esw_put_dest_tables_loop(struct mlx5_eswitch *esw, struct mlx5_flow_ for (i = from; i < to; i++) if (esw_attr->dests[i].flags & MLX5_ESW_DEST_CHAIN_WITH_SRC_PORT_CHANGE) mlx5_chains_put_table(chains, 0, 1, 0); + else if (mlx5_esw_indir_table_needed(esw, attr, esw_attr->dests[i].rep->vport, + esw_attr->dests[i].mdev)) + mlx5_esw_indir_table_put(esw, attr, esw_attr->dests[i].rep->vport, + false); } static bool @@ -397,6 +436,68 @@ static void esw_cleanup_chain_src_port_rewrite(struct mlx5_eswitch *esw, esw_put_dest_tables_loop(esw, attr, esw_attr->split_count, esw_attr->out_count); } +static bool +esw_is_indir_table(struct mlx5_eswitch *esw, struct mlx5_flow_attr *attr) +{ + struct mlx5_esw_flow_attr *esw_attr = attr->esw_attr; + int i; + + for (i = esw_attr->split_count; i < esw_attr->out_count; i++) + if (mlx5_esw_indir_table_needed(esw, attr, esw_attr->dests[i].rep->vport, + esw_attr->dests[i].mdev)) + return true; + return false; +} + +static int +esw_setup_indir_table(struct mlx5_flow_destination *dest, + struct mlx5_flow_act *flow_act, + struct mlx5_eswitch *esw, + struct mlx5_flow_attr *attr, + struct mlx5_flow_spec *spec, + bool ignore_flow_lvl, + int *i) +{ + struct mlx5_esw_flow_attr *esw_attr = attr->esw_attr; + int j, err; + + if (!(attr->flags & MLX5_ESW_ATTR_FLAG_SRC_REWRITE)) + return -EOPNOTSUPP; + + for (j = esw_attr->split_count; j < esw_attr->out_count; j++, (*i)++) { + if (ignore_flow_lvl) + flow_act->flags |= FLOW_ACT_IGNORE_FLOW_LEVEL; + dest[*i].type = MLX5_FLOW_DESTINATION_TYPE_FLOW_TABLE; + + dest[*i].ft = mlx5_esw_indir_table_get(esw, attr, spec, + 
esw_attr->dests[j].rep->vport, false); + if (IS_ERR(dest[*i].ft)) { + err = PTR_ERR(dest[*i].ft); + goto err_indir_tbl_get; + } + } + + if (mlx5_esw_indir_table_decap_vport(attr)) { + err = esw_setup_decap_indir(esw, attr, spec); + if (err) + goto err_indir_tbl_get; + } + + return 0; + +err_indir_tbl_get: + esw_put_dest_tables_loop(esw, attr, esw_attr->split_count, j); + return err; +} + +static void esw_cleanup_indir_table(struct mlx5_eswitch *esw, struct mlx5_flow_attr *attr) +{ + struct mlx5_esw_flow_attr *esw_attr = attr->esw_attr; + + esw_put_dest_tables_loop(esw, attr, esw_attr->split_count, esw_attr->out_count); + esw_cleanup_decap_indir(esw, attr); +} + static void esw_cleanup_chain_dest(struct mlx5_fs_chains *chains, u32 chain, u32 prio, u32 level) { @@ -454,7 +555,7 @@ esw_setup_dests(struct mlx5_flow_destination *dest, attr->flags |= MLX5_ESW_ATTR_FLAG_SRC_REWRITE; if (attr->dest_ft) { - esw_setup_ft_dest(dest, flow_act, attr, *i); + esw_setup_ft_dest(dest, flow_act, esw, attr, spec, *i); (*i)++; } else if (attr->flags & MLX5_ESW_ATTR_FLAG_SLOW_PATH) { esw_setup_slow_path_dest(dest, flow_act, chains, *i); @@ -463,6 +564,8 @@ esw_setup_dests(struct mlx5_flow_destination *dest, err = esw_setup_chain_dest(dest, flow_act, chains, attr->dest_chain, 1, 0, *i); (*i)++; + } else if (esw_is_indir_table(esw, attr)) { + err = esw_setup_indir_table(dest, flow_act, esw, attr, spec, true, i); } else if (esw_is_chain_src_port_rewrite(esw, esw_attr)) { err = esw_setup_chain_src_port_rewrite(dest, flow_act, esw, chains, attr, i); } else { @@ -479,9 +582,13 @@ esw_cleanup_dests(struct mlx5_eswitch *esw, struct mlx5_esw_flow_attr *esw_attr = attr->esw_attr; struct mlx5_fs_chains *chains = esw_chains(esw); - if (!(attr->flags & MLX5_ESW_ATTR_FLAG_SLOW_PATH)) { + if (attr->dest_ft) { + esw_cleanup_decap_indir(esw, attr); + } else if (!(attr->flags & MLX5_ESW_ATTR_FLAG_SLOW_PATH)) { if (attr->dest_chain) esw_cleanup_chain_dest(chains, attr->dest_chain, 1, 0); + else if (esw_is_indir_table(esw, attr)) + esw_cleanup_indir_table(esw, attr); else if (esw_is_chain_src_port_rewrite(esw, esw_attr)) esw_cleanup_chain_src_port_rewrite(esw, attr); } @@ -564,7 +671,7 @@ mlx5_eswitch_add_offloaded_rule(struct mlx5_eswitch *esw, fdb = attr->ft; if (!(attr->flags & MLX5_ESW_ATTR_FLAG_NO_IN_PORT)) - mlx5_eswitch_set_rule_source_port(esw, spec, + mlx5_eswitch_set_rule_source_port(esw, spec, attr, esw_attr->in_mdev->priv.eswitch, esw_attr->in_rep->vport); } @@ -628,7 +735,9 @@ mlx5_eswitch_add_fwd_rule(struct mlx5_eswitch *esw, flow_act.action = MLX5_FLOW_CONTEXT_ACTION_FWD_DEST; for (i = 0; i < esw_attr->split_count; i++) { - if (esw_is_chain_src_port_rewrite(esw, esw_attr)) + if (esw_is_indir_table(esw, attr)) + err = esw_setup_indir_table(dest, &flow_act, esw, attr, spec, false, &i); + else if (esw_is_chain_src_port_rewrite(esw, esw_attr)) err = esw_setup_chain_src_port_rewrite(dest, &flow_act, esw, chains, attr, &i); else @@ -643,7 +752,7 @@ mlx5_eswitch_add_fwd_rule(struct mlx5_eswitch *esw, dest[i].ft = fwd_fdb; i++; - mlx5_eswitch_set_rule_source_port(esw, spec, + mlx5_eswitch_set_rule_source_port(esw, spec, attr, esw_attr->in_mdev->priv.eswitch, esw_attr->in_rep->vport); From patchwork Fri Feb 5 06:40:45 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Saeed Mahameed X-Patchwork-Id: 377463 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, 
From: Saeed Mahameed To: Jakub Kicinski Cc: netdev@vger.kernel.org, "David S. Miller" , Vlad Buslov , Dmytro Linkin , Roi Dayan , Saeed Mahameed Subject: [net-next 11/17] net/mlx5e: Match recirculated packet miss in slow table using reg_c1 Date: Thu, 4 Feb 2021 22:40:45 -0800 Message-Id: <20210205064051.89592-12-saeed@kernel.org> X-Mailer: git-send-email 2.29.2 In-Reply-To: <20210205064051.89592-1-saeed@kernel.org> References: <20210205064051.89592-1-saeed@kernel.org> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org From: Vlad Buslov

The previous patch in the series, which implements the stack devices RX path, installs indirect table rules that match on the tunnel VNI. After such a rule is created, all tunnel traffic is recirculated to the root table. However, a recirculated packet might not match any rule installed in that table (for example, when IP traffic follows ARP traffic). In that case packets appear on the representor of the tunnel endpoint VF instead of being redirected to the VF itself.

Extend the slow table with an additional flow group that matches on reg_c0 (the source port value set by the indirect tables implemented by the previous patch in the series) and reg_c1 (the special 0xFFF mark). When creating the offloads FDB tables, install one rule per VF vport that matches recirculated miss packets and redirects them to the appropriate VF vport.

Modify the indirect tables code to also rewrite reg_c1 with the special 0xFFF mark. The implementation reuses the reg_c1 tunnel id bits, which is safe because recirculated packets are always matched before decapsulation.
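Condensed, the per-VF miss handling described above is a single metadata match plus a forward-to-vport action. The sketch below is illustrative only, assembled from the helpers and macros this patch itself adds or uses (ESW_TUN_MASK, ESW_TUN_SLOW_TABLE_GOTO_VPORT_MARK, mlx5_eswitch_get_vport_metadata_for_match(), mlx5_add_flow_rules()); the function name is made up here, and the real logic lives in mlx5_eswitch_add_send_to_vport_meta_rules() in the diff below.

/* Illustrative sketch (not a hunk from this patch): one slow-table rule per
 * VF vport, matching the reg_c0 source-port metadata plus the reserved
 * 0xFFF reg_c1 mark and forwarding the packet to that vport.
 */
static int esw_add_meta_miss_rule_sketch(struct mlx5_eswitch *esw,
					 struct mlx5_flow_spec *spec,
					 u16 vport_num)
{
	struct mlx5_flow_destination dest = {};
	struct mlx5_flow_act flow_act = {};
	struct mlx5_flow_handle *rule;

	/* criteria: all vport metadata bits in reg_c0, all tunnel bits in reg_c1 */
	MLX5_SET(fte_match_param, spec->match_criteria,
		 misc_parameters_2.metadata_reg_c_0, mlx5_eswitch_get_vport_metadata_mask());
	MLX5_SET(fte_match_param, spec->match_criteria,
		 misc_parameters_2.metadata_reg_c_1, ESW_TUN_MASK);

	/* value: this vport's metadata in reg_c0, the 0xFFF goto-vport mark in reg_c1 */
	MLX5_SET(fte_match_param, spec->match_value,
		 misc_parameters_2.metadata_reg_c_0,
		 mlx5_eswitch_get_vport_metadata_for_match(esw, vport_num));
	MLX5_SET(fte_match_param, spec->match_value,
		 misc_parameters_2.metadata_reg_c_1, ESW_TUN_SLOW_TABLE_GOTO_VPORT_MARK);
	spec->match_criteria_enable = MLX5_MATCH_MISC_PARAMETERS_2;

	/* forward the recirculated miss packet to the VF vport itself */
	dest.type = MLX5_FLOW_DESTINATION_TYPE_VPORT;
	dest.vport.num = vport_num;
	flow_act.action = MLX5_FLOW_CONTEXT_ACTION_FWD_DEST;

	rule = mlx5_add_flow_rules(esw->fdb_table.offloads.slow_fdb,
				   spec, &flow_act, &dest, 1);
	return PTR_ERR_OR_ZERO(rule);
}

The mark value combines tunnel id 0 (never a valid tunnel id) with the option value 0xFFF, which this patch also reserves in the enc_opts mapping (the mapping_create() change below), so it cannot collide with a real tunnel mapping carried in the remaining reg_c1 bits.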
Signed-off-by: Vlad Buslov Signed-off-by: Dmytro Linkin Reviewed-by: Roi Dayan Signed-off-by: Saeed Mahameed --- .../net/ethernet/mellanox/mlx5/core/en_tc.c | 3 +- .../mellanox/mlx5/core/esw/indir_table.c | 17 ++- .../net/ethernet/mellanox/mlx5/core/eswitch.h | 2 + .../mellanox/mlx5/core/eswitch_offloads.c | 119 +++++++++++++++++- include/linux/mlx5/eswitch.h | 10 +- 5 files changed, 143 insertions(+), 8 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c index 90db5a99879d..21568d1fc00f 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c @@ -5515,7 +5515,8 @@ int mlx5e_tc_esw_init(struct rhashtable *tc_ht) } uplink_priv->tunnel_mapping = mapping; - mapping = mapping_create(sz_enc_opts, ENC_OPTS_BITS_MASK, true); + /* 0xFFF is reserved for stack devices slow path table mark */ + mapping = mapping_create(sz_enc_opts, ENC_OPTS_BITS_MASK - 1, true); if (IS_ERR(mapping)) { err = PTR_ERR(mapping); goto err_enc_opts_mapping; diff --git a/drivers/net/ethernet/mellanox/mlx5/core/esw/indir_table.c b/drivers/net/ethernet/mellanox/mlx5/core/esw/indir_table.c index a0ebf40c9907..b7d00c4c7046 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/esw/indir_table.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/esw/indir_table.c @@ -162,7 +162,7 @@ static int mlx5_esw_indir_table_rule_get(struct mlx5_eswitch *esw, (attr->ip_version == 4 ? ETH_P_IP : ETH_P_IPV6)); } else { err = -EOPNOTSUPP; - goto err_mod_hdr; + goto err_ethertype; } if (attr->ip_version == 4) { @@ -198,13 +198,18 @@ static int mlx5_esw_indir_table_rule_get(struct mlx5_eswitch *esw, err = mlx5e_tc_match_to_reg_set(esw->dev, &mod_acts, MLX5_FLOW_NAMESPACE_FDB, VPORT_TO_REG, data); if (err) - goto err_mod_hdr; + goto err_mod_hdr_regc0; + + err = mlx5e_tc_match_to_reg_set(esw->dev, &mod_acts, MLX5_FLOW_NAMESPACE_FDB, + TUNNEL_TO_REG, ESW_TUN_SLOW_TABLE_GOTO_VPORT); + if (err) + goto err_mod_hdr_regc1; flow_act.modify_hdr = mlx5_modify_header_alloc(esw->dev, MLX5_FLOW_NAMESPACE_FDB, mod_acts.num_actions, mod_acts.actions); if (IS_ERR(flow_act.modify_hdr)) { err = PTR_ERR(flow_act.modify_hdr); - goto err_mod_hdr; + goto err_mod_hdr_alloc; } flow_act.action = MLX5_FLOW_CONTEXT_ACTION_FWD_DEST | MLX5_FLOW_CONTEXT_ACTION_MOD_HDR; @@ -236,7 +241,11 @@ static int mlx5_esw_indir_table_rule_get(struct mlx5_eswitch *esw, mlx5_chains_put_table(chains, 0, 1, 0); err_table: mlx5_modify_header_dealloc(esw->dev, flow_act.modify_hdr); -err_mod_hdr: +err_mod_hdr_alloc: +err_mod_hdr_regc1: + dealloc_mod_hdr_actions(&mod_acts); +err_mod_hdr_regc0: +err_ethertype: kfree(rule); out: kfree(rule_spec); diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h index c2361c5b824c..34a21ff95472 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h +++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h @@ -179,9 +179,11 @@ struct mlx5_eswitch_fdb { struct mlx5_flow_namespace *ns; struct mlx5_flow_table *slow_fdb; struct mlx5_flow_group *send_to_vport_grp; + struct mlx5_flow_group *send_to_vport_meta_grp; struct mlx5_flow_group *peer_miss_grp; struct mlx5_flow_handle **peer_miss_rules; struct mlx5_flow_group *miss_grp; + struct mlx5_flow_handle **send_to_vport_meta_rules; struct mlx5_flow_handle *miss_rule_uni; struct mlx5_flow_handle *miss_rule_multi; int vlan_push_pop_refcount; diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c 
b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c index a44728595420..94cb0217b4f3 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c @@ -1080,6 +1080,81 @@ void mlx5_eswitch_del_send_to_vport_rule(struct mlx5_flow_handle *rule) mlx5_del_flow_rules(rule); } +static void mlx5_eswitch_del_send_to_vport_meta_rules(struct mlx5_eswitch *esw) +{ + struct mlx5_flow_handle **flows = esw->fdb_table.offloads.send_to_vport_meta_rules; + int i = 0, num_vfs = esw->esw_funcs.num_vfs, vport_num; + + if (!num_vfs || !flows) + return; + + mlx5_esw_for_each_vf_vport_num(esw, vport_num, num_vfs) + mlx5_del_flow_rules(flows[i++]); + + kvfree(flows); +} + +static int +mlx5_eswitch_add_send_to_vport_meta_rules(struct mlx5_eswitch *esw) +{ + int num_vfs, vport_num, rule_idx = 0, err = 0; + struct mlx5_flow_destination dest = {}; + struct mlx5_flow_act flow_act = {0}; + struct mlx5_flow_handle *flow_rule; + struct mlx5_flow_handle **flows; + struct mlx5_flow_spec *spec; + + num_vfs = esw->esw_funcs.num_vfs; + flows = kvzalloc(num_vfs * sizeof(*flows), GFP_KERNEL); + if (!flows) + return -ENOMEM; + + spec = kvzalloc(sizeof(*spec), GFP_KERNEL); + if (!spec) { + err = -ENOMEM; + goto alloc_err; + } + + MLX5_SET(fte_match_param, spec->match_criteria, + misc_parameters_2.metadata_reg_c_0, mlx5_eswitch_get_vport_metadata_mask()); + MLX5_SET(fte_match_param, spec->match_criteria, + misc_parameters_2.metadata_reg_c_1, ESW_TUN_MASK); + MLX5_SET(fte_match_param, spec->match_value, misc_parameters_2.metadata_reg_c_1, + ESW_TUN_SLOW_TABLE_GOTO_VPORT_MARK); + + spec->match_criteria_enable = MLX5_MATCH_MISC_PARAMETERS_2; + dest.type = MLX5_FLOW_DESTINATION_TYPE_VPORT; + flow_act.action = MLX5_FLOW_CONTEXT_ACTION_FWD_DEST; + + mlx5_esw_for_each_vf_vport_num(esw, vport_num, num_vfs) { + MLX5_SET(fte_match_param, spec->match_value, misc_parameters_2.metadata_reg_c_0, + mlx5_eswitch_get_vport_metadata_for_match(esw, vport_num)); + dest.vport.num = vport_num; + + flow_rule = mlx5_add_flow_rules(esw->fdb_table.offloads.slow_fdb, + spec, &flow_act, &dest, 1); + if (IS_ERR(flow_rule)) { + err = PTR_ERR(flow_rule); + esw_warn(esw->dev, "FDB: Failed to add send to vport meta rule idx %d, err %ld\n", + rule_idx, PTR_ERR(flow_rule)); + goto rule_err; + } + flows[rule_idx++] = flow_rule; + } + + esw->fdb_table.offloads.send_to_vport_meta_rules = flows; + kvfree(spec); + return 0; + +rule_err: + while (--rule_idx >= 0) + mlx5_del_flow_rules(flows[rule_idx]); + kvfree(spec); +alloc_err: + kvfree(flows); + return err; +} + static bool mlx5_eswitch_reg_c1_loopback_supported(struct mlx5_eswitch *esw) { return MLX5_CAP_ESW_FLOWTABLE(esw->dev, fdb_to_vport_reg_c_id) & @@ -1562,11 +1637,11 @@ static int esw_create_offloads_fdb_tables(struct mlx5_eswitch *esw) { int inlen = MLX5_ST_SZ_BYTES(create_flow_group_in); struct mlx5_flow_table_attr ft_attr = {}; + int num_vfs, table_size, ix, err = 0; struct mlx5_core_dev *dev = esw->dev; struct mlx5_flow_namespace *root_ns; struct mlx5_flow_table *fdb = NULL; u32 flags = 0, *flow_group_in; - int table_size, ix, err = 0; struct mlx5_flow_group *g; void *match_criteria; u8 *dmac; @@ -1592,7 +1667,7 @@ static int esw_create_offloads_fdb_tables(struct mlx5_eswitch *esw) } table_size = esw->total_vports * MAX_SQ_NVPORTS + MAX_PF_SQ + - MLX5_ESW_MISS_FLOWS + esw->total_vports; + MLX5_ESW_MISS_FLOWS + esw->total_vports + esw->esw_funcs.num_vfs; /* create the slow path fdb with encap set, so further table instances * 
can be created at run time while VFs are probed if the FW allows that. @@ -1640,6 +1715,38 @@ static int esw_create_offloads_fdb_tables(struct mlx5_eswitch *esw) } esw->fdb_table.offloads.send_to_vport_grp = g; + /* meta send to vport */ + memset(flow_group_in, 0, inlen); + MLX5_SET(create_flow_group_in, flow_group_in, match_criteria_enable, + MLX5_MATCH_MISC_PARAMETERS_2); + + match_criteria = MLX5_ADDR_OF(create_flow_group_in, flow_group_in, match_criteria); + + MLX5_SET(fte_match_param, match_criteria, + misc_parameters_2.metadata_reg_c_0, mlx5_eswitch_get_vport_metadata_mask()); + MLX5_SET(fte_match_param, match_criteria, + misc_parameters_2.metadata_reg_c_1, ESW_TUN_MASK); + + num_vfs = esw->esw_funcs.num_vfs; + if (num_vfs) { + MLX5_SET(create_flow_group_in, flow_group_in, start_flow_index, ix); + MLX5_SET(create_flow_group_in, flow_group_in, end_flow_index, ix + num_vfs - 1); + ix += num_vfs; + + g = mlx5_create_flow_group(fdb, flow_group_in); + if (IS_ERR(g)) { + err = PTR_ERR(g); + esw_warn(dev, "Failed to create send-to-vport meta flow group err(%d)\n", + err); + goto send_vport_meta_err; + } + esw->fdb_table.offloads.send_to_vport_meta_grp = g; + + err = mlx5_eswitch_add_send_to_vport_meta_rules(esw); + if (err) + goto meta_rule_err; + } + if (MLX5_CAP_ESW(esw->dev, merged_eswitch)) { /* create peer esw miss group */ memset(flow_group_in, 0, inlen); @@ -1707,6 +1814,11 @@ static int esw_create_offloads_fdb_tables(struct mlx5_eswitch *esw) if (MLX5_CAP_ESW(esw->dev, merged_eswitch)) mlx5_destroy_flow_group(esw->fdb_table.offloads.peer_miss_grp); peer_miss_err: + mlx5_eswitch_del_send_to_vport_meta_rules(esw); +meta_rule_err: + if (esw->fdb_table.offloads.send_to_vport_meta_grp) + mlx5_destroy_flow_group(esw->fdb_table.offloads.send_to_vport_meta_grp); +send_vport_meta_err: mlx5_destroy_flow_group(esw->fdb_table.offloads.send_to_vport_grp); send_vport_err: esw_chains_destroy(esw, esw_chains(esw)); @@ -1728,7 +1840,10 @@ static void esw_destroy_offloads_fdb_tables(struct mlx5_eswitch *esw) esw_debug(esw->dev, "Destroy offloads FDB Tables\n"); mlx5_del_flow_rules(esw->fdb_table.offloads.miss_rule_multi); mlx5_del_flow_rules(esw->fdb_table.offloads.miss_rule_uni); + mlx5_eswitch_del_send_to_vport_meta_rules(esw); mlx5_destroy_flow_group(esw->fdb_table.offloads.send_to_vport_grp); + if (esw->fdb_table.offloads.send_to_vport_meta_grp) + mlx5_destroy_flow_group(esw->fdb_table.offloads.send_to_vport_meta_grp); if (MLX5_CAP_ESW(esw->dev, merged_eswitch)) mlx5_destroy_flow_group(esw->fdb_table.offloads.peer_miss_grp); mlx5_destroy_flow_group(esw->fdb_table.offloads.miss_grp); diff --git a/include/linux/mlx5/eswitch.h b/include/linux/mlx5/eswitch.h index 3b20e84049c1..994c2c8cb4fd 100644 --- a/include/linux/mlx5/eswitch.h +++ b/include/linux/mlx5/eswitch.h @@ -114,8 +114,16 @@ u32 mlx5_eswitch_get_vport_metadata_for_set(struct mlx5_eswitch *esw, #define ESW_ZONE_ID_BITS 8 #define ESW_TUN_OPTS_BITS 12 #define ESW_TUN_ID_BITS 12 -#define ESW_TUN_OFFSET ESW_ZONE_ID_BITS +#define ESW_TUN_OPTS_OFFSET ESW_ZONE_ID_BITS +#define ESW_TUN_OFFSET ESW_TUN_OPTS_OFFSET #define ESW_ZONE_ID_MASK GENMASK(ESW_ZONE_ID_BITS - 1, 0) +#define ESW_TUN_OPTS_MASK GENMASK(32 - ESW_TUN_ID_BITS - 1, ESW_TUN_OPTS_OFFSET) +#define ESW_TUN_MASK GENMASK(31, ESW_TUN_OFFSET) +#define ESW_TUN_ID_SLOW_TABLE_GOTO_VPORT 0 /* 0 is not a valid tunnel id */ +#define ESW_TUN_OPTS_SLOW_TABLE_GOTO_VPORT 0xFFF /* 0xFFF is a reserved mapping */ +#define ESW_TUN_SLOW_TABLE_GOTO_VPORT ((ESW_TUN_ID_SLOW_TABLE_GOTO_VPORT << 
ESW_TUN_OPTS_BITS) | \ + ESW_TUN_OPTS_SLOW_TABLE_GOTO_VPORT) +#define ESW_TUN_SLOW_TABLE_GOTO_VPORT_MARK ESW_TUN_OPTS_MASK u8 mlx5_eswitch_mode(struct mlx5_core_dev *dev); #else /* CONFIG_MLX5_ESWITCH */ From patchwork Fri Feb 5 06:40:50 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Saeed Mahameed X-Patchwork-Id: 377459 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-19.4 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, INCLUDES_CR_TRAILER, INCLUDES_PATCH, MAILING_LIST_MULTI, SPF_HELO_NONE, SPF_PASS, URIBL_BLOCKED, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 42845C433E0 for ; Fri, 5 Feb 2021 06:49:19 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 0928B64F40 for ; Fri, 5 Feb 2021 06:49:18 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231348AbhBEGsv (ORCPT ); Fri, 5 Feb 2021 01:48:51 -0500 Received: from mail.kernel.org ([198.145.29.99]:55824 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231316AbhBEGqL (ORCPT ); Fri, 5 Feb 2021 01:46:11 -0500 Received: by mail.kernel.org (Postfix) with ESMTPSA id 865DE64FCC; Fri, 5 Feb 2021 06:44:16 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1612507456; bh=GGMCabt/sjWFnqVxFkKclinSu0Igen6967to3HYD4DM=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=EGM9n9dOkGGbaY9lFi/ce0PK/PfnJOrJJmcUIMbV6h6hUUEDu/EeJLk/kFM6GGa3l szSy3ojkX4OF3iIt34kivbiiePmvgjwfNNHSmmb76cUZdEj7ApZLfBRQg+cduliuRj nfmfPFjKWEPeIVeF3idkt9hJU4Y5/HLToOJocGHsFBJqgjmzHe5KHVUE2TIrdjtSG7 WYWuTRKFgr/LpbLZuT3JPzQmEWObyCkVPEOg3v+WJ0cy803kEd4j3SsteqJQtxcWmD RY3UGOY6Wn/Km2Z8YI4iZZGNPu0GV2osAQdQqhiWHjzQEFxkU8zvdS0KMugUHRJZtF 1piWFLLyxvFog== From: Saeed Mahameed To: Jakub Kicinski Cc: netdev@vger.kernel.org, "David S. Miller" , Vlad Buslov , Dmytro Linkin , Roi Dayan , Saeed Mahameed Subject: [net-next 16/17] net/mlx5e: Rename some encap-specific API to generic names Date: Thu, 4 Feb 2021 22:40:50 -0800 Message-Id: <20210205064051.89592-17-saeed@kernel.org> X-Mailer: git-send-email 2.29.2 In-Reply-To: <20210205064051.89592-1-saeed@kernel.org> References: <20210205064051.89592-1-saeed@kernel.org> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org From: Vlad Buslov Some of the encap-specific functions and fields will also be used by route update infrastructure in following patches. Rename them to generic names. 
Signed-off-by: Vlad Buslov Signed-off-by: Dmytro Linkin Reviewed-by: Roi Dayan Signed-off-by: Saeed Mahameed --- drivers/net/ethernet/mellanox/mlx5/core/en/rep/tc.c | 2 +- drivers/net/ethernet/mellanox/mlx5/core/en/tc_priv.h | 2 +- .../net/ethernet/mellanox/mlx5/core/en/tc_tun_encap.c | 10 +++++----- drivers/net/ethernet/mellanox/mlx5/core/en_tc.c | 2 +- drivers/net/ethernet/mellanox/mlx5/core/en_tc.h | 2 +- 5 files changed, 9 insertions(+), 9 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/rep/tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en/rep/tc.c index a7ba1c84371d..065126370acd 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en/rep/tc.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/rep/tc.c @@ -124,7 +124,7 @@ void mlx5e_rep_update_flows(struct mlx5e_priv *priv, } unlock: mutex_unlock(&esw->offloads.encap_tbl_lock); - mlx5e_put_encap_flow_list(priv, &flow_list); + mlx5e_put_flow_list(priv, &flow_list); } static int diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/tc_priv.h b/drivers/net/ethernet/mellanox/mlx5/core/en/tc_priv.h index 14db9b5accb1..4f35b486fb44 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en/tc_priv.h +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/tc_priv.h @@ -95,7 +95,7 @@ struct mlx5e_tc_flow { * due to missing route) */ struct net_device *orig_dev; /* netdev adding flow first */ - int tmp_efi_index; + int tmp_entry_index; struct list_head tmp_list; /* temporary flow list used by neigh update */ refcount_t refcnt; struct rcu_head rcu_head; diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun_encap.c b/drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun_encap.c index 577216744a17..bc0e26f3fd4c 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun_encap.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun_encap.c @@ -106,8 +106,8 @@ void mlx5e_tc_encap_flows_add(struct mlx5e_priv *priv, esw_attr = attr->esw_attr; spec = &attr->parse_attr->spec; - esw_attr->dests[flow->tmp_efi_index].pkt_reformat = e->pkt_reformat; - esw_attr->dests[flow->tmp_efi_index].flags |= MLX5_ESW_DEST_ENCAP_VALID; + esw_attr->dests[flow->tmp_entry_index].pkt_reformat = e->pkt_reformat; + esw_attr->dests[flow->tmp_entry_index].flags |= MLX5_ESW_DEST_ENCAP_VALID; /* Flow can be associated with multiple encap entries. * Before offloading the flow verify that all of them have * a valid neighbour. 
@@ -161,7 +161,7 @@ void mlx5e_tc_encap_flows_del(struct mlx5e_priv *priv, /* update from encap rule to slow path rule */ rule = mlx5e_tc_offload_to_slow_path(esw, flow, spec); /* mark the flow's encap dest as non-valid */ - esw_attr->dests[flow->tmp_efi_index].flags &= ~MLX5_ESW_DEST_ENCAP_VALID; + esw_attr->dests[flow->tmp_entry_index].flags &= ~MLX5_ESW_DEST_ENCAP_VALID; if (IS_ERR(rule)) { err = PTR_ERR(rule); @@ -195,7 +195,7 @@ void mlx5e_take_all_encap_flows(struct mlx5e_encap_entry *e, struct list_head *f continue; wait_for_completion(&flow->init_done); - flow->tmp_efi_index = efi->index; + flow->tmp_entry_index = efi->index; list_add(&flow->tmp_list, flow_list); } } @@ -294,7 +294,7 @@ void mlx5e_tc_update_neigh_used_value(struct mlx5e_neigh_hash_entry *nhe) } mutex_unlock(&esw->offloads.encap_tbl_lock); - mlx5e_put_encap_flow_list(priv, &flow_list); + mlx5e_put_flow_list(priv, &flow_list); if (neigh_used) { /* release current encap before breaking the loop */ mlx5e_encap_put(priv, e); diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c index e6150c7597f0..c5ecb9e4e767 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c @@ -1451,7 +1451,7 @@ struct mlx5_fc *mlx5e_tc_get_counter(struct mlx5e_tc_flow *flow) } /* Iterate over tmp_list of flows attached to flow_list head. */ -void mlx5e_put_encap_flow_list(struct mlx5e_priv *priv, struct list_head *flow_list) +void mlx5e_put_flow_list(struct mlx5e_priv *priv, struct list_head *flow_list) { struct mlx5e_tc_flow *flow, *tmp; diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.h b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.h index 9042e64a96ca..89003ae7775a 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.h +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.h @@ -174,7 +174,7 @@ bool mlx5e_encap_take(struct mlx5e_encap_entry *e); void mlx5e_encap_put(struct mlx5e_priv *priv, struct mlx5e_encap_entry *e); void mlx5e_take_all_encap_flows(struct mlx5e_encap_entry *e, struct list_head *flow_list); -void mlx5e_put_encap_flow_list(struct mlx5e_priv *priv, struct list_head *flow_list); +void mlx5e_put_flow_list(struct mlx5e_priv *priv, struct list_head *flow_list); struct mlx5e_neigh_hash_entry; void mlx5e_tc_update_neigh_used_value(struct mlx5e_neigh_hash_entry *nhe); From patchwork Fri Feb 5 06:40:51 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Saeed Mahameed X-Patchwork-Id: 377460 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-19.4 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, INCLUDES_CR_TRAILER, INCLUDES_PATCH, MAILING_LIST_MULTI, SPF_HELO_NONE, SPF_PASS, URIBL_BLOCKED, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 4FB5FC433E0 for ; Fri, 5 Feb 2021 06:48:41 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id DCA8364F9B for ; Fri, 5 Feb 2021 06:48:40 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231421AbhBEGsV (ORCPT ); Fri, 5 Feb 2021 01:48:21 -0500 Received: from mail.kernel.org ([198.145.29.99]:55588 "EHLO mail.kernel.org" 
From: Saeed Mahameed To: Jakub Kicinski Cc: netdev@vger.kernel.org, "David S. Miller" , Vlad Buslov , Dmytro Linkin , Roi Dayan , Saeed Mahameed Subject: [net-next 17/17] net/mlx5e: Handle FIB events to update tunnel endpoint device Date: Thu, 4 Feb 2021 22:40:51 -0800 Message-Id: <20210205064051.89592-18-saeed@kernel.org> X-Mailer: git-send-email 2.29.2 In-Reply-To: <20210205064051.89592-1-saeed@kernel.org> References: <20210205064051.89592-1-saeed@kernel.org> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org From: Vlad Buslov

Process FIB route update events to dynamically update the stack device rules when tunnel routing changes. Take the rtnl lock to prevent the FIB event handler from running concurrently with the neigh update and neigh stats workqueue tasks, and the encap_tbl_lock mutex to synchronize with the TC rule update path, which doesn't take the rtnl lock.

FIB event workflow for encap flows:
- Unoffload all flows attached to route encaps from the slow or fast path, depending on the encap destination endpoint neigh state.
- Update the encap IP header according to the new route dev.
- Update the flows' mod_hdr action that is responsible for overwriting the reg_c0 source port bits with the source port of the new underlying VF of the new route dev. This step requires changing the flow create/delete code to keep the flow parse attribute mod_hdr_acts structure for the whole flow lifetime instead of deallocating it after flow creation. Refactor the mod_hdr code to allow saving the id of individual mod_hdr actions and updating them with a dedicated helper.
- Offload all flows to either the slow or fast path, depending on the encap destination endpoint neigh state.

FIB event workflow for decap flows:
- Unoffload all route flows from hardware. When the last route flow is deleted, all indirect table rules for the route dev are also deleted.
- Update the flow attr decap_vport and destination MAC according to the underlying VF of the new route dev.
- Offload all route flows back to hardware, creating new indirect table rules according to the updated flow attribute data.

Extract some neigh update code into helper functions to be used by both the neigh update and route update infrastructure.
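Taken together, handling a FIB ENTRY_REPLACE event for one route entry roughly follows the sequence sketched below. This is an illustrative composition of the helpers this patch introduces (mlx5e_update_route_encaps(), mlx5e_take_all_route_decap_flows(), mlx5e_unoffload_flow_list(), mlx5e_reoffload_decap(), mlx5e_put_flow_list()), not the literal mlx5e_tc_fib_event_work() hunk; the function name and the exact ordering are simplified.

/* Rough sketch, for orientation only: what a route-replace update does for a
 * single mlx5e_route_entry once the FIB work item runs. Error handling and
 * reference management are omitted; see the real work handler in the diff.
 */
static void mlx5e_route_replace_sketch(struct mlx5e_priv *priv,
				       struct mlx5e_route_entry *r)
{
	struct mlx5_eswitch *esw = priv->mdev->priv.eswitch;
	LIST_HEAD(encap_flow_list);
	LIST_HEAD(decap_flows);

	rtnl_lock();					/* vs. neigh update / neigh stats work */
	mutex_lock(&esw->offloads.encap_tbl_lock);	/* vs. TC rule update path */

	/* encap flows: unoffload, refresh the encap header and the reg_c0
	 * mod_hdr action for the new route dev, then re-offload to the fast
	 * or slow path
	 */
	mlx5e_update_route_encaps(priv, r, &encap_flow_list, true);

	/* decap flows: pull them from hardware, then re-offload them so new
	 * indirect table rules are created for the new decap_vport
	 */
	mlx5e_take_all_route_decap_flows(r, &decap_flows);
	mlx5e_unoffload_flow_list(priv, &decap_flows);
	mlx5e_reoffload_decap(priv, &decap_flows);

	mutex_unlock(&esw->offloads.encap_tbl_lock);
	rtnl_unlock();

	mlx5e_put_flow_list(priv, &encap_flow_list);
	mlx5e_put_flow_list(priv, &decap_flows);
}

The real work item is queued from the FIB notifier via mlx5e_route_enqueue_update(), which takes a reference on the route entry and the uplink netdev before scheduling the work.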
Signed-off-by: Vlad Buslov Signed-off-by: Dmytro Linkin Reviewed-by: Roi Dayan Signed-off-by: Saeed Mahameed --- .../ethernet/mellanox/mlx5/core/en/tc_priv.h | 1 + .../mellanox/mlx5/core/en/tc_tun_encap.c | 751 +++++++++++++++++- .../mellanox/mlx5/core/en/tc_tun_encap.h | 3 + .../net/ethernet/mellanox/mlx5/core/en_rep.h | 6 + .../net/ethernet/mellanox/mlx5/core/en_tc.c | 76 +- .../net/ethernet/mellanox/mlx5/core/eswitch.c | 1 - .../net/ethernet/mellanox/mlx5/core/eswitch.h | 2 +- 7 files changed, 773 insertions(+), 67 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/tc_priv.h b/drivers/net/ethernet/mellanox/mlx5/core/en/tc_priv.h index 4f35b486fb44..c223591ffc22 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en/tc_priv.h +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/tc_priv.h @@ -26,6 +26,7 @@ enum { MLX5E_TC_FLOW_FLAG_CT = MLX5E_TC_FLOW_BASE + 7, MLX5E_TC_FLOW_FLAG_L3_TO_L2_DECAP = MLX5E_TC_FLOW_BASE + 8, MLX5E_TC_FLOW_FLAG_TUN_RX = MLX5E_TC_FLOW_BASE + 9, + MLX5E_TC_FLOW_FLAG_FAILED = MLX5E_TC_FLOW_BASE + 10, }; struct mlx5e_tc_flow_parse_attr { diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun_encap.c b/drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun_encap.c index bc0e26f3fd4c..6a116335bb21 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun_encap.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun_encap.c @@ -1,12 +1,17 @@ // SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB /* Copyright (c) 2021 Mellanox Technologies. */ +#include #include "tc_tun_encap.h" #include "en_tc.h" #include "tc_tun.h" #include "rep/tc.h" #include "diag/en_tc_tracepoint.h" +enum { + MLX5E_ROUTE_ENTRY_VALID = BIT(0), +}; + struct mlx5e_route_key { int ip_version; union { @@ -19,11 +24,26 @@ struct mlx5e_route_entry { struct mlx5e_route_key key; struct list_head encap_entries; struct list_head decap_flows; + u32 flags; struct hlist_node hlist; refcount_t refcnt; + int tunnel_dev_index; struct rcu_head rcu; }; +struct mlx5e_tc_tun_encap { + struct mlx5e_priv *priv; + struct notifier_block fib_nb; + spinlock_t route_lock; /* protects route_tbl */ + unsigned long route_tbl_last_update; + DECLARE_HASHTABLE(route_tbl, 8); +}; + +static bool mlx5e_route_entry_valid(struct mlx5e_route_entry *r) +{ + return r->flags & MLX5E_ROUTE_ENTRY_VALID; +} + int mlx5e_tc_set_attr_rx_tun(struct mlx5e_tc_flow *flow, struct mlx5_flow_spec *spec) { @@ -72,6 +92,27 @@ int mlx5e_tc_set_attr_rx_tun(struct mlx5e_tc_flow *flow, return 0; } +static bool mlx5e_tc_flow_all_encaps_valid(struct mlx5_esw_flow_attr *esw_attr) +{ + bool all_flow_encaps_valid = true; + int i; + + /* Flow can be associated with multiple encap entries. + * Before offloading the flow verify that all of them have + * a valid neighbour. 
+ */ + for (i = 0; i < MLX5_MAX_FLOW_FWD_VPORTS; i++) { + if (!(esw_attr->dests[i].flags & MLX5_ESW_DEST_ENCAP)) + continue; + if (!(esw_attr->dests[i].flags & MLX5_ESW_DEST_ENCAP_VALID)) { + all_flow_encaps_valid = false; + break; + } + } + + return all_flow_encaps_valid; +} + void mlx5e_tc_encap_flows_add(struct mlx5e_priv *priv, struct mlx5e_encap_entry *e, struct list_head *flow_list) @@ -84,6 +125,9 @@ void mlx5e_tc_encap_flows_add(struct mlx5e_priv *priv, struct mlx5e_tc_flow *flow; int err; + if (e->flags & MLX5_ENCAP_ENTRY_NO_ROUTE) + return; + e->pkt_reformat = mlx5_packet_reformat_alloc(priv->mdev, e->reformat_type, e->encap_size, e->encap_header, @@ -97,9 +141,6 @@ void mlx5e_tc_encap_flows_add(struct mlx5e_priv *priv, mlx5e_rep_queue_neigh_stats_work(priv); list_for_each_entry(flow, flow_list, tmp_list) { - bool all_flow_encaps_valid = true; - int i; - if (!mlx5e_is_offloaded_flow(flow)) continue; attr = flow->attr; @@ -108,20 +149,9 @@ void mlx5e_tc_encap_flows_add(struct mlx5e_priv *priv, esw_attr->dests[flow->tmp_entry_index].pkt_reformat = e->pkt_reformat; esw_attr->dests[flow->tmp_entry_index].flags |= MLX5_ESW_DEST_ENCAP_VALID; - /* Flow can be associated with multiple encap entries. - * Before offloading the flow verify that all of them have - * a valid neighbour. - */ - for (i = 0; i < MLX5_MAX_FLOW_FWD_VPORTS; i++) { - if (!(esw_attr->dests[i].flags & MLX5_ESW_DEST_ENCAP)) - continue; - if (!(esw_attr->dests[i].flags & MLX5_ESW_DEST_ENCAP_VALID)) { - all_flow_encaps_valid = false; - break; - } - } + /* Do not offload flows with unresolved neighbors */ - if (!all_flow_encaps_valid) + if (!mlx5e_tc_flow_all_encaps_valid(esw_attr)) continue; /* update from slow path rule to encap rule */ rule = mlx5e_tc_offload_fdb_rules(esw, flow, spec, attr); @@ -181,6 +211,18 @@ void mlx5e_tc_encap_flows_del(struct mlx5e_priv *priv, mlx5_packet_reformat_dealloc(priv->mdev, e->pkt_reformat); } +static void mlx5e_take_tmp_flow(struct mlx5e_tc_flow *flow, + struct list_head *flow_list, + int index) +{ + if (IS_ERR(mlx5e_flow_get(flow))) + return; + wait_for_completion(&flow->init_done); + + flow->tmp_entry_index = index; + list_add(&flow->tmp_list, flow_list); +} + /* Takes reference to all flows attached to encap and adds the flows to * flow_list using 'tmp_list' list_head in mlx5e_tc_flow. */ @@ -191,15 +233,22 @@ void mlx5e_take_all_encap_flows(struct mlx5e_encap_entry *e, struct list_head *f list_for_each_entry(efi, &e->flows, list) { flow = container_of(efi, struct mlx5e_tc_flow, encaps[efi->index]); - if (IS_ERR(mlx5e_flow_get(flow))) - continue; - wait_for_completion(&flow->init_done); - - flow->tmp_entry_index = efi->index; - list_add(&flow->tmp_list, flow_list); + mlx5e_take_tmp_flow(flow, flow_list, efi->index); } } +/* Takes reference to all flows attached to route and adds the flows to + * flow_list using 'tmp_list' list_head in mlx5e_tc_flow. 
+ */ +static void mlx5e_take_all_route_decap_flows(struct mlx5e_route_entry *r, + struct list_head *flow_list) +{ + struct mlx5e_tc_flow *flow; + + list_for_each_entry(flow, &r->decap_flows, decap_routes) + mlx5e_take_tmp_flow(flow, flow_list, 0); +} + static struct mlx5e_encap_entry * mlx5e_get_next_valid_encap(struct mlx5e_neigh_hash_entry *nhe, struct mlx5e_encap_entry *e) @@ -557,21 +606,78 @@ static int mlx5e_set_vf_tunnel(struct mlx5_eswitch *esw, esw_attr->dests[out_index].flags |= MLX5_ESW_DEST_CHAIN_WITH_SRC_PORT_CHANGE; data = mlx5_eswitch_get_vport_metadata_for_set(esw_attr->in_mdev->priv.eswitch, vport_num); - err = mlx5e_tc_match_to_reg_set(esw->dev, mod_hdr_acts, - MLX5_FLOW_NAMESPACE_FDB, VPORT_TO_REG, data); + err = mlx5e_tc_match_to_reg_set_and_get_id(esw->dev, mod_hdr_acts, + MLX5_FLOW_NAMESPACE_FDB, + VPORT_TO_REG, data); + if (err >= 0) { + esw_attr->dests[out_index].src_port_rewrite_act_id = err; + err = 0; + } + +out: + if (route_dev) + dev_put(route_dev); + return err; +} + +static int mlx5e_update_vf_tunnel(struct mlx5_eswitch *esw, + struct mlx5_esw_flow_attr *attr, + struct mlx5e_tc_mod_hdr_acts *mod_hdr_acts, + struct net_device *out_dev, + int route_dev_ifindex, + int out_index) +{ + int act_id = attr->dests[out_index].src_port_rewrite_act_id; + struct net_device *route_dev; + u16 vport_num; + int err = 0; + u32 data; + + route_dev = dev_get_by_index(dev_net(out_dev), route_dev_ifindex); + + if (!route_dev || route_dev->netdev_ops != &mlx5e_netdev_ops || + !mlx5e_tc_is_vf_tunnel(out_dev, route_dev)) { + err = -ENODEV; + goto out; + } + + err = mlx5e_tc_query_route_vport(out_dev, route_dev, &vport_num); if (err) goto out; + data = mlx5_eswitch_get_vport_metadata_for_set(attr->in_mdev->priv.eswitch, + vport_num); + mlx5e_tc_match_to_reg_mod_hdr_change(esw->dev, mod_hdr_acts, VPORT_TO_REG, act_id, data); + out: if (route_dev) dev_put(route_dev); return err; } +static unsigned int mlx5e_route_tbl_get_last_update(struct mlx5e_priv *priv) +{ + struct mlx5_eswitch *esw = priv->mdev->priv.eswitch; + struct mlx5_rep_uplink_priv *uplink_priv; + struct mlx5e_rep_priv *uplink_rpriv; + struct mlx5e_tc_tun_encap *encap; + unsigned int ret; + + uplink_rpriv = mlx5_eswitch_get_uplink_priv(esw, REP_ETH); + uplink_priv = &uplink_rpriv->uplink_priv; + encap = uplink_priv->encap; + + spin_lock_bh(&encap->route_lock); + ret = encap->route_tbl_last_update; + spin_unlock_bh(&encap->route_lock); + return ret; +} + static int mlx5e_attach_encap_route(struct mlx5e_priv *priv, struct mlx5e_tc_flow *flow, struct mlx5e_encap_entry *e, bool new_encap_entry, + unsigned long tbl_time_before, int out_index); int mlx5e_attach_encap(struct mlx5e_priv *priv, @@ -586,6 +692,7 @@ int mlx5e_attach_encap(struct mlx5e_priv *priv, struct mlx5e_tc_flow_parse_attr *parse_attr; struct mlx5_flow_attr *attr = flow->attr; const struct ip_tunnel_info *tun_info; + unsigned long tbl_time_before = 0; struct encap_key key; struct mlx5e_encap_entry *e; bool entry_created = false; @@ -651,6 +758,7 @@ int mlx5e_attach_encap(struct mlx5e_priv *priv, INIT_LIST_HEAD(&e->flows); hash_add_rcu(esw->offloads.encap_tbl, &e->encap_hlist, hash_key); + tbl_time_before = mlx5e_route_tbl_get_last_update(priv); mutex_unlock(&esw->offloads.encap_tbl_lock); if (family == AF_INET) @@ -668,7 +776,8 @@ int mlx5e_attach_encap(struct mlx5e_priv *priv, e->compl_result = 1; attach_flow: - err = mlx5e_attach_encap_route(priv, flow, e, entry_created, out_index); + err = mlx5e_attach_encap_route(priv, flow, e, entry_created, tbl_time_before, 
+ out_index); if (err) goto out_err; @@ -797,15 +906,48 @@ static u32 hash_route_info(struct mlx5e_route_key *key) return jhash(&key->endpoint_ip.v6, sizeof(key->endpoint_ip.v6), 0); } +static void mlx5e_route_dealloc(struct mlx5e_priv *priv, + struct mlx5e_route_entry *r) +{ + WARN_ON(!list_empty(&r->decap_flows)); + WARN_ON(!list_empty(&r->encap_entries)); + + kfree_rcu(r, rcu); +} + +static void mlx5e_route_put(struct mlx5e_priv *priv, struct mlx5e_route_entry *r) +{ + struct mlx5_eswitch *esw = priv->mdev->priv.eswitch; + + if (!refcount_dec_and_mutex_lock(&r->refcnt, &esw->offloads.encap_tbl_lock)) + return; + + hash_del_rcu(&r->hlist); + mutex_unlock(&esw->offloads.encap_tbl_lock); + + mlx5e_route_dealloc(priv, r); +} + +static void mlx5e_route_put_locked(struct mlx5e_priv *priv, struct mlx5e_route_entry *r) +{ + struct mlx5_eswitch *esw = priv->mdev->priv.eswitch; + + lockdep_assert_held(&esw->offloads.encap_tbl_lock); + + if (!refcount_dec_and_test(&r->refcnt)) + return; + hash_del_rcu(&r->hlist); + mlx5e_route_dealloc(priv, r); +} + static struct mlx5e_route_entry * -mlx5e_route_get(struct mlx5e_priv *priv, struct mlx5e_route_key *key, +mlx5e_route_get(struct mlx5e_tc_tun_encap *encap, struct mlx5e_route_key *key, u32 hash_key) { - struct mlx5_eswitch *esw = priv->mdev->priv.eswitch; struct mlx5e_route_key r_key; struct mlx5e_route_entry *r; - hash_for_each_possible(esw->offloads.route_tbl, r, hlist, hash_key) { + hash_for_each_possible(encap->route_tbl, r, hlist, hash_key) { r_key = r->key; if (!cmp_route_info(&r_key, key) && refcount_inc_not_zero(&r->refcnt)) @@ -816,33 +958,120 @@ mlx5e_route_get(struct mlx5e_priv *priv, struct mlx5e_route_key *key, static struct mlx5e_route_entry * mlx5e_route_get_create(struct mlx5e_priv *priv, - struct mlx5e_route_key *key) + struct mlx5e_route_key *key, + int tunnel_dev_index, + unsigned long *route_tbl_change_time) { struct mlx5_eswitch *esw = priv->mdev->priv.eswitch; + struct mlx5_rep_uplink_priv *uplink_priv; + struct mlx5e_rep_priv *uplink_rpriv; + struct mlx5e_tc_tun_encap *encap; struct mlx5e_route_entry *r; u32 hash_key; + uplink_rpriv = mlx5_eswitch_get_uplink_priv(esw, REP_ETH); + uplink_priv = &uplink_rpriv->uplink_priv; + encap = uplink_priv->encap; + hash_key = hash_route_info(key); - r = mlx5e_route_get(priv, key, hash_key); - if (r) + spin_lock_bh(&encap->route_lock); + r = mlx5e_route_get(encap, key, hash_key); + spin_unlock_bh(&encap->route_lock); + if (r) { + if (!mlx5e_route_entry_valid(r)) { + mlx5e_route_put_locked(priv, r); + return ERR_PTR(-EINVAL); + } return r; + } r = kzalloc(sizeof(*r), GFP_KERNEL); if (!r) return ERR_PTR(-ENOMEM); r->key = *key; + r->flags |= MLX5E_ROUTE_ENTRY_VALID; + r->tunnel_dev_index = tunnel_dev_index; refcount_set(&r->refcnt, 1); INIT_LIST_HEAD(&r->decap_flows); INIT_LIST_HEAD(&r->encap_entries); - hash_add(esw->offloads.route_tbl, &r->hlist, hash_key); + + spin_lock_bh(&encap->route_lock); + *route_tbl_change_time = encap->route_tbl_last_update; + hash_add(encap->route_tbl, &r->hlist, hash_key); + spin_unlock_bh(&encap->route_lock); + return r; } +static struct mlx5e_route_entry * +mlx5e_route_lookup_for_update(struct mlx5e_tc_tun_encap *encap, struct mlx5e_route_key *key) +{ + u32 hash_key = hash_route_info(key); + struct mlx5e_route_entry *r; + + spin_lock_bh(&encap->route_lock); + encap->route_tbl_last_update = jiffies; + r = mlx5e_route_get(encap, key, hash_key); + spin_unlock_bh(&encap->route_lock); + + return r; +} + +struct mlx5e_tc_fib_event_data { + struct work_struct work; + 
unsigned long event; + struct mlx5e_route_entry *r; + struct net_device *ul_dev; +}; + +static void mlx5e_tc_fib_event_work(struct work_struct *work); +static struct mlx5e_tc_fib_event_data * +mlx5e_tc_init_fib_work(unsigned long event, struct net_device *ul_dev, gfp_t flags) +{ + struct mlx5e_tc_fib_event_data *fib_work; + + fib_work = kzalloc(sizeof(*fib_work), flags); + if (WARN_ON(!fib_work)) + return NULL; + + INIT_WORK(&fib_work->work, mlx5e_tc_fib_event_work); + fib_work->event = event; + fib_work->ul_dev = ul_dev; + + return fib_work; +} + +static int +mlx5e_route_enqueue_update(struct mlx5e_priv *priv, + struct mlx5e_route_entry *r, + unsigned long event) +{ + struct mlx5_eswitch *esw = priv->mdev->priv.eswitch; + struct mlx5e_tc_fib_event_data *fib_work; + struct mlx5e_rep_priv *uplink_rpriv; + struct net_device *ul_dev; + + uplink_rpriv = mlx5_eswitch_get_uplink_priv(esw, REP_ETH); + ul_dev = uplink_rpriv->netdev; + + fib_work = mlx5e_tc_init_fib_work(event, ul_dev, GFP_KERNEL); + if (!fib_work) + return -ENOMEM; + + dev_hold(ul_dev); + refcount_inc(&r->refcnt); + fib_work->r = r; + queue_work(priv->wq, &fib_work->work); + + return 0; +} + int mlx5e_attach_decap_route(struct mlx5e_priv *priv, struct mlx5e_tc_flow *flow) { struct mlx5_eswitch *esw = priv->mdev->priv.eswitch; + unsigned long tbl_time_before, tbl_time_after; struct mlx5e_tc_flow_parse_attr *parse_attr; struct mlx5_flow_attr *attr = flow->attr; struct mlx5_esw_flow_attr *esw_attr; @@ -856,6 +1085,8 @@ int mlx5e_attach_decap_route(struct mlx5e_priv *priv, if (!esw_attr->rx_tun_attr) goto out; + tbl_time_before = mlx5e_route_tbl_get_last_update(priv); + tbl_time_after = tbl_time_before; err = mlx5e_tc_tun_route_lookup(priv, &parse_attr->spec, attr); if (err || !esw_attr->rx_tun_attr->decap_vport) goto out; @@ -866,11 +1097,22 @@ int mlx5e_attach_decap_route(struct mlx5e_priv *priv, else key.endpoint_ip.v6 = esw_attr->rx_tun_attr->dst_ip.v6; - r = mlx5e_route_get_create(priv, &key); + r = mlx5e_route_get_create(priv, &key, parse_attr->filter_dev->ifindex, + &tbl_time_after); if (IS_ERR(r)) { err = PTR_ERR(r); goto out; } + /* Routing changed concurrently. FIB event handler might have missed new + * entry, schedule update. + */ + if (tbl_time_before != tbl_time_after) { + err = mlx5e_route_enqueue_update(priv, r, FIB_EVENT_ENTRY_REPLACE); + if (err) { + mlx5e_route_put_locked(priv, r); + goto out; + } + } flow->decap_route = r; list_add(&flow->decap_routes, &r->decap_flows); @@ -886,9 +1128,11 @@ static int mlx5e_attach_encap_route(struct mlx5e_priv *priv, struct mlx5e_tc_flow *flow, struct mlx5e_encap_entry *e, bool new_encap_entry, + unsigned long tbl_time_before, int out_index) { struct mlx5_eswitch *esw = priv->mdev->priv.eswitch; + unsigned long tbl_time_after = tbl_time_before; struct mlx5e_tc_flow_parse_attr *parse_attr; struct mlx5_flow_attr *attr = flow->attr; const struct ip_tunnel_info *tun_info; @@ -917,9 +1161,20 @@ static int mlx5e_attach_encap_route(struct mlx5e_priv *priv, MLX5_ESW_DEST_CHAIN_WITH_SRC_PORT_CHANGE)) return err; - r = mlx5e_route_get_create(priv, &key); + r = mlx5e_route_get_create(priv, &key, parse_attr->mirred_ifindex[out_index], + &tbl_time_after); if (IS_ERR(r)) return PTR_ERR(r); + /* Routing changed concurrently. FIB event handler might have missed new + * entry, schedule update. 
+ */ + if (tbl_time_before != tbl_time_after) { + err = mlx5e_route_enqueue_update(priv, r, FIB_EVENT_ENTRY_REPLACE); + if (err) { + mlx5e_route_put_locked(priv, r); + return err; + } + } flow->encap_routes[out_index].r = r; if (new_encap_entry) @@ -928,15 +1183,6 @@ static int mlx5e_attach_encap_route(struct mlx5e_priv *priv, return 0; } -static void mlx5e_route_dealloc(struct mlx5e_priv *priv, - struct mlx5e_route_entry *r) -{ - WARN_ON(!list_empty(&r->decap_flows)); - WARN_ON(!list_empty(&r->encap_entries)); - - kfree_rcu(r, rcu); -} - void mlx5e_detach_decap_route(struct mlx5e_priv *priv, struct mlx5e_tc_flow *flow) { @@ -986,3 +1232,422 @@ static void mlx5e_detach_encap_route(struct mlx5e_priv *priv, mlx5e_route_dealloc(priv, r); } +static void mlx5e_invalidate_encap(struct mlx5e_priv *priv, + struct mlx5e_encap_entry *e, + struct list_head *encap_flows) +{ + struct mlx5_eswitch *esw = priv->mdev->priv.eswitch; + struct mlx5e_tc_flow *flow; + + list_for_each_entry(flow, encap_flows, tmp_list) { + struct mlx5_flow_attr *attr = flow->attr; + struct mlx5_esw_flow_attr *esw_attr; + + if (!mlx5e_is_offloaded_flow(flow)) + continue; + esw_attr = attr->esw_attr; + + if (flow_flag_test(flow, SLOW)) + mlx5e_tc_unoffload_from_slow_path(esw, flow); + else + mlx5e_tc_unoffload_fdb_rules(esw, flow, flow->attr); + mlx5_modify_header_dealloc(priv->mdev, attr->modify_hdr); + attr->modify_hdr = NULL; + + esw_attr->dests[flow->tmp_entry_index].flags &= + ~MLX5_ESW_DEST_ENCAP_VALID; + esw_attr->dests[flow->tmp_entry_index].pkt_reformat = NULL; + } + + e->flags |= MLX5_ENCAP_ENTRY_NO_ROUTE; + if (e->flags & MLX5_ENCAP_ENTRY_VALID) { + e->flags &= ~MLX5_ENCAP_ENTRY_VALID; + mlx5_packet_reformat_dealloc(priv->mdev, e->pkt_reformat); + e->pkt_reformat = NULL; + } +} + +static void mlx5e_reoffload_encap(struct mlx5e_priv *priv, + struct net_device *tunnel_dev, + struct mlx5e_encap_entry *e, + struct list_head *encap_flows) +{ + struct mlx5_eswitch *esw = priv->mdev->priv.eswitch; + struct mlx5e_tc_flow *flow; + int err; + + err = ip_tunnel_info_af(e->tun_info) == AF_INET ? 
+ mlx5e_tc_tun_update_header_ipv4(priv, tunnel_dev, e) : + mlx5e_tc_tun_update_header_ipv6(priv, tunnel_dev, e); + if (err) + mlx5_core_warn(priv->mdev, "Failed to update encap header, %d", err); + e->flags &= ~MLX5_ENCAP_ENTRY_NO_ROUTE; + + list_for_each_entry(flow, encap_flows, tmp_list) { + struct mlx5e_tc_flow_parse_attr *parse_attr; + struct mlx5_flow_attr *attr = flow->attr; + struct mlx5_esw_flow_attr *esw_attr; + struct mlx5_flow_handle *rule; + struct mlx5_flow_spec *spec; + + if (flow_flag_test(flow, FAILED)) + continue; + + esw_attr = attr->esw_attr; + parse_attr = attr->parse_attr; + spec = &parse_attr->spec; + + err = mlx5e_update_vf_tunnel(esw, esw_attr, &parse_attr->mod_hdr_acts, + e->out_dev, e->route_dev_ifindex, + flow->tmp_entry_index); + if (err) { + mlx5_core_warn(priv->mdev, "Failed to update VF tunnel err=%d", err); + continue; + } + + err = mlx5e_tc_add_flow_mod_hdr(priv, parse_attr, flow); + if (err) { + mlx5_core_warn(priv->mdev, "Failed to update flow mod_hdr err=%d", + err); + continue; + } + + if (e->flags & MLX5_ENCAP_ENTRY_VALID) { + esw_attr->dests[flow->tmp_entry_index].pkt_reformat = e->pkt_reformat; + esw_attr->dests[flow->tmp_entry_index].flags |= MLX5_ESW_DEST_ENCAP_VALID; + if (!mlx5e_tc_flow_all_encaps_valid(esw_attr)) + goto offload_to_slow_path; + /* update from slow path rule to encap rule */ + rule = mlx5e_tc_offload_fdb_rules(esw, flow, spec, attr); + if (IS_ERR(rule)) { + err = PTR_ERR(rule); + mlx5_core_warn(priv->mdev, "Failed to update cached encapsulation flow, %d\n", + err); + } else { + flow->rule[0] = rule; + } + } else { +offload_to_slow_path: + rule = mlx5e_tc_offload_to_slow_path(esw, flow, spec); + /* mark the flow's encap dest as non-valid */ + esw_attr->dests[flow->tmp_entry_index].flags &= + ~MLX5_ESW_DEST_ENCAP_VALID; + + if (IS_ERR(rule)) { + err = PTR_ERR(rule); + mlx5_core_warn(priv->mdev, "Failed to update slow path (encap) flow, %d\n", + err); + } else { + flow->rule[0] = rule; + } + } + flow_flag_set(flow, OFFLOADED); + } +} + +static int mlx5e_update_route_encaps(struct mlx5e_priv *priv, + struct mlx5e_route_entry *r, + struct list_head *flow_list, + bool replace) +{ + struct net_device *tunnel_dev; + struct mlx5e_encap_entry *e; + + tunnel_dev = __dev_get_by_index(dev_net(priv->netdev), r->tunnel_dev_index); + if (!tunnel_dev) + return -ENODEV; + + list_for_each_entry(e, &r->encap_entries, route_list) { + LIST_HEAD(encap_flows); + + mlx5e_take_all_encap_flows(e, &encap_flows); + if (list_empty(&encap_flows)) + continue; + + if (mlx5e_route_entry_valid(r)) + mlx5e_invalidate_encap(priv, e, &encap_flows); + + if (!replace) { + list_splice(&encap_flows, flow_list); + continue; + } + + mlx5e_reoffload_encap(priv, tunnel_dev, e, &encap_flows); + list_splice(&encap_flows, flow_list); + } + + return 0; +} + +static void mlx5e_unoffload_flow_list(struct mlx5e_priv *priv, + struct list_head *flow_list) +{ + struct mlx5_eswitch *esw = priv->mdev->priv.eswitch; + struct mlx5e_tc_flow *flow; + + list_for_each_entry(flow, flow_list, tmp_list) + if (mlx5e_is_offloaded_flow(flow)) + mlx5e_tc_unoffload_fdb_rules(esw, flow, flow->attr); +} + +static void mlx5e_reoffload_decap(struct mlx5e_priv *priv, + struct list_head *decap_flows) +{ + struct mlx5_eswitch *esw = priv->mdev->priv.eswitch; + struct mlx5e_tc_flow *flow; + + list_for_each_entry(flow, decap_flows, tmp_list) { + struct mlx5e_tc_flow_parse_attr *parse_attr; + struct mlx5_flow_attr *attr = flow->attr; + struct mlx5_flow_handle *rule; + struct mlx5_flow_spec *spec; + int err; + + 
if (flow_flag_test(flow, FAILED)) + continue; + + parse_attr = attr->parse_attr; + spec = &parse_attr->spec; + err = mlx5e_tc_tun_route_lookup(priv, spec, attr); + if (err) { + mlx5_core_warn(priv->mdev, "Failed to lookup route for flow, %d\n", + err); + continue; + } + + rule = mlx5e_tc_offload_fdb_rules(esw, flow, spec, attr); + if (IS_ERR(rule)) { + err = PTR_ERR(rule); + mlx5_core_warn(priv->mdev, "Failed to update cached decap flow, %d\n", + err); + } else { + flow->rule[0] = rule; + flow_flag_set(flow, OFFLOADED); + } + } +} + +static int mlx5e_update_route_decap_flows(struct mlx5e_priv *priv, + struct mlx5e_route_entry *r, + struct list_head *flow_list, + bool replace) +{ + struct net_device *tunnel_dev; + LIST_HEAD(decap_flows); + + tunnel_dev = __dev_get_by_index(dev_net(priv->netdev), r->tunnel_dev_index); + if (!tunnel_dev) + return -ENODEV; + + mlx5e_take_all_route_decap_flows(r, &decap_flows); + if (mlx5e_route_entry_valid(r)) + mlx5e_unoffload_flow_list(priv, &decap_flows); + if (replace) + mlx5e_reoffload_decap(priv, &decap_flows); + + list_splice(&decap_flows, flow_list); + + return 0; +} + +static void mlx5e_tc_fib_event_work(struct work_struct *work) +{ + struct mlx5e_tc_fib_event_data *event_data = + container_of(work, struct mlx5e_tc_fib_event_data, work); + struct net_device *ul_dev = event_data->ul_dev; + struct mlx5e_priv *priv = netdev_priv(ul_dev); + struct mlx5e_route_entry *r = event_data->r; + struct mlx5_eswitch *esw; + LIST_HEAD(flow_list); + bool replace; + int err; + + /* sync with concurrent neigh updates */ + rtnl_lock(); + esw = priv->mdev->priv.eswitch; + mutex_lock(&esw->offloads.encap_tbl_lock); + replace = event_data->event == FIB_EVENT_ENTRY_REPLACE; + + if (!mlx5e_route_entry_valid(r) && !replace) + goto out; + + err = mlx5e_update_route_encaps(priv, r, &flow_list, replace); + if (err) + mlx5_core_warn(priv->mdev, "Failed to update route encaps, %d\n", + err); + + err = mlx5e_update_route_decap_flows(priv, r, &flow_list, replace); + if (err) + mlx5_core_warn(priv->mdev, "Failed to update route decap flows, %d\n", + err); + + if (replace) + r->flags |= MLX5E_ROUTE_ENTRY_VALID; +out: + mutex_unlock(&esw->offloads.encap_tbl_lock); + rtnl_unlock(); + + mlx5e_put_flow_list(priv, &flow_list); + mlx5e_route_put(priv, event_data->r); + dev_put(event_data->ul_dev); + kfree(event_data); +} + +static struct mlx5e_tc_fib_event_data * +mlx5e_init_fib_work_ipv4(struct mlx5e_priv *priv, + struct net_device *ul_dev, + struct mlx5e_tc_tun_encap *encap, + unsigned long event, + struct fib_notifier_info *info) +{ + struct fib_entry_notifier_info *fen_info; + struct mlx5e_tc_fib_event_data *fib_work; + struct mlx5e_route_entry *r; + struct mlx5e_route_key key; + struct net_device *fib_dev; + + fen_info = container_of(info, struct fib_entry_notifier_info, info); + fib_dev = fib_info_nh(fen_info->fi, 0)->fib_nh_dev; + if (fib_dev->netdev_ops != &mlx5e_netdev_ops || + fen_info->dst_len != 32) + return NULL; + + fib_work = mlx5e_tc_init_fib_work(event, ul_dev, GFP_ATOMIC); + if (!fib_work) + return ERR_PTR(-ENOMEM); + + key.endpoint_ip.v4 = htonl(fen_info->dst); + key.ip_version = 4; + + /* Can't fail after this point because releasing reference to r + * requires obtaining sleeping mutex which we can't do in atomic + * context. 
+ */ + r = mlx5e_route_lookup_for_update(encap, &key); + if (!r) + goto out; + fib_work->r = r; + dev_hold(ul_dev); + + return fib_work; + +out: + kfree(fib_work); + return NULL; +} + +static struct mlx5e_tc_fib_event_data * +mlx5e_init_fib_work_ipv6(struct mlx5e_priv *priv, + struct net_device *ul_dev, + struct mlx5e_tc_tun_encap *encap, + unsigned long event, + struct fib_notifier_info *info) +{ + struct fib6_entry_notifier_info *fen_info; + struct mlx5e_tc_fib_event_data *fib_work; + struct mlx5e_route_entry *r; + struct mlx5e_route_key key; + struct net_device *fib_dev; + + fen_info = container_of(info, struct fib6_entry_notifier_info, info); + fib_dev = fib6_info_nh_dev(fen_info->rt); + if (fib_dev->netdev_ops != &mlx5e_netdev_ops || + fen_info->rt->fib6_dst.plen != 128) + return NULL; + + fib_work = mlx5e_tc_init_fib_work(event, ul_dev, GFP_ATOMIC); + if (!fib_work) + return ERR_PTR(-ENOMEM); + + memcpy(&key.endpoint_ip.v6, &fen_info->rt->fib6_dst.addr, + sizeof(fen_info->rt->fib6_dst.addr)); + key.ip_version = 6; + + /* Can't fail after this point because releasing reference to r + * requires obtaining sleeping mutex which we can't do in atomic + * context. + */ + r = mlx5e_route_lookup_for_update(encap, &key); + if (!r) + goto out; + fib_work->r = r; + dev_hold(ul_dev); + + return fib_work; + +out: + kfree(fib_work); + return NULL; +} + +static int mlx5e_tc_tun_fib_event(struct notifier_block *nb, unsigned long event, void *ptr) +{ + struct mlx5e_tc_fib_event_data *fib_work; + struct fib_notifier_info *info = ptr; + struct mlx5e_tc_tun_encap *encap; + struct net_device *ul_dev; + struct mlx5e_priv *priv; + + encap = container_of(nb, struct mlx5e_tc_tun_encap, fib_nb); + priv = encap->priv; + ul_dev = priv->netdev; + priv = netdev_priv(ul_dev); + + switch (event) { + case FIB_EVENT_ENTRY_REPLACE: + case FIB_EVENT_ENTRY_DEL: + if (info->family == AF_INET) + fib_work = mlx5e_init_fib_work_ipv4(priv, ul_dev, encap, event, info); + else if (info->family == AF_INET6) + fib_work = mlx5e_init_fib_work_ipv6(priv, ul_dev, encap, event, info); + else + return NOTIFY_DONE; + + if (!IS_ERR_OR_NULL(fib_work)) { + queue_work(priv->wq, &fib_work->work); + } else if (IS_ERR(fib_work)) { + NL_SET_ERR_MSG_MOD(info->extack, "Failed to init fib work"); + mlx5_core_warn(priv->mdev, "Failed to init fib work, %ld\n", + PTR_ERR(fib_work)); + } + + break; + default: + return NOTIFY_DONE; + } + + return NOTIFY_DONE; +} + +struct mlx5e_tc_tun_encap *mlx5e_tc_tun_init(struct mlx5e_priv *priv) +{ + struct mlx5e_tc_tun_encap *encap; + int err; + + encap = kvzalloc(sizeof(*encap), GFP_KERNEL); + if (!encap) + return ERR_PTR(-ENOMEM); + + encap->priv = priv; + encap->fib_nb.notifier_call = mlx5e_tc_tun_fib_event; + spin_lock_init(&encap->route_lock); + hash_init(encap->route_tbl); + err = register_fib_notifier(dev_net(priv->netdev), &encap->fib_nb, + NULL, NULL); + if (err) { + kvfree(encap); + return ERR_PTR(err); + } + + return encap; +} + +void mlx5e_tc_tun_cleanup(struct mlx5e_tc_tun_encap *encap) +{ + if (!encap) + return; + + unregister_fib_notifier(dev_net(encap->priv->netdev), &encap->fib_nb); + flush_workqueue(encap->priv->wq); /* flush fib event works */ + kvfree(encap); +} diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun_encap.h b/drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun_encap.h index 9939cff6e6c5..3391504d9a08 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun_encap.h +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun_encap.h @@ -32,4 +32,7 @@ struct 
ip_tunnel_info *mlx5e_dup_tun_info(const struct ip_tunnel_info *tun_info) int mlx5e_tc_set_attr_rx_tun(struct mlx5e_tc_flow *flow, struct mlx5_flow_spec *spec); +struct mlx5e_tc_tun_encap *mlx5e_tc_tun_init(struct mlx5e_priv *priv); +void mlx5e_tc_tun_cleanup(struct mlx5e_tc_tun_encap *encap); + #endif /* __MLX5_EN_TC_TUN_ENCAP_H__ */ diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.h b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.h index e947921a2d5a..d1696404cca9 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.h +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.h @@ -59,6 +59,8 @@ struct mlx5e_neigh_update_table { struct mlx5_tc_ct_priv; struct mlx5e_rep_bond; +struct mlx5e_tc_tun_encap; + struct mlx5_rep_uplink_priv { /* Filters DB - instantiated by the uplink representor and shared by * the uplink's VFs @@ -90,6 +92,9 @@ struct mlx5_rep_uplink_priv { /* support eswitch vports bonding */ struct mlx5e_rep_bond *bond; + + /* tc tunneling encapsulation private data */ + struct mlx5e_tc_tun_encap *encap; }; struct mlx5e_rep_priv { @@ -153,6 +158,7 @@ enum { /* set when the encap entry is successfully offloaded into HW */ MLX5_ENCAP_ENTRY_VALID = BIT(0), MLX5_REFORMAT_DECAP = BIT(1), + MLX5_ENCAP_ENTRY_NO_ROUTE = BIT(2), }; struct mlx5e_decap_key { diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c index c5ecb9e4e767..db142ee96510 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c @@ -1279,11 +1279,11 @@ mlx5e_tc_add_fdb_flow(struct mlx5e_priv *priv, struct net_device *out_dev, *encap_dev = NULL; struct mlx5e_tc_flow_parse_attr *parse_attr; struct mlx5_flow_attr *attr = flow->attr; + bool vf_tun = false, encap_valid = true; struct mlx5_esw_flow_attr *esw_attr; struct mlx5_fc *counter = NULL; struct mlx5e_rep_priv *rpriv; struct mlx5e_priv *out_priv; - bool encap_valid = true; u32 max_prio, max_chain; int err = 0; int out_index; @@ -1297,26 +1297,28 @@ mlx5e_tc_add_fdb_flow(struct mlx5e_priv *priv, if (!mlx5e_is_ft_flow(flow) && attr->chain > max_chain) { NL_SET_ERR_MSG_MOD(extack, "Requested chain is out of supported range"); - return -EOPNOTSUPP; + err = -EOPNOTSUPP; + goto err_out; } max_prio = mlx5_chains_get_prio_range(esw_chains(esw)); if (attr->prio > max_prio) { NL_SET_ERR_MSG_MOD(extack, "Requested priority is out of supported range"); - return -EOPNOTSUPP; + err = -EOPNOTSUPP; + goto err_out; } if (flow_flag_test(flow, TUN_RX)) { err = mlx5e_attach_decap_route(priv, flow); if (err) - return err; + goto err_out; } if (flow_flag_test(flow, L3_TO_L2_DECAP)) { err = mlx5e_attach_decap(priv, flow, extack); if (err) - return err; + goto err_out; } parse_attr = attr->parse_attr; @@ -1334,8 +1336,11 @@ mlx5e_tc_add_fdb_flow(struct mlx5e_priv *priv, err = mlx5e_attach_encap(priv, flow, out_dev, out_index, extack, &encap_dev, &encap_valid); if (err) - return err; + goto err_out; + if (esw_attr->dests[out_index].flags & + MLX5_ESW_DEST_CHAIN_WITH_SRC_PORT_CHANGE) + vf_tun = true; out_priv = netdev_priv(encap_dev); rpriv = out_priv->ppriv; esw_attr->dests[out_index].rep = rpriv->rep; @@ -1344,19 +1349,27 @@ mlx5e_tc_add_fdb_flow(struct mlx5e_priv *priv, err = mlx5_eswitch_add_vlan_action(esw, attr); if (err) - return err; + goto err_out; if (attr->action & MLX5_FLOW_CONTEXT_ACTION_MOD_HDR && !(attr->ct_attr.ct_action & TCA_CT_ACT_CLEAR)) { - err = mlx5e_attach_mod_hdr(priv, flow, parse_attr); - if (err) - return err; + if (vf_tun) { + err = 
mlx5e_tc_add_flow_mod_hdr(priv, parse_attr, flow); + if (err) + goto err_out; + } else { + err = mlx5e_attach_mod_hdr(priv, flow, parse_attr); + if (err) + goto err_out; + } } if (attr->action & MLX5_FLOW_CONTEXT_ACTION_COUNT) { counter = mlx5_fc_create(esw_attr->counter_dev, true); - if (IS_ERR(counter)) - return PTR_ERR(counter); + if (IS_ERR(counter)) { + err = PTR_ERR(counter); + goto err_out; + } attr->counter = counter; } @@ -1370,12 +1383,17 @@ mlx5e_tc_add_fdb_flow(struct mlx5e_priv *priv, else flow->rule[0] = mlx5e_tc_offload_fdb_rules(esw, flow, &parse_attr->spec, attr); - if (IS_ERR(flow->rule[0])) - return PTR_ERR(flow->rule[0]); - else - flow_flag_set(flow, OFFLOADED); + if (IS_ERR(flow->rule[0])) { + err = PTR_ERR(flow->rule[0]); + goto err_out; + } + flow_flag_set(flow, OFFLOADED); return 0; + +err_out: + flow_flag_set(flow, FAILED); + return err; } static bool mlx5_flow_has_geneve_opt(struct mlx5e_tc_flow *flow) @@ -1397,6 +1415,7 @@ static void mlx5e_tc_del_fdb_flow(struct mlx5e_priv *priv, struct mlx5_eswitch *esw = priv->mdev->priv.eswitch; struct mlx5_flow_attr *attr = flow->attr; struct mlx5_esw_flow_attr *esw_attr; + bool vf_tun = false; int out_index; esw_attr = attr->esw_attr; @@ -1421,20 +1440,26 @@ static void mlx5e_tc_del_fdb_flow(struct mlx5e_priv *priv, mlx5e_detach_decap_route(priv, flow); for (out_index = 0; out_index < MLX5_MAX_FLOW_FWD_VPORTS; out_index++) { + if (esw_attr->dests[out_index].flags & + MLX5_ESW_DEST_CHAIN_WITH_SRC_PORT_CHANGE) + vf_tun = true; if (esw_attr->dests[out_index].flags & MLX5_ESW_DEST_ENCAP) { mlx5e_detach_encap(priv, flow, out_index); kfree(attr->parse_attr->tun_info[out_index]); } } - kvfree(attr->parse_attr); - kvfree(attr->esw_attr->rx_tun_attr); mlx5_tc_ct_match_del(get_ct_priv(priv), &flow->attr->ct_attr); if (attr->action & MLX5_FLOW_CONTEXT_ACTION_MOD_HDR) { dealloc_mod_hdr_actions(&attr->parse_attr->mod_hdr_acts); - mlx5e_detach_mod_hdr(priv, flow); + if (vf_tun && attr->modify_hdr) + mlx5_modify_header_dealloc(priv->mdev, attr->modify_hdr); + else + mlx5e_detach_mod_hdr(priv, flow); } + kvfree(attr->parse_attr); + kvfree(attr->esw_attr->rx_tun_attr); if (attr->action & MLX5_FLOW_CONTEXT_ACTION_COUNT) mlx5_fc_destroy(esw_attr->counter_dev, attr->counter); @@ -4044,7 +4069,6 @@ __mlx5e_add_fdb_flow(struct mlx5e_priv *priv, return flow; err_free: - dealloc_mod_hdr_actions(&parse_attr->mod_hdr_acts); mlx5e_flow_put(priv, flow); out: return ERR_PTR(err); @@ -4189,6 +4213,7 @@ mlx5e_add_nic_flow(struct mlx5e_priv *priv, return 0; err_free: + flow_flag_set(flow, FAILED); dealloc_mod_hdr_actions(&parse_attr->mod_hdr_acts); mlx5e_flow_put(priv, flow); out: @@ -4724,8 +4749,14 @@ int mlx5e_tc_esw_init(struct rhashtable *tc_ht) lockdep_set_class(&tc_ht->mutex, &tc_ht_lock_key); + uplink_priv->encap = mlx5e_tc_tun_init(priv); + if (IS_ERR(uplink_priv->encap)) + goto err_register_fib_notifier; + return err; +err_register_fib_notifier: + rhashtable_destroy(tc_ht); err_ht_init: mapping_destroy(uplink_priv->tunnel_enc_opts_mapping); err_enc_opts_mapping: @@ -4742,10 +4773,11 @@ void mlx5e_tc_esw_cleanup(struct rhashtable *tc_ht) { struct mlx5_rep_uplink_priv *uplink_priv; - rhashtable_free_and_destroy(tc_ht, _mlx5e_tc_del_flow, NULL); - uplink_priv = container_of(tc_ht, struct mlx5_rep_uplink_priv, tc_ht); + rhashtable_free_and_destroy(tc_ht, _mlx5e_tc_del_flow, NULL); + mlx5e_tc_tun_cleanup(uplink_priv->encap); + mapping_destroy(uplink_priv->tunnel_enc_opts_mapping); mapping_destroy(uplink_priv->tunnel_mapping); diff --git 
a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c index 310c405e81d7..aba17835465b 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c @@ -1830,7 +1830,6 @@ int mlx5_eswitch_init(struct mlx5_core_dev *dev) mutex_init(&esw->offloads.decap_tbl_lock); hash_init(esw->offloads.decap_tbl); mlx5e_mod_hdr_tbl_init(&esw->offloads.mod_hdr); - hash_init(esw->offloads.route_tbl); atomic64_set(&esw->offloads.num_flows, 0); ida_init(&esw->offloads.vport_metadata_ida); xa_init_flags(&esw->offloads.vhca_map, XA_FLAGS_ALLOC); diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h index b90724e27960..fdf5c8c05c1b 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h +++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h @@ -214,7 +214,6 @@ struct mlx5_esw_offload { struct mutex peer_mutex; struct mutex encap_tbl_lock; /* protects encap_tbl */ DECLARE_HASHTABLE(encap_tbl, 8); - DECLARE_HASHTABLE(route_tbl, 8); struct mutex decap_tbl_lock; /* protects decap_tbl */ DECLARE_HASHTABLE(decap_tbl, 8); struct mod_hdr_tbl mod_hdr; @@ -424,6 +423,7 @@ struct mlx5_esw_flow_attr { struct mlx5_pkt_reformat *pkt_reformat; struct mlx5_core_dev *mdev; struct mlx5_termtbl_handle *termtbl; + int src_port_rewrite_act_id; } dests[MLX5_MAX_FLOW_FWD_VPORTS]; struct mlx5_rx_tun_attr *rx_tun_attr; struct mlx5_pkt_reformat *decap_pkt_reformat;
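[Editor's note - not part of the patch] The hunks above close a race between installing a route entry and a concurrent FIB event by comparing a "last update" stamp taken before the tunnel route lookup with the stamp observed when the entry is inserted into the table (the two `tbl_time_before != tbl_time_after` checks). Below is a minimal, self-contained sketch of that change-stamp scheme, written for illustration only: the structure and the names (route_cache, route_node, route_lookup_for_update, route_get_create) are made up and do not exist in the driver; locking and hashing are reduced to the bare pattern.

#include <linux/hashtable.h>
#include <linux/jiffies.h>
#include <linux/refcount.h>
#include <linux/slab.h>
#include <linux/spinlock.h>
#include <linux/types.h>

struct route_cache {
	spinlock_t lock;		/* protects tbl and last_update; assumed
					 * initialized with spin_lock_init()/hash_init()
					 */
	unsigned long last_update;	/* bumped on every FIB event */
	DECLARE_HASHTABLE(tbl, 8);
};

struct route_node {
	struct hlist_node hlist;
	refcount_t refcnt;
	u32 key;
};

/* Notifier side (atomic context): bump the stamp first, then look up the entry,
 * so any insert that did not yet observe the new stamp will detect the event.
 */
static struct route_node *route_lookup_for_update(struct route_cache *c, u32 key)
{
	struct route_node *n;

	spin_lock_bh(&c->lock);
	c->last_update = jiffies;
	hash_for_each_possible(c->tbl, n, hlist, key)
		if (n->key == key && refcount_inc_not_zero(&n->refcnt))
			goto out;
	n = NULL;
out:
	spin_unlock_bh(&c->lock);
	return n;
}

/* Offload side: insert a new entry and report the stamp seen at insertion time. */
static struct route_node *route_get_create(struct route_cache *c, u32 key,
					   unsigned long *stamp_at_insert)
{
	struct route_node *n = kzalloc(sizeof(*n), GFP_KERNEL);

	if (!n)
		return NULL;
	n->key = key;
	refcount_set(&n->refcnt, 1);

	spin_lock_bh(&c->lock);
	*stamp_at_insert = c->last_update;	/* stamp and insert under one lock */
	hash_add(c->tbl, &n->hlist, key);
	spin_unlock_bh(&c->lock);
	return n;
}

A caller snapshots c->last_update before resolving the route, calls route_get_create(), and if the stamp returned differs from the snapshot it knows a FIB event ran in between and may have missed the new entry, so it schedules a refresh - the same decision mlx5e_attach_decap_route() and mlx5e_attach_encap_route() make when they enqueue FIB_EVENT_ENTRY_REPLACE work.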
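[Editor's note - not part of the patch] mlx5e_tc_tun_fib_event() runs on the FIB notifier chain in atomic context, so it only takes references (route entry refcount, dev_hold on the uplink netdev) and queues a work item; mlx5e_tc_fib_event_work() then takes the sleeping locks and drops the references. A stripped-down sketch of that deferral pattern follows; the names (fib_event, handle_fib_event, queue_fib_event) are illustrative and not part of the driver, and the actual flow re-offload logic is elided.

#include <linux/netdevice.h>
#include <linux/rtnetlink.h>
#include <linux/slab.h>
#include <linux/workqueue.h>

struct fib_event {
	struct work_struct work;
	unsigned long event;
	struct net_device *ul_dev;
	/* the real code also carries a refcounted route entry here */
};

static void handle_fib_event(struct work_struct *work)
{
	struct fib_event *ev = container_of(work, struct fib_event, work);

	rtnl_lock();		/* sleeping locks are allowed in the work function */
	/* ... invalidate or re-offload flows behind ev->ul_dev ... */
	rtnl_unlock();

	dev_put(ev->ul_dev);	/* drop the reference taken in the notifier */
	kfree(ev);
}

/* Called from the notifier chain: atomic, so GFP_ATOMIC and no sleeping calls. */
static int queue_fib_event(struct workqueue_struct *wq,
			   struct net_device *ul_dev, unsigned long event)
{
	struct fib_event *ev = kzalloc(sizeof(*ev), GFP_ATOMIC);

	if (!ev)
		return -ENOMEM;

	INIT_WORK(&ev->work, handle_fib_event);
	ev->event = event;
	dev_hold(ul_dev);	/* keep the uplink netdev alive until the work runs */
	ev->ul_dev = ul_dev;
	queue_work(wq, &ev->work);
	return 0;
}

This is also why mlx5e_tc_tun_cleanup() flushes the workqueue before freeing the encap context: any event queued before unregister_fib_notifier() returned must finish, and have released its references, before the backing structures go away.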