From patchwork Mon Aug 17 09:37:40 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Moshe Shemesh X-Patchwork-Id: 262422 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH, MAILING_LIST_MULTI, SIGNED_OFF_BY, SPF_HELO_NONE, SPF_PASS, UNPARSEABLE_RELAY, USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 8EA5FC433DF for ; Mon, 17 Aug 2020 09:39:27 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 5884120758 for ; Mon, 17 Aug 2020 09:39:27 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726710AbgHQJj0 (ORCPT ); Mon, 17 Aug 2020 05:39:26 -0400 Received: from mail-il-dmz.mellanox.com ([193.47.165.129]:55841 "EHLO mellanox.co.il" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1728422AbgHQJjW (ORCPT ); Mon, 17 Aug 2020 05:39:22 -0400 Received: from Internal Mail-Server by MTLPINE1 (envelope-from moshe@mellanox.com) with SMTP; 17 Aug 2020 12:38:11 +0300 Received: from dev-l-vrt-135.mtl.labs.mlnx (dev-l-vrt-135.mtl.labs.mlnx [10.234.135.1]) by labmailer.mlnx (8.13.8/8.13.8) with ESMTP id 07H9cBox011400; Mon, 17 Aug 2020 12:38:11 +0300 Received: from dev-l-vrt-135.mtl.labs.mlnx (localhost [127.0.0.1]) by dev-l-vrt-135.mtl.labs.mlnx (8.15.2/8.15.2/Debian-10) with ESMTP id 07H9cB7j003227; Mon, 17 Aug 2020 12:38:11 +0300 Received: (from moshe@localhost) by dev-l-vrt-135.mtl.labs.mlnx (8.15.2/8.15.2/Submit) id 07H9cBtG003226; Mon, 17 Aug 2020 12:38:11 +0300 From: Moshe Shemesh To: "David S. Miller" , Jakub Kicinski , Jiri Pirko Cc: netdev@vger.kernel.org, linux-kernel@vger.kernel.org, Moshe Shemesh Subject: [PATCH net-next RFC v2 01/13] devlink: Add reload action option to devlink reload command Date: Mon, 17 Aug 2020 12:37:40 +0300 Message-Id: <1597657072-3130-2-git-send-email-moshe@mellanox.com> X-Mailer: git-send-email 1.8.4.3 In-Reply-To: <1597657072-3130-1-git-send-email-moshe@mellanox.com> References: <1597657072-3130-1-git-send-email-moshe@mellanox.com> Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org Add devlink reload action to allow the user to request a specific reload action. The action parameter is optional, if not specified then devlink driver re-init action is used (backward compatible). Note that when required to do firmware activation some drivers may need to reload the driver. On the other hand some drivers may need to reset the firmware to reinitialize the driver entities. Reload actions supported are: driver_reinit: driver entities re-initialization, applying devlink-param and devlink-resource values. fw_activate: firmware activate. fw_live_patch: firmware live patching. Signed-off-by: Moshe Shemesh --- v1 -> v2: - Instead of reload levels driver,fw_reset,fw_live_patch have reload actions driver_reinit,fw_activate,fw_live_patch - Remove driver default level, the action driver_reinit is the default action for all drivers --- drivers/net/ethernet/mellanox/mlx4/main.c | 4 +- .../net/ethernet/mellanox/mlx5/core/devlink.c | 4 +- drivers/net/ethernet/mellanox/mlxsw/core.c | 6 ++- drivers/net/netdevsim/dev.c | 5 +- include/net/devlink.h | 5 +- include/uapi/linux/devlink.h | 19 +++++++ net/core/devlink.c | 51 +++++++++++++++++-- 7 files changed, 81 insertions(+), 13 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx4/main.c b/drivers/net/ethernet/mellanox/mlx4/main.c index 258c7a96f269..e7df4975bea3 100644 --- a/drivers/net/ethernet/mellanox/mlx4/main.c +++ b/drivers/net/ethernet/mellanox/mlx4/main.c @@ -3935,6 +3935,7 @@ static int mlx4_restart_one_up(struct pci_dev *pdev, bool reload, struct devlink *devlink); static int mlx4_devlink_reload_down(struct devlink *devlink, bool netns_change, + enum devlink_reload_action action, struct netlink_ext_ack *extack) { struct mlx4_priv *priv = devlink_priv(devlink); @@ -3951,7 +3952,7 @@ static int mlx4_devlink_reload_down(struct devlink *devlink, bool netns_change, return 0; } -static int mlx4_devlink_reload_up(struct devlink *devlink, +static int mlx4_devlink_reload_up(struct devlink *devlink, enum devlink_reload_action action, struct netlink_ext_ack *extack) { struct mlx4_priv *priv = devlink_priv(devlink); @@ -3969,6 +3970,7 @@ static int mlx4_devlink_reload_up(struct devlink *devlink, static const struct devlink_ops mlx4_devlink_ops = { .port_type_set = mlx4_devlink_port_type_set, + .supported_reload_actions = BIT(DEVLINK_RELOAD_ACTION_DRIVER_REINIT), .reload_down = mlx4_devlink_reload_down, .reload_up = mlx4_devlink_reload_up, }; diff --git a/drivers/net/ethernet/mellanox/mlx5/core/devlink.c b/drivers/net/ethernet/mellanox/mlx5/core/devlink.c index c709e9a385f6..dfdf48869f70 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/devlink.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/devlink.c @@ -89,6 +89,7 @@ mlx5_devlink_info_get(struct devlink *devlink, struct devlink_info_req *req, } static int mlx5_devlink_reload_down(struct devlink *devlink, bool netns_change, + enum devlink_reload_action action, struct netlink_ext_ack *extack) { struct mlx5_core_dev *dev = devlink_priv(devlink); @@ -97,7 +98,7 @@ static int mlx5_devlink_reload_down(struct devlink *devlink, bool netns_change, return 0; } -static int mlx5_devlink_reload_up(struct devlink *devlink, +static int mlx5_devlink_reload_up(struct devlink *devlink, enum devlink_reload_action action, struct netlink_ext_ack *extack) { struct mlx5_core_dev *dev = devlink_priv(devlink); @@ -118,6 +119,7 @@ static const struct devlink_ops mlx5_devlink_ops = { #endif .flash_update = mlx5_devlink_flash_update, .info_get = mlx5_devlink_info_get, + .supported_reload_actions = BIT(DEVLINK_RELOAD_ACTION_DRIVER_REINIT), .reload_down = mlx5_devlink_reload_down, .reload_up = mlx5_devlink_reload_up, }; diff --git a/drivers/net/ethernet/mellanox/mlxsw/core.c b/drivers/net/ethernet/mellanox/mlxsw/core.c index 08d101138fbe..f67c5aa2a86f 100644 --- a/drivers/net/ethernet/mellanox/mlxsw/core.c +++ b/drivers/net/ethernet/mellanox/mlxsw/core.c @@ -1113,7 +1113,7 @@ mlxsw_devlink_info_get(struct devlink *devlink, struct devlink_info_req *req, static int mlxsw_devlink_core_bus_device_reload_down(struct devlink *devlink, - bool netns_change, + bool netns_change, enum devlink_reload_action action, struct netlink_ext_ack *extack) { struct mlxsw_core *mlxsw_core = devlink_priv(devlink); @@ -1126,7 +1126,7 @@ mlxsw_devlink_core_bus_device_reload_down(struct devlink *devlink, } static int -mlxsw_devlink_core_bus_device_reload_up(struct devlink *devlink, +mlxsw_devlink_core_bus_device_reload_up(struct devlink *devlink, enum devlink_reload_action action, struct netlink_ext_ack *extack) { struct mlxsw_core *mlxsw_core = devlink_priv(devlink); @@ -1268,6 +1268,8 @@ mlxsw_devlink_trap_policer_counter_get(struct devlink *devlink, } static const struct devlink_ops mlxsw_devlink_ops = { + .supported_reload_actions = BIT(DEVLINK_RELOAD_ACTION_DRIVER_REINIT) | + BIT(DEVLINK_RELOAD_ACTION_FW_ACTIVATE), .reload_down = mlxsw_devlink_core_bus_device_reload_down, .reload_up = mlxsw_devlink_core_bus_device_reload_up, .port_type_set = mlxsw_devlink_port_type_set, diff --git a/drivers/net/netdevsim/dev.c b/drivers/net/netdevsim/dev.c index 32f339fedb21..c212f502052c 100644 --- a/drivers/net/netdevsim/dev.c +++ b/drivers/net/netdevsim/dev.c @@ -697,7 +697,7 @@ static int nsim_dev_reload_create(struct nsim_dev *nsim_dev, static void nsim_dev_reload_destroy(struct nsim_dev *nsim_dev); static int nsim_dev_reload_down(struct devlink *devlink, bool netns_change, - struct netlink_ext_ack *extack) + enum devlink_reload_action action, struct netlink_ext_ack *extack) { struct nsim_dev *nsim_dev = devlink_priv(devlink); @@ -713,7 +713,7 @@ static int nsim_dev_reload_down(struct devlink *devlink, bool netns_change, return 0; } -static int nsim_dev_reload_up(struct devlink *devlink, +static int nsim_dev_reload_up(struct devlink *devlink, enum devlink_reload_action action, struct netlink_ext_ack *extack) { struct nsim_dev *nsim_dev = devlink_priv(devlink); @@ -875,6 +875,7 @@ nsim_dev_devlink_trap_policer_counter_get(struct devlink *devlink, } static const struct devlink_ops nsim_dev_devlink_ops = { + .supported_reload_actions = BIT(DEVLINK_RELOAD_ACTION_DRIVER_REINIT), .reload_down = nsim_dev_reload_down, .reload_up = nsim_dev_reload_up, .info_get = nsim_dev_info_get, diff --git a/include/net/devlink.h b/include/net/devlink.h index 8f3c8a443238..cad3e11d0b9b 100644 --- a/include/net/devlink.h +++ b/include/net/devlink.h @@ -991,9 +991,10 @@ enum devlink_trap_group_generic_id { } struct devlink_ops { + unsigned long supported_reload_actions; int (*reload_down)(struct devlink *devlink, bool netns_change, - struct netlink_ext_ack *extack); - int (*reload_up)(struct devlink *devlink, + enum devlink_reload_action action, struct netlink_ext_ack *extack); + int (*reload_up)(struct devlink *devlink, enum devlink_reload_action action, struct netlink_ext_ack *extack); int (*port_type_set)(struct devlink_port *devlink_port, enum devlink_port_type port_type); diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h index cfef4245ea5a..6728029d2e1e 100644 --- a/include/uapi/linux/devlink.h +++ b/include/uapi/linux/devlink.h @@ -272,6 +272,23 @@ enum { DEVLINK_ATTR_TRAP_METADATA_TYPE_FA_COOKIE, }; +/** + * enum devlink_reload_action - Reload action. + * @DEVLINK_RELOAD_ACTION_FW_LIVE_PATCH: FW live patching. + * @DEVLINK_RELOAD_ACTION_DRIVER_REINIT: Driver entities re-instantiation. + * @DEVLINK_RELOAD_ACTION_FW_ACTIVATE: FW activate. + */ +enum devlink_reload_action { + DEVLINK_RELOAD_ACTION_UNSPEC, + DEVLINK_RELOAD_ACTION_FW_LIVE_PATCH, + DEVLINK_RELOAD_ACTION_DRIVER_REINIT, + DEVLINK_RELOAD_ACTION_FW_ACTIVATE, + + /* Add new reload actions above */ + __DEVLINK_RELOAD_ACTION_MAX, + DEVLINK_RELOAD_ACTION_MAX = __DEVLINK_RELOAD_ACTION_MAX - 1 +}; + enum devlink_attr { /* don't change the order or add anything between, this is ABI! */ DEVLINK_ATTR_UNSPEC, @@ -458,6 +475,8 @@ enum devlink_attr { DEVLINK_ATTR_PORT_LANES, /* u32 */ DEVLINK_ATTR_PORT_SPLITTABLE, /* u8 */ + DEVLINK_ATTR_RELOAD_ACTION, /* u8 */ + /* add new attributes above here, update the policy in devlink.c */ __DEVLINK_ATTR_MAX, diff --git a/net/core/devlink.c b/net/core/devlink.c index e674f0f46dc2..88438ffd6015 100644 --- a/net/core/devlink.c +++ b/net/core/devlink.c @@ -462,6 +462,12 @@ static int devlink_nl_put_handle(struct sk_buff *msg, struct devlink *devlink) return 0; } +static bool +devlink_reload_action_is_supported(struct devlink *devlink, enum devlink_reload_action action) +{ + return test_bit(action, &devlink->ops->supported_reload_actions); +} + static int devlink_nl_fill(struct sk_buff *msg, struct devlink *devlink, enum devlink_command cmd, u32 portid, u32 seq, int flags) @@ -2964,21 +2970,21 @@ bool devlink_is_reload_failed(const struct devlink *devlink) EXPORT_SYMBOL_GPL(devlink_is_reload_failed); static int devlink_reload(struct devlink *devlink, struct net *dest_net, - struct netlink_ext_ack *extack) + enum devlink_reload_action action, struct netlink_ext_ack *extack) { int err; if (!devlink->reload_enabled) return -EOPNOTSUPP; - err = devlink->ops->reload_down(devlink, !!dest_net, extack); + err = devlink->ops->reload_down(devlink, !!dest_net, action, extack); if (err) return err; if (dest_net && !net_eq(dest_net, devlink_net(devlink))) devlink_reload_netns_change(devlink, dest_net); - err = devlink->ops->reload_up(devlink, extack); + err = devlink->ops->reload_up(devlink, action, extack); devlink_reload_failed_set(devlink, !!err); return err; } @@ -2986,6 +2992,7 @@ static int devlink_reload(struct devlink *devlink, struct net *dest_net, static int devlink_nl_cmd_reload(struct sk_buff *skb, struct genl_info *info) { struct devlink *devlink = info->user_ptr[0]; + enum devlink_reload_action action; struct net *dest_net = NULL; int err; @@ -3006,7 +3013,20 @@ static int devlink_nl_cmd_reload(struct sk_buff *skb, struct genl_info *info) return PTR_ERR(dest_net); } - err = devlink_reload(devlink, dest_net, info->extack); + if (info->attrs[DEVLINK_ATTR_RELOAD_ACTION]) + action = nla_get_u8(info->attrs[DEVLINK_ATTR_RELOAD_ACTION]); + else + action = DEVLINK_RELOAD_ACTION_DRIVER_REINIT; + + if (action == DEVLINK_RELOAD_ACTION_UNSPEC || action > DEVLINK_RELOAD_ACTION_MAX) { + NL_SET_ERR_MSG_MOD(info->extack, "Invalid reload action"); + return -EINVAL; + } else if (!devlink_reload_action_is_supported(devlink, action)) { + NL_SET_ERR_MSG_MOD(info->extack, "Requested reload action is not supported"); + return -EOPNOTSUPP; + } + + err = devlink_reload(devlink, dest_net, action, info->extack); if (dest_net) put_net(dest_net); @@ -7039,6 +7059,7 @@ static const struct nla_policy devlink_nl_policy[DEVLINK_ATTR_MAX + 1] = { [DEVLINK_ATTR_TRAP_POLICER_RATE] = { .type = NLA_U64 }, [DEVLINK_ATTR_TRAP_POLICER_BURST] = { .type = NLA_U64 }, [DEVLINK_ATTR_PORT_FUNCTION] = { .type = NLA_NESTED }, + [DEVLINK_ATTR_RELOAD_ACTION] = { .type = NLA_U8 }, }; static const struct genl_ops devlink_nl_ops[] = { @@ -7364,6 +7385,20 @@ static struct genl_family devlink_nl_family __ro_after_init = { .n_mcgrps = ARRAY_SIZE(devlink_nl_mcgrps), }; +static int devlink_reload_actions_verify(struct devlink *devlink) +{ + const struct devlink_ops *ops; + + if (!devlink_reload_supported(devlink)) + return 0; + + ops = devlink->ops; + if (WARN_ON(ops->supported_reload_actions >= BIT(__DEVLINK_RELOAD_ACTION_MAX) || + ops->supported_reload_actions <= BIT(DEVLINK_RELOAD_ACTION_UNSPEC))) + return -EINVAL; + return 0; +} + /** * devlink_alloc - Allocate new devlink instance resources * @@ -7384,6 +7419,11 @@ struct devlink *devlink_alloc(const struct devlink_ops *ops, size_t priv_size) if (!devlink) return NULL; devlink->ops = ops; + if (devlink_reload_actions_verify(devlink)) { + kfree(devlink); + return NULL; + } + xa_init_flags(&devlink->snapshot_ids, XA_FLAGS_ALLOC); __devlink_net_set(devlink, &init_net); INIT_LIST_HEAD(&devlink->port_list); @@ -9615,7 +9655,8 @@ static void __net_exit devlink_pernet_pre_exit(struct net *net) if (net_eq(devlink_net(devlink), net)) { if (WARN_ON(!devlink_reload_supported(devlink))) continue; - err = devlink_reload(devlink, &init_net, NULL); + err = devlink_reload(devlink, &init_net, + DEVLINK_RELOAD_ACTION_DRIVER_REINIT, NULL); if (err && err != -EOPNOTSUPP) pr_warn("Failed to reload devlink instance into init_net\n"); } From patchwork Mon Aug 17 09:37:42 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Moshe Shemesh X-Patchwork-Id: 262421 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH, MAILING_LIST_MULTI, SIGNED_OFF_BY, SPF_HELO_NONE, SPF_PASS, UNPARSEABLE_RELAY, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id EA566C433E1 for ; Mon, 17 Aug 2020 09:39:33 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id CED2620758 for ; Mon, 17 Aug 2020 09:39:33 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727010AbgHQJjF (ORCPT ); Mon, 17 Aug 2020 05:39:05 -0400 Received: from mail-il-dmz.mellanox.com ([193.47.165.129]:55827 "EHLO mellanox.co.il" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1728366AbgHQJiS (ORCPT ); Mon, 17 Aug 2020 05:38:18 -0400 Received: from Internal Mail-Server by MTLPINE1 (envelope-from moshe@mellanox.com) with SMTP; 17 Aug 2020 12:38:12 +0300 Received: from dev-l-vrt-135.mtl.labs.mlnx (dev-l-vrt-135.mtl.labs.mlnx [10.234.135.1]) by labmailer.mlnx (8.13.8/8.13.8) with ESMTP id 07H9cCtd011406; Mon, 17 Aug 2020 12:38:12 +0300 Received: from dev-l-vrt-135.mtl.labs.mlnx (localhost [127.0.0.1]) by dev-l-vrt-135.mtl.labs.mlnx (8.15.2/8.15.2/Debian-10) with ESMTP id 07H9cB3B003231; Mon, 17 Aug 2020 12:38:12 +0300 Received: (from moshe@localhost) by dev-l-vrt-135.mtl.labs.mlnx (8.15.2/8.15.2/Submit) id 07H9cBUV003230; Mon, 17 Aug 2020 12:38:11 +0300 From: Moshe Shemesh To: "David S. Miller" , Jakub Kicinski , Jiri Pirko Cc: netdev@vger.kernel.org, linux-kernel@vger.kernel.org, Moshe Shemesh Subject: [PATCH net-next RFC v2 03/13] net/mlx5: Add functions to set/query MFRL register Date: Mon, 17 Aug 2020 12:37:42 +0300 Message-Id: <1597657072-3130-4-git-send-email-moshe@mellanox.com> X-Mailer: git-send-email 1.8.4.3 In-Reply-To: <1597657072-3130-1-git-send-email-moshe@mellanox.com> References: <1597657072-3130-1-git-send-email-moshe@mellanox.com> Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org Add functions to query and set the MFRL reset options supported by firmware. Signed-off-by: Moshe Shemesh --- .../net/ethernet/mellanox/mlx5/core/Makefile | 2 +- .../ethernet/mellanox/mlx5/core/fw_reset.c | 46 +++++++++++++++++++ .../ethernet/mellanox/mlx5/core/fw_reset.h | 13 ++++++ 3 files changed, 60 insertions(+), 1 deletion(-) create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/fw_reset.c create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/fw_reset.h diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Makefile b/drivers/net/ethernet/mellanox/mlx5/core/Makefile index 10e6886c96ba..4d45a2f6fed6 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/Makefile +++ b/drivers/net/ethernet/mellanox/mlx5/core/Makefile @@ -16,7 +16,7 @@ mlx5_core-y := main.o cmd.o debugfs.o fw.o eq.o uar.o pagealloc.o \ transobj.o vport.o sriov.o fs_cmd.o fs_core.o pci_irq.o \ fs_counters.o rl.o lag.o dev.o events.o wq.o lib/gid.o \ lib/devcom.o lib/pci_vsc.o lib/dm.o diag/fs_tracepoint.o \ - diag/fw_tracer.o diag/crdump.o devlink.o diag/rsc_dump.o + diag/fw_tracer.o diag/crdump.o devlink.o diag/rsc_dump.o fw_reset.o # # Netdev basic diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fw_reset.c b/drivers/net/ethernet/mellanox/mlx5/core/fw_reset.c new file mode 100644 index 000000000000..76d2cece29ac --- /dev/null +++ b/drivers/net/ethernet/mellanox/mlx5/core/fw_reset.c @@ -0,0 +1,46 @@ +// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB +/* Copyright (c) 2020, Mellanox Technologies inc. All rights reserved. */ + +#include "fw_reset.h" + +static int mlx5_reg_mfrl_set(struct mlx5_core_dev *dev, u8 reset_level, + u8 reset_type_sel, u8 sync_resp, bool sync_start) +{ + u32 out[MLX5_ST_SZ_DW(mfrl_reg)] = {}; + u32 in[MLX5_ST_SZ_DW(mfrl_reg)] = {}; + + MLX5_SET(mfrl_reg, in, reset_level, reset_level); + MLX5_SET(mfrl_reg, in, rst_type_sel, reset_type_sel); + MLX5_SET(mfrl_reg, in, pci_sync_for_fw_update_resp, sync_resp); + MLX5_SET(mfrl_reg, in, pci_sync_for_fw_update_start, sync_start); + + return mlx5_core_access_reg(dev, in, sizeof(in), out, sizeof(out), MLX5_REG_MFRL, 0, 1); +} + +int mlx5_reg_mfrl_query(struct mlx5_core_dev *dev, u8 *reset_level, u8 *reset_type) +{ + u32 out[MLX5_ST_SZ_DW(mfrl_reg)] = {}; + u32 in[MLX5_ST_SZ_DW(mfrl_reg)] = {}; + int err; + + err = mlx5_core_access_reg(dev, in, sizeof(in), out, sizeof(out), MLX5_REG_MFRL, 0, 0); + if (err) + return err; + + if (reset_level) + *reset_level = MLX5_GET(mfrl_reg, out, reset_level); + if (reset_type) + *reset_type = MLX5_GET(mfrl_reg, out, reset_type); + + return 0; +} + +int mlx5_fw_set_reset_sync(struct mlx5_core_dev *dev, u8 reset_type_sel) +{ + return mlx5_reg_mfrl_set(dev, MLX5_MFRL_REG_RESET_LEVEL3, reset_type_sel, 0, true); +} + +int mlx5_fw_set_live_patch(struct mlx5_core_dev *dev) +{ + return mlx5_reg_mfrl_set(dev, MLX5_MFRL_REG_RESET_LEVEL0, 0, 0, false); +} diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fw_reset.h b/drivers/net/ethernet/mellanox/mlx5/core/fw_reset.h new file mode 100644 index 000000000000..1bbd95182ca6 --- /dev/null +++ b/drivers/net/ethernet/mellanox/mlx5/core/fw_reset.h @@ -0,0 +1,13 @@ +/* SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB */ +/* Copyright (c) 2020, Mellanox Technologies inc. All rights reserved. */ + +#ifndef __MLX5_FW_RESET_H +#define __MLX5_FW_RESET_H + +#include "mlx5_core.h" + +int mlx5_reg_mfrl_query(struct mlx5_core_dev *dev, u8 *reset_level, u8 *reset_type); +int mlx5_fw_set_reset_sync(struct mlx5_core_dev *dev, u8 reset_type_sel); +int mlx5_fw_set_live_patch(struct mlx5_core_dev *dev); + +#endif From patchwork Mon Aug 17 09:37:43 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Moshe Shemesh X-Patchwork-Id: 262419 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH, MAILING_LIST_MULTI, SIGNED_OFF_BY, SPF_HELO_NONE, SPF_PASS, UNPARSEABLE_RELAY, USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 380C4C433E3 for ; Mon, 17 Aug 2020 09:40:07 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 1D8E820758 for ; Mon, 17 Aug 2020 09:40:07 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726903AbgHQJjo (ORCPT ); Mon, 17 Aug 2020 05:39:44 -0400 Received: from mail-il-dmz.mellanox.com ([193.47.165.129]:55843 "EHLO mellanox.co.il" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1728392AbgHQJiR (ORCPT ); Mon, 17 Aug 2020 05:38:17 -0400 Received: from Internal Mail-Server by MTLPINE1 (envelope-from moshe@mellanox.com) with SMTP; 17 Aug 2020 12:38:12 +0300 Received: from dev-l-vrt-135.mtl.labs.mlnx (dev-l-vrt-135.mtl.labs.mlnx [10.234.135.1]) by labmailer.mlnx (8.13.8/8.13.8) with ESMTP id 07H9cCUU011409; Mon, 17 Aug 2020 12:38:12 +0300 Received: from dev-l-vrt-135.mtl.labs.mlnx (localhost [127.0.0.1]) by dev-l-vrt-135.mtl.labs.mlnx (8.15.2/8.15.2/Debian-10) with ESMTP id 07H9cCuV003233; Mon, 17 Aug 2020 12:38:12 +0300 Received: (from moshe@localhost) by dev-l-vrt-135.mtl.labs.mlnx (8.15.2/8.15.2/Submit) id 07H9cCNT003232; Mon, 17 Aug 2020 12:38:12 +0300 From: Moshe Shemesh To: "David S. Miller" , Jakub Kicinski , Jiri Pirko Cc: netdev@vger.kernel.org, linux-kernel@vger.kernel.org, Moshe Shemesh Subject: [PATCH net-next RFC v2 04/13] net/mlx5: Set cap for pci sync for fw update event Date: Mon, 17 Aug 2020 12:37:43 +0300 Message-Id: <1597657072-3130-5-git-send-email-moshe@mellanox.com> X-Mailer: git-send-email 1.8.4.3 In-Reply-To: <1597657072-3130-1-git-send-email-moshe@mellanox.com> References: <1597657072-3130-1-git-send-email-moshe@mellanox.com> Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org Set capability to notify the firmware that this host driver is capable of handling pci sync for firmware update events. Signed-off-by: Moshe Shemesh --- drivers/net/ethernet/mellanox/mlx5/core/main.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c index ce43e3feccd9..871d28b09f8a 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/main.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c @@ -548,6 +548,9 @@ static int handle_hca_cap(struct mlx5_core_dev *dev, void *set_ctx) if (MLX5_CAP_GEN_MAX(dev, dct)) MLX5_SET(cmd_hca_cap, set_hca_cap, dct, 1); + if (MLX5_CAP_GEN_MAX(dev, pci_sync_for_fw_update_event)) + MLX5_SET(cmd_hca_cap, set_hca_cap, pci_sync_for_fw_update_event, 1); + if (MLX5_CAP_GEN_MAX(dev, num_vhca_ports)) MLX5_SET(cmd_hca_cap, set_hca_cap, From patchwork Mon Aug 17 09:37:44 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Moshe Shemesh X-Patchwork-Id: 262423 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH, MAILING_LIST_MULTI, SIGNED_OFF_BY, SPF_HELO_NONE, SPF_PASS, UNPARSEABLE_RELAY, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 1BDDDC433E3 for ; Mon, 17 Aug 2020 09:38:37 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id EC02620758 for ; Mon, 17 Aug 2020 09:38:36 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728615AbgHQJie (ORCPT ); Mon, 17 Aug 2020 05:38:34 -0400 Received: from mail-il-dmz.mellanox.com ([193.47.165.129]:55838 "EHLO mellanox.co.il" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1728363AbgHQJi0 (ORCPT ); Mon, 17 Aug 2020 05:38:26 -0400 Received: from Internal Mail-Server by MTLPINE1 (envelope-from moshe@mellanox.com) with SMTP; 17 Aug 2020 12:38:12 +0300 Received: from dev-l-vrt-135.mtl.labs.mlnx (dev-l-vrt-135.mtl.labs.mlnx [10.234.135.1]) by labmailer.mlnx (8.13.8/8.13.8) with ESMTP id 07H9cC4D011413; Mon, 17 Aug 2020 12:38:12 +0300 Received: from dev-l-vrt-135.mtl.labs.mlnx (localhost [127.0.0.1]) by dev-l-vrt-135.mtl.labs.mlnx (8.15.2/8.15.2/Debian-10) with ESMTP id 07H9cCKo003235; Mon, 17 Aug 2020 12:38:12 +0300 Received: (from moshe@localhost) by dev-l-vrt-135.mtl.labs.mlnx (8.15.2/8.15.2/Submit) id 07H9cCTd003234; Mon, 17 Aug 2020 12:38:12 +0300 From: Moshe Shemesh To: "David S. Miller" , Jakub Kicinski , Jiri Pirko Cc: netdev@vger.kernel.org, linux-kernel@vger.kernel.org, Moshe Shemesh Subject: [PATCH net-next RFC v2 05/13] net/mlx5: Handle sync reset request event Date: Mon, 17 Aug 2020 12:37:44 +0300 Message-Id: <1597657072-3130-6-git-send-email-moshe@mellanox.com> X-Mailer: git-send-email 1.8.4.3 In-Reply-To: <1597657072-3130-1-git-send-email-moshe@mellanox.com> References: <1597657072-3130-1-git-send-email-moshe@mellanox.com> Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org Once the driver gets sync_reset_request from firmware it prepares for the coming reset and sends acknowledge. After getting this event the driver expects device reset, either it will trigger PCI reset on sync_reset_now event or such PCI reset will be triggered by another PF of the same device. So it moves to reset requested mode and if it gets PCI reset triggered by the other PF it detect the reset and reloads. Signed-off-by: Moshe Shemesh --- v1 -> v2: - Moved handling of sync reset recovery from health to fw_reset --- .../ethernet/mellanox/mlx5/core/fw_reset.c | 167 ++++++++++++++++++ .../ethernet/mellanox/mlx5/core/fw_reset.h | 3 + .../net/ethernet/mellanox/mlx5/core/health.c | 35 ++-- .../net/ethernet/mellanox/mlx5/core/main.c | 10 ++ .../ethernet/mellanox/mlx5/core/mlx5_core.h | 2 + include/linux/mlx5/driver.h | 4 + 6 files changed, 206 insertions(+), 15 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fw_reset.c b/drivers/net/ethernet/mellanox/mlx5/core/fw_reset.c index 76d2cece29ac..0f224454b4a2 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/fw_reset.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/fw_reset.c @@ -3,6 +3,20 @@ #include "fw_reset.h" +enum { + MLX5_FW_RESET_FLAGS_RESET_REQUESTED, +}; + +struct mlx5_fw_reset { + struct mlx5_core_dev *dev; + struct mlx5_nb nb; + struct workqueue_struct *wq; + struct work_struct reset_request_work; + struct work_struct reset_reload_work; + unsigned long reset_flags; + struct timer_list timer; +}; + static int mlx5_reg_mfrl_set(struct mlx5_core_dev *dev, u8 reset_level, u8 reset_type_sel, u8 sync_resp, bool sync_start) { @@ -44,3 +58,156 @@ int mlx5_fw_set_live_patch(struct mlx5_core_dev *dev) { return mlx5_reg_mfrl_set(dev, MLX5_MFRL_REG_RESET_LEVEL0, 0, 0, false); } + +static int mlx5_fw_set_reset_sync_ack(struct mlx5_core_dev *dev) +{ + return mlx5_reg_mfrl_set(dev, MLX5_MFRL_REG_RESET_LEVEL3, 0, 1, false); +} + +static void mlx5_sync_reset_reload_work(struct work_struct *work) +{ + struct mlx5_fw_reset *fw_reset = container_of(work, struct mlx5_fw_reset, + reset_reload_work); + struct mlx5_core_dev *dev = fw_reset->dev; + + mlx5_enter_error_state(dev, true); + mlx5_unload_one(dev, false); + if (mlx5_health_wait_pci_up(dev)) { + mlx5_core_err(dev, "reset reload flow aborted, PCI reads still not working\n"); + return; + } + mlx5_load_one(dev, false); +} + +static void mlx5_stop_sync_reset_poll(struct mlx5_core_dev *dev) +{ + struct mlx5_fw_reset *fw_reset = dev->priv.fw_reset; + + del_timer(&fw_reset->timer); +} + +static void mlx5_sync_reset_clear_reset_requested(struct mlx5_core_dev *dev, bool poll_health) +{ + struct mlx5_fw_reset *fw_reset = dev->priv.fw_reset; + + mlx5_stop_sync_reset_poll(dev); + clear_bit(MLX5_FW_RESET_FLAGS_RESET_REQUESTED, &fw_reset->reset_flags); + if (poll_health) + mlx5_start_health_poll(dev); +} + +#define MLX5_RESET_POLL_INTERVAL (HZ / 10) +static void poll_sync_reset(struct timer_list *t) +{ + struct mlx5_fw_reset *fw_reset = from_timer(fw_reset, t, timer); + struct mlx5_core_dev *dev = fw_reset->dev; + u32 fatal_error; + + if (!test_bit(MLX5_FW_RESET_FLAGS_RESET_REQUESTED, &fw_reset->reset_flags)) + return; + + fatal_error = mlx5_health_check_fatal_sensors(dev); + + if (fatal_error) { + mlx5_core_warn(dev, "Got Device Reset\n"); + mlx5_sync_reset_clear_reset_requested(dev, false); + queue_work(fw_reset->wq, &fw_reset->reset_reload_work); + return; + } + + mod_timer(&fw_reset->timer, round_jiffies(jiffies + MLX5_RESET_POLL_INTERVAL)); +} + +static void mlx5_start_sync_reset_poll(struct mlx5_core_dev *dev) +{ + struct mlx5_fw_reset *fw_reset = dev->priv.fw_reset; + + timer_setup(&fw_reset->timer, poll_sync_reset, 0); + fw_reset->timer.expires = round_jiffies(jiffies + MLX5_RESET_POLL_INTERVAL); + add_timer(&fw_reset->timer); +} + +static void mlx5_sync_reset_set_reset_requested(struct mlx5_core_dev *dev) +{ + struct mlx5_fw_reset *fw_reset = dev->priv.fw_reset; + + mlx5_stop_health_poll(dev, true); + set_bit(MLX5_FW_RESET_FLAGS_RESET_REQUESTED, &fw_reset->reset_flags); + mlx5_start_sync_reset_poll(dev); +} + +static void mlx5_sync_reset_request_event(struct work_struct *work) +{ + struct mlx5_fw_reset *fw_reset = container_of(work, struct mlx5_fw_reset, + reset_request_work); + struct mlx5_core_dev *dev = fw_reset->dev; + + mlx5_sync_reset_set_reset_requested(dev); + if (mlx5_fw_set_reset_sync_ack(dev)) + mlx5_core_warn(dev, "PCI Sync FW Update Reset Ack Failed.\n"); + else + mlx5_core_warn(dev, "PCI Sync FW Update Reset Ack. Device reset is expected.\n"); +} + +static void mlx5_sync_reset_events_handle(struct mlx5_fw_reset *fw_reset, struct mlx5_eqe *eqe) +{ + struct mlx5_eqe_sync_fw_update *sync_fw_update_eqe; + u8 sync_event_rst_type; + + sync_fw_update_eqe = &eqe->data.sync_fw_update; + sync_event_rst_type = sync_fw_update_eqe->sync_rst_state & SYNC_RST_STATE_MASK; + switch (sync_event_rst_type) { + case MLX5_SYNC_RST_STATE_RESET_REQUEST: + queue_work(fw_reset->wq, &fw_reset->reset_request_work); + break; + } +} + +static int fw_reset_event_notifier(struct notifier_block *nb, unsigned long action, void *data) +{ + struct mlx5_fw_reset *fw_reset = mlx5_nb_cof(nb, struct mlx5_fw_reset, nb); + struct mlx5_eqe *eqe = data; + + switch (eqe->sub_type) { + case MLX5_GENERAL_SUBTYPE_PCI_SYNC_FOR_FW_UPDATE_EVENT: + mlx5_sync_reset_events_handle(fw_reset, eqe); + break; + default: + return NOTIFY_DONE; + } + + return NOTIFY_OK; +} + +int mlx5_fw_reset_events_init(struct mlx5_core_dev *dev) +{ + struct mlx5_fw_reset *fw_reset = kzalloc(sizeof(*fw_reset), GFP_KERNEL); + + if (!fw_reset) + return -ENOMEM; + fw_reset->wq = create_singlethread_workqueue("mlx5_fw_reset_events"); + if (!fw_reset->wq) { + kfree(fw_reset); + return -ENOMEM; + } + + fw_reset->dev = dev; + dev->priv.fw_reset = fw_reset; + + INIT_WORK(&fw_reset->reset_request_work, mlx5_sync_reset_request_event); + INIT_WORK(&fw_reset->reset_reload_work, mlx5_sync_reset_reload_work); + + MLX5_NB_INIT(&fw_reset->nb, fw_reset_event_notifier, GENERAL_EVENT); + mlx5_eq_notifier_register(dev, &fw_reset->nb); + + return 0; +} + +void mlx5_fw_reset_events_cleanup(struct mlx5_core_dev *dev) +{ + struct mlx5_fw_reset *fw_reset = dev->priv.fw_reset; + + mlx5_eq_notifier_unregister(dev, &fw_reset->nb); + destroy_workqueue(fw_reset->wq); + kvfree(dev->priv.fw_reset); +} diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fw_reset.h b/drivers/net/ethernet/mellanox/mlx5/core/fw_reset.h index 1bbd95182ca6..278f538ea92a 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/fw_reset.h +++ b/drivers/net/ethernet/mellanox/mlx5/core/fw_reset.h @@ -10,4 +10,7 @@ int mlx5_reg_mfrl_query(struct mlx5_core_dev *dev, u8 *reset_level, u8 *reset_ty int mlx5_fw_set_reset_sync(struct mlx5_core_dev *dev, u8 reset_type_sel); int mlx5_fw_set_live_patch(struct mlx5_core_dev *dev); +int mlx5_fw_reset_events_init(struct mlx5_core_dev *dev); +void mlx5_fw_reset_events_cleanup(struct mlx5_core_dev *dev); + #endif diff --git a/drivers/net/ethernet/mellanox/mlx5/core/health.c b/drivers/net/ethernet/mellanox/mlx5/core/health.c index b31f769d2df9..54523bed16cd 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/health.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/health.c @@ -110,7 +110,7 @@ static bool sensor_fw_synd_rfr(struct mlx5_core_dev *dev) return rfr && synd; } -static u32 check_fatal_sensors(struct mlx5_core_dev *dev) +u32 mlx5_health_check_fatal_sensors(struct mlx5_core_dev *dev) { if (sensor_pci_not_working(dev)) return MLX5_SENSOR_PCI_COMM_ERR; @@ -173,7 +173,7 @@ static bool reset_fw_if_needed(struct mlx5_core_dev *dev) * Check again to avoid a redundant 2nd reset. If the fatal erros was * PCI related a reset won't help. */ - fatal_error = check_fatal_sensors(dev); + fatal_error = mlx5_health_check_fatal_sensors(dev); if (fatal_error == MLX5_SENSOR_PCI_COMM_ERR || fatal_error == MLX5_SENSOR_NIC_DISABLED || fatal_error == MLX5_SENSOR_NIC_SW_RESET) { @@ -195,7 +195,7 @@ void mlx5_enter_error_state(struct mlx5_core_dev *dev, bool force) bool err_detected = false; /* Mark the device as fatal in order to abort FW commands */ - if ((check_fatal_sensors(dev) || force) && + if ((mlx5_health_check_fatal_sensors(dev) || force) && dev->state == MLX5_DEVICE_STATE_UP) { dev->state = MLX5_DEVICE_STATE_INTERNAL_ERROR; err_detected = true; @@ -208,7 +208,7 @@ void mlx5_enter_error_state(struct mlx5_core_dev *dev, bool force) goto unlock; } - if (check_fatal_sensors(dev) || force) { /* protected state setting */ + if (mlx5_health_check_fatal_sensors(dev) || force) { /* protected state setting */ dev->state = MLX5_DEVICE_STATE_INTERNAL_ERROR; mlx5_cmd_flush(dev); } @@ -231,7 +231,7 @@ void mlx5_error_sw_reset(struct mlx5_core_dev *dev) mlx5_core_err(dev, "start\n"); - if (check_fatal_sensors(dev) == MLX5_SENSOR_FW_SYND_RFR) { + if (mlx5_health_check_fatal_sensors(dev) == MLX5_SENSOR_FW_SYND_RFR) { /* Get cr-dump and reset FW semaphore */ lock = lock_sem_sw_reset(dev, true); @@ -308,26 +308,31 @@ static void mlx5_handle_bad_state(struct mlx5_core_dev *dev) /* How much time to wait until health resetting the driver (in msecs) */ #define MLX5_RECOVERY_WAIT_MSECS 60000 -static int mlx5_health_try_recover(struct mlx5_core_dev *dev) +int mlx5_health_wait_pci_up(struct mlx5_core_dev *dev) { unsigned long end; - mlx5_core_warn(dev, "handling bad device here\n"); - mlx5_handle_bad_state(dev); end = jiffies + msecs_to_jiffies(MLX5_RECOVERY_WAIT_MSECS); while (sensor_pci_not_working(dev)) { - if (time_after(jiffies, end)) { - mlx5_core_err(dev, - "health recovery flow aborted, PCI reads still not working\n"); - return -EIO; - } + if (time_after(jiffies, end)) + return -ETIMEDOUT; msleep(100); } + return 0; +} +static int mlx5_health_try_recover(struct mlx5_core_dev *dev) +{ + mlx5_core_warn(dev, "handling bad device here\n"); + mlx5_handle_bad_state(dev); + if (mlx5_health_wait_pci_up(dev)) { + mlx5_core_err(dev, "health recovery flow aborted, PCI reads still not working\n"); + return -EIO; + } mlx5_core_err(dev, "starting health recovery flow\n"); mlx5_recover_device(dev); if (!test_bit(MLX5_INTERFACE_STATE_UP, &dev->intf_state) || - check_fatal_sensors(dev)) { + mlx5_health_check_fatal_sensors(dev)) { mlx5_core_err(dev, "health recovery failed\n"); return -EIO; } @@ -696,7 +701,7 @@ static void poll_health(struct timer_list *t) if (dev->state == MLX5_DEVICE_STATE_INTERNAL_ERROR) goto out; - fatal_error = check_fatal_sensors(dev); + fatal_error = mlx5_health_check_fatal_sensors(dev); if (fatal_error && !health->fatal_error) { mlx5_core_err(dev, "Fatal error %u detected\n", fatal_error); diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c index 871d28b09f8a..e833db424f11 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/main.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c @@ -57,6 +57,7 @@ #include "lib/mpfs.h" #include "eswitch.h" #include "devlink.h" +#include "fw_reset.h" #include "lib/mlx5.h" #include "fpga/core.h" #include "fpga/ipsec.h" @@ -835,6 +836,12 @@ static int mlx5_init_once(struct mlx5_core_dev *dev) goto err_eq_cleanup; } + err = mlx5_fw_reset_events_init(dev); + if (err) { + mlx5_core_err(dev, "failed to initialize fw reset events\n"); + goto err_events_cleanup; + } + mlx5_cq_debugfs_init(dev); mlx5_init_reserved_gids(dev); @@ -896,6 +903,8 @@ static int mlx5_init_once(struct mlx5_core_dev *dev) mlx5_geneve_destroy(dev->geneve); mlx5_vxlan_destroy(dev->vxlan); mlx5_cq_debugfs_cleanup(dev); + mlx5_fw_reset_events_cleanup(dev); +err_events_cleanup: mlx5_events_cleanup(dev); err_eq_cleanup: mlx5_eq_table_cleanup(dev); @@ -923,6 +932,7 @@ static void mlx5_cleanup_once(struct mlx5_core_dev *dev) mlx5_cleanup_clock(dev); mlx5_cleanup_reserved_gids(dev); mlx5_cq_debugfs_cleanup(dev); + mlx5_fw_reset_events_cleanup(dev); mlx5_events_cleanup(dev); mlx5_eq_table_cleanup(dev); mlx5_irq_table_cleanup(dev); diff --git a/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h b/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h index fc1649dac11b..d07a32165792 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h +++ b/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h @@ -123,6 +123,8 @@ int mlx5_cmd_force_teardown_hca(struct mlx5_core_dev *dev); int mlx5_cmd_fast_teardown_hca(struct mlx5_core_dev *dev); void mlx5_enter_error_state(struct mlx5_core_dev *dev, bool force); void mlx5_error_sw_reset(struct mlx5_core_dev *dev); +u32 mlx5_health_check_fatal_sensors(struct mlx5_core_dev *dev); +int mlx5_health_wait_pci_up(struct mlx5_core_dev *dev); void mlx5_disable_device(struct mlx5_core_dev *dev); void mlx5_recover_device(struct mlx5_core_dev *dev); int mlx5_sriov_init(struct mlx5_core_dev *dev); diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h index 8dc3da6e6480..80e31a7684e0 100644 --- a/include/linux/mlx5/driver.h +++ b/include/linux/mlx5/driver.h @@ -501,6 +501,7 @@ struct mlx5_mpfs; struct mlx5_eswitch; struct mlx5_lag; struct mlx5_devcom; +struct mlx5_fw_reset; struct mlx5_eq_table; struct mlx5_irq_table; @@ -578,6 +579,7 @@ struct mlx5_priv { struct mlx5_core_sriov sriov; struct mlx5_lag *lag; struct mlx5_devcom *devcom; + struct mlx5_fw_reset *fw_reset; struct mlx5_core_roce roce; struct mlx5_fc_stats fc_stats; struct mlx5_rl_table rl_table; @@ -943,6 +945,8 @@ void mlx5_start_health_poll(struct mlx5_core_dev *dev); void mlx5_stop_health_poll(struct mlx5_core_dev *dev, bool disable_health); void mlx5_drain_health_wq(struct mlx5_core_dev *dev); void mlx5_trigger_health_work(struct mlx5_core_dev *dev); +void mlx5_health_set_reset_requested_mode(struct mlx5_core_dev *dev); +void mlx5_health_clear_reset_requested_mode(struct mlx5_core_dev *dev); int mlx5_buf_alloc(struct mlx5_core_dev *dev, int size, struct mlx5_frag_buf *buf); void mlx5_buf_free(struct mlx5_core_dev *dev, struct mlx5_frag_buf *buf); From patchwork Mon Aug 17 09:37:48 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Moshe Shemesh X-Patchwork-Id: 262425 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH, MAILING_LIST_MULTI, SIGNED_OFF_BY, SPF_HELO_NONE, SPF_PASS, UNPARSEABLE_RELAY, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 9DA0AC433DF for ; Mon, 17 Aug 2020 09:38:26 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 8265E2078D for ; Mon, 17 Aug 2020 09:38:26 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728592AbgHQJiW (ORCPT ); Mon, 17 Aug 2020 05:38:22 -0400 Received: from mail-il-dmz.mellanox.com ([193.47.165.129]:55847 "EHLO mellanox.co.il" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1728421AbgHQJiR (ORCPT ); Mon, 17 Aug 2020 05:38:17 -0400 Received: from Internal Mail-Server by MTLPINE1 (envelope-from moshe@mellanox.com) with SMTP; 17 Aug 2020 12:38:13 +0300 Received: from dev-l-vrt-135.mtl.labs.mlnx (dev-l-vrt-135.mtl.labs.mlnx [10.234.135.1]) by labmailer.mlnx (8.13.8/8.13.8) with ESMTP id 07H9cDWL011426; Mon, 17 Aug 2020 12:38:13 +0300 Received: from dev-l-vrt-135.mtl.labs.mlnx (localhost [127.0.0.1]) by dev-l-vrt-135.mtl.labs.mlnx (8.15.2/8.15.2/Debian-10) with ESMTP id 07H9cD4g003244; Mon, 17 Aug 2020 12:38:13 +0300 Received: (from moshe@localhost) by dev-l-vrt-135.mtl.labs.mlnx (8.15.2/8.15.2/Submit) id 07H9cDDa003243; Mon, 17 Aug 2020 12:38:13 +0300 From: Moshe Shemesh To: "David S. Miller" , Jakub Kicinski , Jiri Pirko Cc: netdev@vger.kernel.org, linux-kernel@vger.kernel.org, Moshe Shemesh Subject: [PATCH net-next RFC v2 09/13] devlink: Add enable_remote_dev_reset generic parameter Date: Mon, 17 Aug 2020 12:37:48 +0300 Message-Id: <1597657072-3130-10-git-send-email-moshe@mellanox.com> X-Mailer: git-send-email 1.8.4.3 In-Reply-To: <1597657072-3130-1-git-send-email-moshe@mellanox.com> References: <1597657072-3130-1-git-send-email-moshe@mellanox.com> Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org The enable_remote_dev_reset devlink param flags that the host admin allows device resets that can be initiated by other hosts. This parameter is useful for setups where a device is shared by different hosts, such as multi-host setup. Once the user set this parameter to false, the driver should NACK any attempt to reset the device while the driver is loaded. Signed-off-by: Moshe Shemesh --- Documentation/networking/devlink/devlink-params.rst | 6 ++++++ include/net/devlink.h | 4 ++++ net/core/devlink.c | 5 +++++ 3 files changed, 15 insertions(+) diff --git a/Documentation/networking/devlink/devlink-params.rst b/Documentation/networking/devlink/devlink-params.rst index d075fd090b3d..54c9f107c4b0 100644 --- a/Documentation/networking/devlink/devlink-params.rst +++ b/Documentation/networking/devlink/devlink-params.rst @@ -108,3 +108,9 @@ own name. * - ``region_snapshot_enable`` - Boolean - Enable capture of ``devlink-region`` snapshots. + * - ``enable_remote_dev_reset`` + - Boolean + - Enable device reset by remote host. When cleared, the device driver + will NACK any attempt of other host to reset the device. This parameter + is useful for setups where a device is shared by different hosts, such + as multi-host setup. diff --git a/include/net/devlink.h b/include/net/devlink.h index cad3e11d0b9b..0818e9c864eb 100644 --- a/include/net/devlink.h +++ b/include/net/devlink.h @@ -420,6 +420,7 @@ enum devlink_param_generic_id { DEVLINK_PARAM_GENERIC_ID_FW_LOAD_POLICY, DEVLINK_PARAM_GENERIC_ID_RESET_DEV_ON_DRV_PROBE, DEVLINK_PARAM_GENERIC_ID_ENABLE_ROCE, + DEVLINK_PARAM_GENERIC_ID_ENABLE_REMOTE_DEV_RESET, /* add new param generic ids above here*/ __DEVLINK_PARAM_GENERIC_ID_MAX, @@ -457,6 +458,9 @@ enum devlink_param_generic_id { #define DEVLINK_PARAM_GENERIC_ENABLE_ROCE_NAME "enable_roce" #define DEVLINK_PARAM_GENERIC_ENABLE_ROCE_TYPE DEVLINK_PARAM_TYPE_BOOL +#define DEVLINK_PARAM_GENERIC_ENABLE_REMOTE_DEV_RESET_NAME "enable_remote_dev_reset" +#define DEVLINK_PARAM_GENERIC_ENABLE_REMOTE_DEV_RESET_TYPE DEVLINK_PARAM_TYPE_BOOL + #define DEVLINK_PARAM_GENERIC(_id, _cmodes, _get, _set, _validate) \ { \ .id = DEVLINK_PARAM_GENERIC_ID_##_id, \ diff --git a/net/core/devlink.c b/net/core/devlink.c index 6bab1b02ca99..43b1839b8305 100644 --- a/net/core/devlink.c +++ b/net/core/devlink.c @@ -3226,6 +3226,11 @@ static const struct devlink_param devlink_param_generic[] = { .name = DEVLINK_PARAM_GENERIC_ENABLE_ROCE_NAME, .type = DEVLINK_PARAM_GENERIC_ENABLE_ROCE_TYPE, }, + { + .id = DEVLINK_PARAM_GENERIC_ID_ENABLE_REMOTE_DEV_RESET, + .name = DEVLINK_PARAM_GENERIC_ENABLE_REMOTE_DEV_RESET_NAME, + .type = DEVLINK_PARAM_GENERIC_ENABLE_REMOTE_DEV_RESET_TYPE, + }, }; static int devlink_param_generic_verify(const struct devlink_param *param) From patchwork Mon Aug 17 09:37:49 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Moshe Shemesh X-Patchwork-Id: 262420 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH, MAILING_LIST_MULTI, SIGNED_OFF_BY, SPF_HELO_NONE, SPF_PASS, UNPARSEABLE_RELAY, USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 6E262C433DF for ; Mon, 17 Aug 2020 09:39:58 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 4E12220758 for ; Mon, 17 Aug 2020 09:39:58 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727093AbgHQJjr (ORCPT ); Mon, 17 Aug 2020 05:39:47 -0400 Received: from mail-il-dmz.mellanox.com ([193.47.165.129]:55849 "EHLO mellanox.co.il" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1728503AbgHQJiR (ORCPT ); Mon, 17 Aug 2020 05:38:17 -0400 Received: from Internal Mail-Server by MTLPINE1 (envelope-from moshe@mellanox.com) with SMTP; 17 Aug 2020 12:38:13 +0300 Received: from dev-l-vrt-135.mtl.labs.mlnx (dev-l-vrt-135.mtl.labs.mlnx [10.234.135.1]) by labmailer.mlnx (8.13.8/8.13.8) with ESMTP id 07H9cDon011430; Mon, 17 Aug 2020 12:38:13 +0300 Received: from dev-l-vrt-135.mtl.labs.mlnx (localhost [127.0.0.1]) by dev-l-vrt-135.mtl.labs.mlnx (8.15.2/8.15.2/Debian-10) with ESMTP id 07H9cDhu003246; Mon, 17 Aug 2020 12:38:13 +0300 Received: (from moshe@localhost) by dev-l-vrt-135.mtl.labs.mlnx (8.15.2/8.15.2/Submit) id 07H9cDQA003245; Mon, 17 Aug 2020 12:38:13 +0300 From: Moshe Shemesh To: "David S. Miller" , Jakub Kicinski , Jiri Pirko Cc: netdev@vger.kernel.org, linux-kernel@vger.kernel.org, Moshe Shemesh Subject: [PATCH net-next RFC v2 10/13] net/mlx5: Add devlink param enable_remote_dev_reset support Date: Mon, 17 Aug 2020 12:37:49 +0300 Message-Id: <1597657072-3130-11-git-send-email-moshe@mellanox.com> X-Mailer: git-send-email 1.8.4.3 In-Reply-To: <1597657072-3130-1-git-send-email-moshe@mellanox.com> References: <1597657072-3130-1-git-send-email-moshe@mellanox.com> Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org The enable_remote_dev_reset devlink param flags that the host admin allows resets by other hosts. In case it is cleared mlx5 host PF driver will send NACK on pci sync for firmware update reset request and the command will fail. By default enable_remote_dev_reset parameter is true, so pci sync for firmware update reset is enabled. Signed-off-by: Moshe Shemesh --- v1 -> v2: - Have MLX5_FW_RESET_FLAGS_NACK_RESET_REQUEST instead of MLX5_HEALTH_RESET_FLAGS_NACK_RESET_REQUEST --- .../net/ethernet/mellanox/mlx5/core/devlink.c | 21 +++++++++++++ .../ethernet/mellanox/mlx5/core/fw_reset.c | 30 +++++++++++++++++++ .../ethernet/mellanox/mlx5/core/fw_reset.h | 2 ++ 3 files changed, 53 insertions(+) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/devlink.c b/drivers/net/ethernet/mellanox/mlx5/core/devlink.c index 0a62c98f8c98..d975f5bd7394 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/devlink.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/devlink.c @@ -275,6 +275,24 @@ static int mlx5_devlink_large_group_num_validate(struct devlink *devlink, u32 id } #endif +static int mlx5_devlink_enable_remote_dev_reset_set(struct devlink *devlink, u32 id, + struct devlink_param_gset_ctx *ctx) +{ + struct mlx5_core_dev *dev = devlink_priv(devlink); + + mlx5_fw_enable_remote_dev_reset_set(dev, ctx->val.vbool); + return 0; +} + +static int mlx5_devlink_enable_remote_dev_reset_get(struct devlink *devlink, u32 id, + struct devlink_param_gset_ctx *ctx) +{ + struct mlx5_core_dev *dev = devlink_priv(devlink); + + ctx->val.vbool = mlx5_fw_enable_remote_dev_reset_get(dev); + return 0; +} + static const struct devlink_param mlx5_devlink_params[] = { DEVLINK_PARAM_DRIVER(MLX5_DEVLINK_PARAM_ID_FLOW_STEERING_MODE, "flow_steering_mode", DEVLINK_PARAM_TYPE_STRING, @@ -290,6 +308,9 @@ static const struct devlink_param mlx5_devlink_params[] = { NULL, NULL, mlx5_devlink_large_group_num_validate), #endif + DEVLINK_PARAM_GENERIC(ENABLE_REMOTE_DEV_RESET, BIT(DEVLINK_PARAM_CMODE_RUNTIME), + mlx5_devlink_enable_remote_dev_reset_get, + mlx5_devlink_enable_remote_dev_reset_set, NULL), }; static void mlx5_devlink_set_params_init_values(struct devlink *devlink) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fw_reset.c b/drivers/net/ethernet/mellanox/mlx5/core/fw_reset.c index 44fed2f1911c..f9d6310d99d6 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/fw_reset.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/fw_reset.c @@ -5,6 +5,7 @@ enum { MLX5_FW_RESET_FLAGS_RESET_REQUESTED, + MLX5_FW_RESET_FLAGS_NACK_RESET_REQUEST, MLX5_FW_RESET_FLAGS_PENDING_COMP }; @@ -22,6 +23,23 @@ struct mlx5_fw_reset { int ret; }; +void mlx5_fw_enable_remote_dev_reset_set(struct mlx5_core_dev *dev, bool enable) +{ + struct mlx5_fw_reset *fw_reset = dev->priv.fw_reset; + + if (enable) + clear_bit(MLX5_FW_RESET_FLAGS_NACK_RESET_REQUEST, &fw_reset->reset_flags); + else + set_bit(MLX5_FW_RESET_FLAGS_NACK_RESET_REQUEST, &fw_reset->reset_flags); +} + +bool mlx5_fw_enable_remote_dev_reset_get(struct mlx5_core_dev *dev) +{ + struct mlx5_fw_reset *fw_reset = dev->priv.fw_reset; + + return !test_bit(MLX5_FW_RESET_FLAGS_NACK_RESET_REQUEST, &fw_reset->reset_flags); +} + static int mlx5_reg_mfrl_set(struct mlx5_core_dev *dev, u8 reset_level, u8 reset_type_sel, u8 sync_resp, bool sync_start) { @@ -76,6 +94,11 @@ static int mlx5_fw_set_reset_sync_ack(struct mlx5_core_dev *dev) return mlx5_reg_mfrl_set(dev, MLX5_MFRL_REG_RESET_LEVEL3, 0, 1, false); } +static int mlx5_fw_set_reset_sync_nack(struct mlx5_core_dev *dev) +{ + return mlx5_reg_mfrl_set(dev, MLX5_MFRL_REG_RESET_LEVEL3, 0, 2, false); +} + static void mlx5_fw_reset_complete_reload(struct mlx5_core_dev *dev) { struct mlx5_fw_reset *fw_reset = dev->priv.fw_reset; @@ -164,7 +187,14 @@ static void mlx5_sync_reset_request_event(struct work_struct *work) struct mlx5_fw_reset *fw_reset = container_of(work, struct mlx5_fw_reset, reset_request_work); struct mlx5_core_dev *dev = fw_reset->dev; + int err; + if (test_bit(MLX5_FW_RESET_FLAGS_NACK_RESET_REQUEST, &fw_reset->reset_flags)) { + err = mlx5_fw_set_reset_sync_nack(dev); + mlx5_core_warn(dev, "PCI Sync FW Update Reset Nack %s", + err ? "Failed" : "Sent"); + return; + } mlx5_sync_reset_set_reset_requested(dev); if (mlx5_fw_set_reset_sync_ack(dev)) mlx5_core_warn(dev, "PCI Sync FW Update Reset Ack Failed.\n"); diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fw_reset.h b/drivers/net/ethernet/mellanox/mlx5/core/fw_reset.h index d7ee951a2258..fd558dfe93fc 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/fw_reset.h +++ b/drivers/net/ethernet/mellanox/mlx5/core/fw_reset.h @@ -6,6 +6,8 @@ #include "mlx5_core.h" +void mlx5_fw_enable_remote_dev_reset_set(struct mlx5_core_dev *dev, bool enable); +bool mlx5_fw_enable_remote_dev_reset_get(struct mlx5_core_dev *dev); int mlx5_reg_mfrl_query(struct mlx5_core_dev *dev, u8 *reset_level, u8 *reset_type); int mlx5_fw_set_reset_sync(struct mlx5_core_dev *dev, u8 reset_type_sel); int mlx5_fw_set_live_patch(struct mlx5_core_dev *dev);