From patchwork Thu Jun 13 09:12:21 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Huazhong Tan
X-Patchwork-Id: 166702
Delivered-To: patch@linaro.org
From: Huazhong Tan
To:
CC: , , , , , Shiju Jose , Weihang Li , Peng Li , Huazhong Tan
Subject: [PATCH net-next 01/12] net: hns3: delay setting of reset level for hw errors until slot_reset is called
Date: Thu, 13 Jun 2019 17:12:21 +0800
Message-ID: <1560417152-53050-2-git-send-email-tanhuazhong@huawei.com>
X-Mailer: git-send-email 2.7.4
In-Reply-To: <1560417152-53050-1-git-send-email-tanhuazhong@huawei.com>
References: <1560417152-53050-1-git-send-email-tanhuazhong@huawei.com>
Sender: linux-kernel-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org

From: Shiju Jose

Presently the error handling code passes the reset level required to
recover from hw errors to the reset framework in the error_detected AER
callback. However, reset_event is only called later, from the slot_reset
callback. This can result in the wrong reset_level being used if a
high-priority reset request occurs before slot_reset is called.

This patch delays passing the reset level required for the hw errors to
the reset framework until slot_reset is called.

Reported-by: Salil Mehta
Signed-off-by: Shiju Jose
Signed-off-by: Weihang Li
Signed-off-by: Peng Li
Signed-off-by: Huazhong Tan
---
 drivers/net/ethernet/hisilicon/hns3/hnae3.h        |  3 ++
 drivers/net/ethernet/hisilicon/hns3/hns3_enet.c    | 15 ++++--
 .../net/ethernet/hisilicon/hns3/hns3pf/hclge_err.c | 61 ++++++++++------------
 .../ethernet/hisilicon/hns3/hns3pf/hclge_main.c    | 14 +++--
 4 files changed, 51 insertions(+), 42 deletions(-)

-- 
2.7.4

diff --git a/drivers/net/ethernet/hisilicon/hns3/hnae3.h b/drivers/net/ethernet/hisilicon/hns3/hnae3.h
index 63cdc18..79044b5 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hnae3.h
+++ b/drivers/net/ethernet/hisilicon/hns3/hnae3.h
@@ -214,6 +214,7 @@ struct hnae3_ae_dev {
 	struct list_head node;
 	u32 flag;
 	u8 override_pci_need_reset; /* fix to stop multiple reset happening */
+	unsigned long hw_err_reset_req;
 	enum hnae3_reset_type reset_type;
 	void *priv;
 };
@@ -459,6 +460,8 @@ struct hnae3_ae_ops {
 				  u16 vlan, u8 qos, __be16 proto);
 	int (*enable_hw_strip_rxvtag)(struct hnae3_handle *handle, bool enable);
 	void (*reset_event)(struct pci_dev *pdev, struct hnae3_handle *handle);
+	enum hnae3_reset_type (*get_reset_level)(struct hnae3_ae_dev *ae_dev,
+						 unsigned long *addr);
 	void (*set_default_reset_request)(struct hnae3_ae_dev *ae_dev,
 					  enum hnae3_reset_type rst_type);
 	void (*get_channels)(struct hnae3_handle *handle,
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
index fe2c2c5..66d733b 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
@@ -1930,17 +1930,22 @@ static pci_ers_result_t hns3_error_detected(struct pci_dev *pdev, static pci_ers_result_t hns3_slot_reset(struct pci_dev *pdev) { struct hnae3_ae_dev *ae_dev = pci_get_drvdata(pdev); + const struct hnae3_ae_ops *ops = ae_dev->ops; + enum hnae3_reset_type reset_type; struct device *dev = &pdev->dev; - dev_info(dev, "requesting reset due to PCI error\n"); - if (!ae_dev || !ae_dev->ops) return PCI_ERS_RESULT_NONE; /* request the reset */ - if (ae_dev->ops->reset_event) { - if (!ae_dev->override_pci_need_reset) - ae_dev->ops->reset_event(pdev, NULL); + if (ops->reset_event) { + if (!ae_dev->override_pci_need_reset) { + reset_type = ops->get_reset_level(ae_dev, + &ae_dev->hw_err_reset_req); + ops->set_default_reset_request(ae_dev, reset_type); + dev_info(dev, "requesting reset due to PCI error\n"); + ops->reset_event(pdev, NULL); + } return PCI_ERS_RESULT_RECOVERED; } diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_err.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_err.c index 4126287..1a2ea1b 100644 --- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_err.c +++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_err.c @@ -1069,13 +1069,6 @@ static int hclge_config_ssu_hw_err_int(struct hclge_dev *hdev, bool en) return ret; } -#define HCLGE_SET_DEFAULT_RESET_REQUEST(reset_type) \ - do { \ - if (ae_dev->ops->set_default_reset_request) \ - ae_dev->ops->set_default_reset_request(ae_dev, \ - reset_type); \ - } while (0) - /* hclge_handle_mpf_ras_error: handle all main PF RAS errors * @hdev: pointer to struct hclge_dev * @desc: descriptor for describing the command @@ -1110,7 +1103,7 @@ static int hclge_handle_mpf_ras_error(struct hclge_dev *hdev, reset_level = hclge_log_error(dev, "IMP_TCM_ECC_INT_STS", &hclge_imp_tcm_ecc_int[0], status); - HCLGE_SET_DEFAULT_RESET_REQUEST(reset_level); + set_bit(reset_level, &ae_dev->hw_err_reset_req); } status = le32_to_cpu(desc[0].data[1]); @@ -1118,20 +1111,18 @@ static int 
hclge_handle_mpf_ras_error(struct hclge_dev *hdev, reset_level = hclge_log_error(dev, "CMDQ_MEM_ECC_INT_STS", &hclge_cmdq_nic_mem_ecc_int[0], status); - HCLGE_SET_DEFAULT_RESET_REQUEST(reset_level); + set_bit(reset_level, &ae_dev->hw_err_reset_req); } - if ((le32_to_cpu(desc[0].data[2])) & BIT(0)) { + if ((le32_to_cpu(desc[0].data[2])) & BIT(0)) dev_warn(dev, "imp_rd_data_poison_err found\n"); - HCLGE_SET_DEFAULT_RESET_REQUEST(HNAE3_NONE_RESET); - } status = le32_to_cpu(desc[0].data[3]); if (status) { reset_level = hclge_log_error(dev, "TQP_INT_ECC_INT_STS", &hclge_tqp_int_ecc_int[0], status); - HCLGE_SET_DEFAULT_RESET_REQUEST(reset_level); + set_bit(reset_level, &ae_dev->hw_err_reset_req); } status = le32_to_cpu(desc[0].data[4]); @@ -1139,7 +1130,7 @@ static int hclge_handle_mpf_ras_error(struct hclge_dev *hdev, reset_level = hclge_log_error(dev, "MSIX_ECC_INT_STS", &hclge_msix_sram_ecc_int[0], status); - HCLGE_SET_DEFAULT_RESET_REQUEST(reset_level); + set_bit(reset_level, &ae_dev->hw_err_reset_req); } /* log SSU(Storage Switch Unit) errors */ @@ -1149,14 +1140,14 @@ static int hclge_handle_mpf_ras_error(struct hclge_dev *hdev, reset_level = hclge_log_error(dev, "SSU_ECC_MULTI_BIT_INT_0", &hclge_ssu_mem_ecc_err_int[0], status); - HCLGE_SET_DEFAULT_RESET_REQUEST(reset_level); + set_bit(reset_level, &ae_dev->hw_err_reset_req); } status = le32_to_cpu(*(desc_data + 3)) & BIT(0); if (status) { dev_warn(dev, "SSU_ECC_MULTI_BIT_INT_1 ssu_mem32_ecc_mbit_err found [error status=0x%x]\n", status); - HCLGE_SET_DEFAULT_RESET_REQUEST(HNAE3_GLOBAL_RESET); + set_bit(HNAE3_GLOBAL_RESET, &ae_dev->hw_err_reset_req); } status = le32_to_cpu(*(desc_data + 4)) & HCLGE_SSU_COMMON_ERR_INT_MASK; @@ -1164,7 +1155,7 @@ static int hclge_handle_mpf_ras_error(struct hclge_dev *hdev, reset_level = hclge_log_error(dev, "SSU_COMMON_ERR_INT", &hclge_ssu_com_err_int[0], status); - HCLGE_SET_DEFAULT_RESET_REQUEST(reset_level); + set_bit(reset_level, &ae_dev->hw_err_reset_req); } /* log IGU(Ingress 
Unit) errors */ @@ -1173,7 +1164,7 @@ static int hclge_handle_mpf_ras_error(struct hclge_dev *hdev, if (status) { reset_level = hclge_log_error(dev, "IGU_INT_STS", &hclge_igu_int[0], status); - HCLGE_SET_DEFAULT_RESET_REQUEST(reset_level); + set_bit(reset_level, &ae_dev->hw_err_reset_req); } /* log PPP(Programmable Packet Process) errors */ @@ -1184,7 +1175,7 @@ static int hclge_handle_mpf_ras_error(struct hclge_dev *hdev, hclge_log_error(dev, "PPP_MPF_ABNORMAL_INT_ST1", &hclge_ppp_mpf_abnormal_int_st1[0], status); - HCLGE_SET_DEFAULT_RESET_REQUEST(reset_level); + set_bit(reset_level, &ae_dev->hw_err_reset_req); } status = le32_to_cpu(*(desc_data + 3)) & HCLGE_PPP_MPF_INT_ST3_MASK; @@ -1193,7 +1184,7 @@ static int hclge_handle_mpf_ras_error(struct hclge_dev *hdev, hclge_log_error(dev, "PPP_MPF_ABNORMAL_INT_ST3", &hclge_ppp_mpf_abnormal_int_st3[0], status); - HCLGE_SET_DEFAULT_RESET_REQUEST(reset_level); + set_bit(reset_level, &ae_dev->hw_err_reset_req); } /* log PPU(RCB) errors */ @@ -1202,7 +1193,7 @@ static int hclge_handle_mpf_ras_error(struct hclge_dev *hdev, if (status) { dev_warn(dev, "PPU_MPF_ABNORMAL_INT_ST1 %s found\n", "rpu_rx_pkt_ecc_mbit_err"); - HCLGE_SET_DEFAULT_RESET_REQUEST(HNAE3_GLOBAL_RESET); + set_bit(HNAE3_GLOBAL_RESET, &ae_dev->hw_err_reset_req); } status = le32_to_cpu(*(desc_data + 2)); @@ -1211,7 +1202,7 @@ static int hclge_handle_mpf_ras_error(struct hclge_dev *hdev, hclge_log_error(dev, "PPU_MPF_ABNORMAL_INT_ST2", &hclge_ppu_mpf_abnormal_int_st2[0], status); - HCLGE_SET_DEFAULT_RESET_REQUEST(reset_level); + set_bit(reset_level, &ae_dev->hw_err_reset_req); } status = le32_to_cpu(*(desc_data + 3)) & HCLGE_PPU_MPF_INT_ST3_MASK; @@ -1220,7 +1211,7 @@ static int hclge_handle_mpf_ras_error(struct hclge_dev *hdev, hclge_log_error(dev, "PPU_MPF_ABNORMAL_INT_ST3", &hclge_ppu_mpf_abnormal_int_st3[0], status); - HCLGE_SET_DEFAULT_RESET_REQUEST(reset_level); + set_bit(reset_level, &ae_dev->hw_err_reset_req); } /* log TM(Traffic Manager) errors */ @@ 
-1229,7 +1220,7 @@ static int hclge_handle_mpf_ras_error(struct hclge_dev *hdev, if (status) { reset_level = hclge_log_error(dev, "TM_SCH_RINT", &hclge_tm_sch_rint[0], status); - HCLGE_SET_DEFAULT_RESET_REQUEST(reset_level); + set_bit(reset_level, &ae_dev->hw_err_reset_req); } /* log QCN(Quantized Congestion Control) errors */ @@ -1238,7 +1229,7 @@ static int hclge_handle_mpf_ras_error(struct hclge_dev *hdev, if (status) { reset_level = hclge_log_error(dev, "QCN_FIFO_RINT", &hclge_qcn_fifo_rint[0], status); - HCLGE_SET_DEFAULT_RESET_REQUEST(reset_level); + set_bit(reset_level, &ae_dev->hw_err_reset_req); } status = le32_to_cpu(*(desc_data + 1)) & HCLGE_QCN_ECC_INT_MASK; @@ -1246,7 +1237,7 @@ static int hclge_handle_mpf_ras_error(struct hclge_dev *hdev, reset_level = hclge_log_error(dev, "QCN_ECC_RINT", &hclge_qcn_ecc_rint[0], status); - HCLGE_SET_DEFAULT_RESET_REQUEST(reset_level); + set_bit(reset_level, &ae_dev->hw_err_reset_req); } /* log NCSI errors */ @@ -1255,7 +1246,7 @@ static int hclge_handle_mpf_ras_error(struct hclge_dev *hdev, if (status) { reset_level = hclge_log_error(dev, "NCSI_ECC_INT_RPT", &hclge_ncsi_err_int[0], status); - HCLGE_SET_DEFAULT_RESET_REQUEST(reset_level); + set_bit(reset_level, &ae_dev->hw_err_reset_req); } /* clear all main PF RAS errors */ @@ -1301,7 +1292,7 @@ static int hclge_handle_pf_ras_error(struct hclge_dev *hdev, reset_level = hclge_log_error(dev, "SSU_PORT_BASED_ERR_INT", &hclge_ssu_port_based_err_int[0], status); - HCLGE_SET_DEFAULT_RESET_REQUEST(reset_level); + set_bit(reset_level, &ae_dev->hw_err_reset_req); } status = le32_to_cpu(desc[0].data[1]); @@ -1309,7 +1300,7 @@ static int hclge_handle_pf_ras_error(struct hclge_dev *hdev, reset_level = hclge_log_error(dev, "SSU_FIFO_OVERFLOW_INT", &hclge_ssu_fifo_overflow_int[0], status); - HCLGE_SET_DEFAULT_RESET_REQUEST(reset_level); + set_bit(reset_level, &ae_dev->hw_err_reset_req); } status = le32_to_cpu(desc[0].data[2]); @@ -1317,7 +1308,7 @@ static int 
hclge_handle_pf_ras_error(struct hclge_dev *hdev, reset_level = hclge_log_error(dev, "SSU_ETS_TCG_INT", &hclge_ssu_ets_tcg_int[0], status); - HCLGE_SET_DEFAULT_RESET_REQUEST(reset_level); + set_bit(reset_level, &ae_dev->hw_err_reset_req); } /* log IGU(Ingress Unit) EGU(Egress Unit) TNL errors */ @@ -1327,7 +1318,7 @@ static int hclge_handle_pf_ras_error(struct hclge_dev *hdev, reset_level = hclge_log_error(dev, "IGU_EGU_TNL_INT_STS", &hclge_igu_egu_tnl_int[0], status); - HCLGE_SET_DEFAULT_RESET_REQUEST(reset_level); + set_bit(reset_level, &ae_dev->hw_err_reset_req); } /* log PPU(RCB) errors */ @@ -1337,7 +1328,7 @@ static int hclge_handle_pf_ras_error(struct hclge_dev *hdev, reset_level = hclge_log_error(dev, "PPU_PF_ABNORMAL_INT_ST0", &hclge_ppu_pf_abnormal_int[0], status); - HCLGE_SET_DEFAULT_RESET_REQUEST(reset_level); + set_bit(reset_level, &ae_dev->hw_err_reset_req); } /* clear all PF RAS errors */ @@ -1597,7 +1588,7 @@ static void hclge_handle_rocee_ras_error(struct hnae3_ae_dev *ae_dev) reset_type = hclge_log_and_clear_rocee_ras_error(hdev); if (reset_type != HNAE3_NONE_RESET) - HCLGE_SET_DEFAULT_RESET_REQUEST(reset_type); + set_bit(reset_type, &ae_dev->hw_err_reset_req); } static const struct hclge_hw_blk hw_blk[] = { @@ -1657,6 +1648,10 @@ pci_ers_result_t hclge_handle_hw_ras_error(struct hnae3_ae_dev *ae_dev) status = hclge_read_dev(&hdev->hw, HCLGE_RAS_PF_OTHER_INT_STS_REG); + if (status & HCLGE_RAS_REG_NFE_MASK || + status & HCLGE_RAS_REG_ROCEE_ERR_MASK) + ae_dev->hw_err_reset_req = 0; + /* Handling Non-fatal HNS RAS errors */ if (status & HCLGE_RAS_REG_NFE_MASK) { dev_warn(dev, diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c index b7ba893..f3e9030 100644 --- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c +++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c @@ -41,6 +41,8 @@ static int hclge_set_umv_space(struct hclge_dev *hdev, u16 space_size, u16 
*allocated_size, bool is_alloc); static void hclge_rfs_filter_expire(struct hclge_dev *hdev); static void hclge_clear_arfs_rules(struct hnae3_handle *handle); +static enum hnae3_reset_type hclge_get_reset_level(struct hnae3_ae_dev *ae_dev, + unsigned long *addr); static struct hnae3_ae_algo ae_algo; @@ -3066,10 +3068,11 @@ static void hclge_do_reset(struct hclge_dev *hdev) } } -static enum hnae3_reset_type hclge_get_reset_level(struct hclge_dev *hdev, +static enum hnae3_reset_type hclge_get_reset_level(struct hnae3_ae_dev *ae_dev, unsigned long *addr) { enum hnae3_reset_type rst_level = HNAE3_NONE_RESET; + struct hclge_dev *hdev = ae_dev->priv; /* first, resolve any unknown reset type to the known type(s) */ if (test_bit(HNAE3_UNKNOWN_RESET, addr)) { @@ -3398,7 +3401,7 @@ static void hclge_reset_event(struct pci_dev *pdev, struct hnae3_handle *handle) return; else if (hdev->default_reset_request) hdev->reset_level = - hclge_get_reset_level(hdev, + hclge_get_reset_level(ae_dev, &hdev->default_reset_request); else if (time_after(jiffies, (hdev->last_reset_time + 4 * 5 * HZ))) hdev->reset_level = HNAE3_FUNC_RESET; @@ -3434,6 +3437,8 @@ static void hclge_reset_timer(struct timer_list *t) static void hclge_reset_subtask(struct hclge_dev *hdev) { + struct hnae3_ae_dev *ae_dev = pci_get_drvdata(hdev->pdev); + /* check if there is any ongoing reset in the hardware. This status can * be checked from reset_pending. If there is then, we need to wait for * hardware to complete reset. @@ -3444,12 +3449,12 @@ static void hclge_reset_subtask(struct hclge_dev *hdev) * now. 
 	 */
 	hdev->last_reset_time = jiffies;
-	hdev->reset_type = hclge_get_reset_level(hdev, &hdev->reset_pending);
+	hdev->reset_type = hclge_get_reset_level(ae_dev, &hdev->reset_pending);
 	if (hdev->reset_type != HNAE3_NONE_RESET)
 		hclge_reset(hdev);
 
 	/* check if we got any *new* reset requests to be honored */
-	hdev->reset_type = hclge_get_reset_level(hdev, &hdev->reset_request);
+	hdev->reset_type = hclge_get_reset_level(ae_dev, &hdev->reset_request);
 	if (hdev->reset_type != HNAE3_NONE_RESET)
 		hclge_do_reset(hdev);
 
@@ -9231,6 +9236,7 @@ static const struct hnae3_ae_ops hclge_ops = {
 	.set_vf_vlan_filter = hclge_set_vf_vlan_filter,
 	.enable_hw_strip_rxvtag = hclge_en_hw_strip_rxvtag,
 	.reset_event = hclge_reset_event,
+	.get_reset_level = hclge_get_reset_level,
 	.set_default_reset_request = hclge_set_def_reset_request,
 	.get_tqps_and_rss_info = hclge_get_tqps_and_rss_info,
 	.set_channels = hclge_set_channels,

From patchwork Thu Jun 13 09:12:22 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Huazhong Tan
X-Patchwork-Id: 166703
Delivered-To: patch@linaro.org
From: Huazhong Tan
To:
CC: , , , , , Shiju Jose , Peng Li , "Huazhong Tan"
Subject: [PATCH net-next 02/12] net: hns3: fix avoid unnecessary resetting for the H/W errors which do not require reset
Date: Thu, 13 Jun 2019 17:12:22 +0800
Message-ID: <1560417152-53050-3-git-send-email-tanhuazhong@huawei.com>
X-Mailer: git-send-email 2.7.4
In-Reply-To: <1560417152-53050-1-git-send-email-tanhuazhong@huawei.com>
References: <1560417152-53050-1-git-send-email-tanhuazhong@huawei.com>
Sender: linux-kernel-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org

From: Shiju Jose

HNS does not need to be reset when errors occur in certain bits.
However, HNAE3_FUNC_RESET is presently set in this case, and as a result
the default_reset is performed when these errors are reported. This
patch fixes this issue. It also optimizes how the reset level is set for
error recovery.

Reported-by: Weihang Li
Signed-off-by: Shiju Jose
Signed-off-by: Peng Li
Signed-off-by: Huazhong Tan
---
 .../net/ethernet/hisilicon/hns3/hns3pf/hclge_err.c | 280 ++++++++-------------
 1 file changed, 109 insertions(+), 171 deletions(-)

-- 
2.7.4

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_err.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_err.c
index 1a2ea1b..3ea305e 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_err.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_err.c
@@ -631,29 +631,20 @@ static const struct hclge_hw_error hclge_rocee_qmm_ovf_err_int[] = {
 	{ /* sentinel */ }
 };
 
-static enum hnae3_reset_type hclge_log_error(struct device *dev, char *reg,
-					     const struct hclge_hw_error *err,
-					     u32 err_sts)
+static void hclge_log_error(struct device *dev, char *reg,
+			    const struct hclge_hw_error *err,
+			    u32 err_sts, unsigned long *reset_requests)
 {
-	enum hnae3_reset_type reset_level = HNAE3_FUNC_RESET;
-	bool need_reset = false;
-
 	while
(err->msg) { if (err->int_msk & err_sts) { dev_warn(dev, "%s %s found [error status=0x%x]\n", reg, err->msg, err_sts); - if (err->reset_level != HNAE3_NONE_RESET && - err->reset_level >= reset_level) { - reset_level = err->reset_level; - need_reset = true; - } + if (err->reset_level && + err->reset_level != HNAE3_NONE_RESET) + set_bit(err->reset_level, reset_requests); } err++; } - if (need_reset) - return reset_level; - else - return HNAE3_NONE_RESET; } /* hclge_cmd_query_error: read the error information @@ -1082,7 +1073,6 @@ static int hclge_handle_mpf_ras_error(struct hclge_dev *hdev, int num) { struct hnae3_ae_dev *ae_dev = hdev->ae_dev; - enum hnae3_reset_type reset_level; struct device *dev = &hdev->pdev->dev; __le32 *desc_data; u32 status; @@ -1099,49 +1089,39 @@ static int hclge_handle_mpf_ras_error(struct hclge_dev *hdev, /* log HNS common errors */ status = le32_to_cpu(desc[0].data[0]); - if (status) { - reset_level = hclge_log_error(dev, "IMP_TCM_ECC_INT_STS", - &hclge_imp_tcm_ecc_int[0], - status); - set_bit(reset_level, &ae_dev->hw_err_reset_req); - } + if (status) + hclge_log_error(dev, "IMP_TCM_ECC_INT_STS", + &hclge_imp_tcm_ecc_int[0], status, + &ae_dev->hw_err_reset_req); status = le32_to_cpu(desc[0].data[1]); - if (status) { - reset_level = hclge_log_error(dev, "CMDQ_MEM_ECC_INT_STS", - &hclge_cmdq_nic_mem_ecc_int[0], - status); - set_bit(reset_level, &ae_dev->hw_err_reset_req); - } + if (status) + hclge_log_error(dev, "CMDQ_MEM_ECC_INT_STS", + &hclge_cmdq_nic_mem_ecc_int[0], status, + &ae_dev->hw_err_reset_req); if ((le32_to_cpu(desc[0].data[2])) & BIT(0)) dev_warn(dev, "imp_rd_data_poison_err found\n"); status = le32_to_cpu(desc[0].data[3]); - if (status) { - reset_level = hclge_log_error(dev, "TQP_INT_ECC_INT_STS", - &hclge_tqp_int_ecc_int[0], - status); - set_bit(reset_level, &ae_dev->hw_err_reset_req); - } + if (status) + hclge_log_error(dev, "TQP_INT_ECC_INT_STS", + &hclge_tqp_int_ecc_int[0], status, + &ae_dev->hw_err_reset_req); status = 
le32_to_cpu(desc[0].data[4]); - if (status) { - reset_level = hclge_log_error(dev, "MSIX_ECC_INT_STS", - &hclge_msix_sram_ecc_int[0], - status); - set_bit(reset_level, &ae_dev->hw_err_reset_req); - } + if (status) + hclge_log_error(dev, "MSIX_ECC_INT_STS", + &hclge_msix_sram_ecc_int[0], status, + &ae_dev->hw_err_reset_req); /* log SSU(Storage Switch Unit) errors */ desc_data = (__le32 *)&desc[2]; status = le32_to_cpu(*(desc_data + 2)); - if (status) { - reset_level = hclge_log_error(dev, "SSU_ECC_MULTI_BIT_INT_0", - &hclge_ssu_mem_ecc_err_int[0], - status); - set_bit(reset_level, &ae_dev->hw_err_reset_req); - } + if (status) + hclge_log_error(dev, "SSU_ECC_MULTI_BIT_INT_0", + &hclge_ssu_mem_ecc_err_int[0], status, + &ae_dev->hw_err_reset_req); status = le32_to_cpu(*(desc_data + 3)) & BIT(0); if (status) { @@ -1151,41 +1131,32 @@ static int hclge_handle_mpf_ras_error(struct hclge_dev *hdev, } status = le32_to_cpu(*(desc_data + 4)) & HCLGE_SSU_COMMON_ERR_INT_MASK; - if (status) { - reset_level = hclge_log_error(dev, "SSU_COMMON_ERR_INT", - &hclge_ssu_com_err_int[0], - status); - set_bit(reset_level, &ae_dev->hw_err_reset_req); - } + if (status) + hclge_log_error(dev, "SSU_COMMON_ERR_INT", + &hclge_ssu_com_err_int[0], status, + &ae_dev->hw_err_reset_req); /* log IGU(Ingress Unit) errors */ desc_data = (__le32 *)&desc[3]; status = le32_to_cpu(*desc_data) & HCLGE_IGU_INT_MASK; - if (status) { - reset_level = hclge_log_error(dev, "IGU_INT_STS", - &hclge_igu_int[0], status); - set_bit(reset_level, &ae_dev->hw_err_reset_req); - } + if (status) + hclge_log_error(dev, "IGU_INT_STS", + &hclge_igu_int[0], status, + &ae_dev->hw_err_reset_req); /* log PPP(Programmable Packet Process) errors */ desc_data = (__le32 *)&desc[4]; status = le32_to_cpu(*(desc_data + 1)); - if (status) { - reset_level = - hclge_log_error(dev, "PPP_MPF_ABNORMAL_INT_ST1", - &hclge_ppp_mpf_abnormal_int_st1[0], - status); - set_bit(reset_level, &ae_dev->hw_err_reset_req); - } + if (status) + 
hclge_log_error(dev, "PPP_MPF_ABNORMAL_INT_ST1", + &hclge_ppp_mpf_abnormal_int_st1[0], status, + &ae_dev->hw_err_reset_req); status = le32_to_cpu(*(desc_data + 3)) & HCLGE_PPP_MPF_INT_ST3_MASK; - if (status) { - reset_level = - hclge_log_error(dev, "PPP_MPF_ABNORMAL_INT_ST3", - &hclge_ppp_mpf_abnormal_int_st3[0], - status); - set_bit(reset_level, &ae_dev->hw_err_reset_req); - } + if (status) + hclge_log_error(dev, "PPP_MPF_ABNORMAL_INT_ST3", + &hclge_ppp_mpf_abnormal_int_st3[0], status, + &ae_dev->hw_err_reset_req); /* log PPU(RCB) errors */ desc_data = (__le32 *)&desc[5]; @@ -1197,57 +1168,46 @@ static int hclge_handle_mpf_ras_error(struct hclge_dev *hdev, } status = le32_to_cpu(*(desc_data + 2)); - if (status) { - reset_level = - hclge_log_error(dev, "PPU_MPF_ABNORMAL_INT_ST2", - &hclge_ppu_mpf_abnormal_int_st2[0], - status); - set_bit(reset_level, &ae_dev->hw_err_reset_req); - } + if (status) + hclge_log_error(dev, "PPU_MPF_ABNORMAL_INT_ST2", + &hclge_ppu_mpf_abnormal_int_st2[0], status, + &ae_dev->hw_err_reset_req); status = le32_to_cpu(*(desc_data + 3)) & HCLGE_PPU_MPF_INT_ST3_MASK; - if (status) { - reset_level = - hclge_log_error(dev, "PPU_MPF_ABNORMAL_INT_ST3", - &hclge_ppu_mpf_abnormal_int_st3[0], - status); - set_bit(reset_level, &ae_dev->hw_err_reset_req); - } + if (status) + hclge_log_error(dev, "PPU_MPF_ABNORMAL_INT_ST3", + &hclge_ppu_mpf_abnormal_int_st3[0], status, + &ae_dev->hw_err_reset_req); /* log TM(Traffic Manager) errors */ desc_data = (__le32 *)&desc[6]; status = le32_to_cpu(*desc_data); - if (status) { - reset_level = hclge_log_error(dev, "TM_SCH_RINT", - &hclge_tm_sch_rint[0], status); - set_bit(reset_level, &ae_dev->hw_err_reset_req); - } + if (status) + hclge_log_error(dev, "TM_SCH_RINT", + &hclge_tm_sch_rint[0], status, + &ae_dev->hw_err_reset_req); /* log QCN(Quantized Congestion Control) errors */ desc_data = (__le32 *)&desc[7]; status = le32_to_cpu(*desc_data) & HCLGE_QCN_FIFO_INT_MASK; - if (status) { - reset_level = 
hclge_log_error(dev, "QCN_FIFO_RINT", - &hclge_qcn_fifo_rint[0], status); - set_bit(reset_level, &ae_dev->hw_err_reset_req); - } + if (status) + hclge_log_error(dev, "QCN_FIFO_RINT", + &hclge_qcn_fifo_rint[0], status, + &ae_dev->hw_err_reset_req); status = le32_to_cpu(*(desc_data + 1)) & HCLGE_QCN_ECC_INT_MASK; - if (status) { - reset_level = hclge_log_error(dev, "QCN_ECC_RINT", - &hclge_qcn_ecc_rint[0], - status); - set_bit(reset_level, &ae_dev->hw_err_reset_req); - } + if (status) + hclge_log_error(dev, "QCN_ECC_RINT", + &hclge_qcn_ecc_rint[0], status, + &ae_dev->hw_err_reset_req); /* log NCSI errors */ desc_data = (__le32 *)&desc[9]; status = le32_to_cpu(*desc_data) & HCLGE_NCSI_ECC_INT_MASK; - if (status) { - reset_level = hclge_log_error(dev, "NCSI_ECC_INT_RPT", - &hclge_ncsi_err_int[0], status); - set_bit(reset_level, &ae_dev->hw_err_reset_req); - } + if (status) + hclge_log_error(dev, "NCSI_ECC_INT_RPT", + &hclge_ncsi_err_int[0], status, + &ae_dev->hw_err_reset_req); /* clear all main PF RAS errors */ hclge_cmd_reuse_desc(&desc[0], false); @@ -1272,7 +1232,6 @@ static int hclge_handle_pf_ras_error(struct hclge_dev *hdev, { struct hnae3_ae_dev *ae_dev = hdev->ae_dev; struct device *dev = &hdev->pdev->dev; - enum hnae3_reset_type reset_level; __le32 *desc_data; u32 status; int ret; @@ -1288,48 +1247,38 @@ static int hclge_handle_pf_ras_error(struct hclge_dev *hdev, /* log SSU(Storage Switch Unit) errors */ status = le32_to_cpu(desc[0].data[0]); - if (status) { - reset_level = hclge_log_error(dev, "SSU_PORT_BASED_ERR_INT", - &hclge_ssu_port_based_err_int[0], - status); - set_bit(reset_level, &ae_dev->hw_err_reset_req); - } + if (status) + hclge_log_error(dev, "SSU_PORT_BASED_ERR_INT", + &hclge_ssu_port_based_err_int[0], status, + &ae_dev->hw_err_reset_req); status = le32_to_cpu(desc[0].data[1]); - if (status) { - reset_level = hclge_log_error(dev, "SSU_FIFO_OVERFLOW_INT", - &hclge_ssu_fifo_overflow_int[0], - status); - set_bit(reset_level, 
&ae_dev->hw_err_reset_req); - } + if (status) + hclge_log_error(dev, "SSU_FIFO_OVERFLOW_INT", + &hclge_ssu_fifo_overflow_int[0], status, + &ae_dev->hw_err_reset_req); status = le32_to_cpu(desc[0].data[2]); - if (status) { - reset_level = hclge_log_error(dev, "SSU_ETS_TCG_INT", - &hclge_ssu_ets_tcg_int[0], - status); - set_bit(reset_level, &ae_dev->hw_err_reset_req); - } + if (status) + hclge_log_error(dev, "SSU_ETS_TCG_INT", + &hclge_ssu_ets_tcg_int[0], status, + &ae_dev->hw_err_reset_req); /* log IGU(Ingress Unit) EGU(Egress Unit) TNL errors */ desc_data = (__le32 *)&desc[1]; status = le32_to_cpu(*desc_data) & HCLGE_IGU_EGU_TNL_INT_MASK; - if (status) { - reset_level = hclge_log_error(dev, "IGU_EGU_TNL_INT_STS", - &hclge_igu_egu_tnl_int[0], - status); - set_bit(reset_level, &ae_dev->hw_err_reset_req); - } + if (status) + hclge_log_error(dev, "IGU_EGU_TNL_INT_STS", + &hclge_igu_egu_tnl_int[0], status, + &ae_dev->hw_err_reset_req); /* log PPU(RCB) errors */ desc_data = (__le32 *)&desc[3]; status = le32_to_cpu(*desc_data) & HCLGE_PPU_PF_INT_RAS_MASK; - if (status) { - reset_level = hclge_log_error(dev, "PPU_PF_ABNORMAL_INT_ST0", - &hclge_ppu_pf_abnormal_int[0], - status); - set_bit(reset_level, &ae_dev->hw_err_reset_req); - } + if (status) + hclge_log_error(dev, "PPU_PF_ABNORMAL_INT_ST0", + &hclge_ppu_pf_abnormal_int[0], status, + &ae_dev->hw_err_reset_req); /* clear all PF RAS errors */ hclge_cmd_reuse_desc(&desc[0], false); @@ -1671,8 +1620,9 @@ pci_ers_result_t hclge_handle_hw_ras_error(struct hnae3_ae_dev *ae_dev) hclge_handle_rocee_ras_error(ae_dev); } - if (status & HCLGE_RAS_REG_NFE_MASK || - status & HCLGE_RAS_REG_ROCEE_ERR_MASK) { + if ((status & HCLGE_RAS_REG_NFE_MASK || + status & HCLGE_RAS_REG_ROCEE_ERR_MASK) && + ae_dev->hw_err_reset_req) { ae_dev->override_pci_need_reset = 0; return PCI_ERS_RESULT_NEED_RESET; } @@ -1762,7 +1712,6 @@ int hclge_handle_hw_msix_error(struct hclge_dev *hdev, struct hclge_mac_tnl_stats mac_tnl_stats; struct device *dev = 
&hdev->pdev->dev; u32 mpf_bd_num, pf_bd_num, bd_num; - enum hnae3_reset_type reset_level; struct hclge_desc desc_bd; struct hclge_desc *desc; __le32 *desc_data; @@ -1800,24 +1749,19 @@ int hclge_handle_hw_msix_error(struct hclge_dev *hdev, /* log MAC errors */ desc_data = (__le32 *)&desc[1]; status = le32_to_cpu(*desc_data); - if (status) { - reset_level = hclge_log_error(dev, "MAC_AFIFO_TNL_INT_R", - &hclge_mac_afifo_tnl_int[0], - status); - set_bit(reset_level, reset_requests); - } + if (status) + hclge_log_error(dev, "MAC_AFIFO_TNL_INT_R", + &hclge_mac_afifo_tnl_int[0], status, + reset_requests); /* log PPU(RCB) MPF errors */ desc_data = (__le32 *)&desc[5]; status = le32_to_cpu(*(desc_data + 2)) & HCLGE_PPU_MPF_INT_ST2_MSIX_MASK; - if (status) { - reset_level = - hclge_log_error(dev, "PPU_MPF_ABNORMAL_INT_ST2", - &hclge_ppu_mpf_abnormal_int_st2[0], - status); - set_bit(reset_level, reset_requests); - } + if (status) + hclge_log_error(dev, "PPU_MPF_ABNORMAL_INT_ST2", + &hclge_ppu_mpf_abnormal_int_st2[0], + status, reset_requests); /* clear all main PF MSIx errors */ hclge_cmd_reuse_desc(&desc[0], false); @@ -1841,32 +1785,26 @@ int hclge_handle_hw_msix_error(struct hclge_dev *hdev, /* log SSU PF errors */ status = le32_to_cpu(desc[0].data[0]) & HCLGE_SSU_PORT_INT_MSIX_MASK; - if (status) { - reset_level = hclge_log_error(dev, "SSU_PORT_BASED_ERR_INT", - &hclge_ssu_port_based_pf_int[0], - status); - set_bit(reset_level, reset_requests); - } + if (status) + hclge_log_error(dev, "SSU_PORT_BASED_ERR_INT", + &hclge_ssu_port_based_pf_int[0], + status, reset_requests); /* read and log PPP PF errors */ desc_data = (__le32 *)&desc[2]; status = le32_to_cpu(*desc_data); - if (status) { - reset_level = hclge_log_error(dev, "PPP_PF_ABNORMAL_INT_ST0", - &hclge_ppp_pf_abnormal_int[0], - status); - set_bit(reset_level, reset_requests); - } + if (status) + hclge_log_error(dev, "PPP_PF_ABNORMAL_INT_ST0", + &hclge_ppp_pf_abnormal_int[0], + status, reset_requests); /* log PPU(RCB) 
PF errors */ desc_data = (__le32 *)&desc[3]; status = le32_to_cpu(*desc_data) & HCLGE_PPU_PF_INT_MSIX_MASK; - if (status) { - reset_level = hclge_log_error(dev, "PPU_PF_ABNORMAL_INT_ST", - &hclge_ppu_pf_abnormal_int[0], - status); - set_bit(reset_level, reset_requests); - } + if (status) + hclge_log_error(dev, "PPU_PF_ABNORMAL_INT_ST", + &hclge_ppu_pf_abnormal_int[0], + status, reset_requests); status = le32_to_cpu(*desc_data) & HCLGE_PPU_PF_OVER_8BD_ERR_MASK; if (status) From patchwork Thu Jun 13 09:12:23 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Huazhong Tan X-Patchwork-Id: 166701 Delivered-To: patch@linaro.org Received: by 2002:ac9:2a84:0:0:0:0:0 with SMTP id p4csp837677oca; Thu, 13 Jun 2019 08:50:03 -0700 (PDT) X-Google-Smtp-Source: APXvYqxMxDYOoESp/33nuDLslORbQzAh1KpJbi54Gl/+wLugiXygAi6OHG9Av1O9FFru/BQ9eQAQ X-Received: by 2002:a63:6fce:: with SMTP id k197mr31559095pgc.140.1560441003025; Thu, 13 Jun 2019 08:50:03 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1560441003; cv=none; d=google.com; s=arc-20160816; b=Lgw+0KzymA5NSfe5ZkQ5Q2aJy6wPw15jNd5bYZ0/mR1P8nEfYEhS78KHjobthNORxf BLNvPHDTfCnAop2Gdf1PmwjnaqfdqueGvegLsMuK+dYQXGOa6H3Md9dr5MbKScj1WsdO KlxHP+sVOqkR3lTNGFSwqO2uNA6sPCl8P92isOkY4CVqilN55cNcWeP1t8Sr2wRG14d8 PMsH01mPZfTdEl6HfWLnXXE2ICJxDTBysx8HoKx+LsIW1J2ZS5ikpq/4Gmm7R1Jy9R6l m/e96SEQqoTcerofn2MElBBzl8xEijXfR6ilrJ571ts4Nw89QoFJhyoZbBgACO0sQTCg 4XQw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from; bh=4MDkBNJqcn/eaTTv6TkdvIoSvY2LHPKpVpASrAMa3A0=; b=FXwfdAKGaRrL95SsQfj219RPZ66G4pnaVSKRtKmw3EfcoRdseV7HnbbF/oBEz8Wcwr NH0/zVNaL5eKRsUQA+tM1rEDUMKccydCU5/7sCcyQCTnvcX2DQ5T/+zIKIz57sNpCcQH sDokEb2wxZl0ahiSyTf3/3ghdQVarbUSbi8N/3ce40rGG9bwz3lT6vVBNYhS7Qbl9LKN +OnLRnIG6eyZqDdB7ADmkNt7mzcnzMDBdmaPEEWSy+jH/T7IbWHBetTGqsMbAtz924n7 
tw0gkEX8y9Y3fe38SqkJoo8fv8rOXWt06dI3QnmA/7lJ8PyACW8KT26EBFF6eruDZ/S3 19vg== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id v142si117162pgb.459.2019.06.13.08.50.02; Thu, 13 Jun 2019 08:50:03 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1733091AbfFMPty (ORCPT + 24 others); Thu, 13 Jun 2019 11:49:54 -0400 Received: from szxga05-in.huawei.com ([45.249.212.191]:18153 "EHLO huawei.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1731702AbfFMJOV (ORCPT ); Thu, 13 Jun 2019 05:14:21 -0400 Received: from DGGEMS413-HUB.china.huawei.com (unknown [172.30.72.59]) by Forcepoint Email with ESMTP id 396BA4459CE4CA10A628; Thu, 13 Jun 2019 17:14:14 +0800 (CST) Received: from localhost.localdomain (10.67.212.132) by DGGEMS413-HUB.china.huawei.com (10.3.19.213) with Microsoft SMTP Server id 14.3.439.0; Thu, 13 Jun 2019 17:14:07 +0800 From: Huazhong Tan To: CC: , , , , , Shiju Jose , Weihang Li , Peng Li , Huazhong Tan Subject: [PATCH net-next 03/12] net: hns3: process H/W errors occurred before HNS dev initialization Date: Thu, 13 Jun 2019 17:12:23 +0800 Message-ID: <1560417152-53050-4-git-send-email-tanhuazhong@huawei.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1560417152-53050-1-git-send-email-tanhuazhong@huawei.com> References: 
<1560417152-53050-1-git-send-email-tanhuazhong@huawei.com>
MIME-Version: 1.0
X-Originating-IP: [10.67.212.132]
X-CFilter-Loop: Reflected
Sender: linux-kernel-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

From: Shiju Jose

Presently the HNS driver enables the HNS H/W error interrupts only
after the dev initialization is completed. However, some exceptions,
such as NCSI errors, can occur when the network port driver is not
loaded, and those errors must be reported to the BMC. Therefore the
firmware enables all the HNS RAS error interrupts before the driver
is loaded. In some cases, H/W errors may also be left uncleared
before a reboot. Thus the HNS driver needs to process and recover
from the hw errors that occurred before the HNS driver was
initialized.

This patch adds processing of the HNS hw errors (RAS and MSI-X) that
occurred before the driver initialization. The RAS interrupts are
enabled by firmware, so we can detect the specific error bits, then
log and clear them. The MSI-X errors, however, cannot be enabled
before the vector0 irq is opened, so we can't detect the specific
error bits; we just write 1 to all interrupt source registers to
clear them.
Signed-off-by: Shiju Jose Signed-off-by: Weihang Li Signed-off-by: Peng Li Signed-off-by: Huazhong Tan --- .../net/ethernet/hisilicon/hns3/hns3pf/hclge_err.c | 111 +++++++++++++++++++-- .../net/ethernet/hisilicon/hns3/hns3pf/hclge_err.h | 1 + .../ethernet/hisilicon/hns3/hns3pf/hclge_main.c | 3 + 3 files changed, 107 insertions(+), 8 deletions(-) -- 2.7.4 diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_err.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_err.c index 3ea305e..ab9c5d5 100644 --- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_err.c +++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_err.c @@ -1595,6 +1595,12 @@ pci_ers_result_t hclge_handle_hw_ras_error(struct hnae3_ae_dev *ae_dev) struct device *dev = &hdev->pdev->dev; u32 status; + if (!test_bit(HCLGE_STATE_SERVICE_INITED, &hdev->state)) { + dev_err(dev, + "Can't recover - RAS error reported during dev init\n"); + return PCI_ERS_RESULT_NONE; + } + status = hclge_read_dev(&hdev->hw, HCLGE_RAS_PF_OTHER_INT_STS_REG); if (status & HCLGE_RAS_REG_NFE_MASK || @@ -1631,6 +1637,21 @@ pci_ers_result_t hclge_handle_hw_ras_error(struct hnae3_ae_dev *ae_dev) return PCI_ERS_RESULT_RECOVERED; } +static int hclge_clear_hw_msix_error(struct hclge_dev *hdev, + struct hclge_desc *desc, bool is_mpf, + u32 bd_num) +{ + if (is_mpf) + desc[0].opcode = + cpu_to_le16(HCLGE_QUERY_CLEAR_ALL_MPF_MSIX_INT); + else + desc[0].opcode = cpu_to_le16(HCLGE_QUERY_CLEAR_ALL_PF_MSIX_INT); + + desc[0].flag = cpu_to_le16(HCLGE_CMD_FLAG_NO_INTR | HCLGE_CMD_FLAG_IN); + + return hclge_cmd_send(&hdev->hw, &desc[0], bd_num); +} + /* hclge_query_8bd_info: query information about over_8bd_nfe_err * @hdev: pointer to struct hclge_dev * @vf_id: Index of the virtual function with error @@ -1706,8 +1727,8 @@ static void hclge_handle_over_8bd_err(struct hclge_dev *hdev, } } -int hclge_handle_hw_msix_error(struct hclge_dev *hdev, - unsigned long *reset_requests) +static int hclge_handle_all_hw_msix_error(struct hclge_dev *hdev, 
+ unsigned long *reset_requests) { struct hclge_mac_tnl_stats mac_tnl_stats; struct device *dev = &hdev->pdev->dev; @@ -1764,8 +1785,7 @@ int hclge_handle_hw_msix_error(struct hclge_dev *hdev, status, reset_requests); /* clear all main PF MSIx errors */ - hclge_cmd_reuse_desc(&desc[0], false); - ret = hclge_cmd_send(&hdev->hw, &desc[0], mpf_bd_num); + ret = hclge_clear_hw_msix_error(hdev, desc, true, mpf_bd_num); if (ret) { dev_err(dev, "clear all mpf msix int cmd failed (%d)\n", ret); @@ -1811,11 +1831,10 @@ int hclge_handle_hw_msix_error(struct hclge_dev *hdev, hclge_handle_over_8bd_err(hdev, reset_requests); /* clear all PF MSIx errors */ - hclge_cmd_reuse_desc(&desc[0], false); - ret = hclge_cmd_send(&hdev->hw, &desc[0], pf_bd_num); + ret = hclge_clear_hw_msix_error(hdev, desc, false, pf_bd_num); if (ret) { - dev_err(dev, "clear all pf msix int cmd failed (%d)\n", - ret); + dev_err(dev, "clear all pf msix int cmd failed (%d)\n", ret); + goto msi_error; } /* query and clear mac tnl interruptions */ @@ -1847,3 +1866,79 @@ int hclge_handle_hw_msix_error(struct hclge_dev *hdev, out: return ret; } + +int hclge_handle_hw_msix_error(struct hclge_dev *hdev, + unsigned long *reset_requests) +{ + struct device *dev = &hdev->pdev->dev; + + if (!test_bit(HCLGE_STATE_SERVICE_INITED, &hdev->state)) { + dev_err(dev, + "Can't handle - MSIx error reported during dev init\n"); + return 0; + } + + return hclge_handle_all_hw_msix_error(hdev, reset_requests); +} + +void hclge_handle_all_hns_hw_errors(struct hnae3_ae_dev *ae_dev) +{ +#define HCLGE_DESC_NO_DATA_LEN 8 + + struct hclge_dev *hdev = ae_dev->priv; + struct device *dev = &hdev->pdev->dev; + u32 mpf_bd_num, pf_bd_num, bd_num; + struct hclge_desc desc_bd; + struct hclge_desc *desc; + u32 status; + int ret; + + ae_dev->hw_err_reset_req = 0; + status = hclge_read_dev(&hdev->hw, HCLGE_RAS_PF_OTHER_INT_STS_REG); + + /* query the number of bds for the MSIx int status */ + hclge_cmd_setup_basic_desc(&desc_bd, 
HCLGE_QUERY_MSIX_INT_STS_BD_NUM, + true); + ret = hclge_cmd_send(&hdev->hw, &desc_bd, 1); + if (ret) { + dev_err(dev, "fail(%d) to query msix int status bd num\n", + ret); + return; + } + + mpf_bd_num = le32_to_cpu(desc_bd.data[0]); + pf_bd_num = le32_to_cpu(desc_bd.data[1]); + bd_num = max_t(u32, mpf_bd_num, pf_bd_num); + + desc = kcalloc(bd_num, sizeof(struct hclge_desc), GFP_KERNEL); + if (!desc) + return; + + /* Clear HNS hw errors reported through msix */ + memset(&desc[0].data[0], 0xFF, mpf_bd_num * sizeof(struct hclge_desc) - + HCLGE_DESC_NO_DATA_LEN); + ret = hclge_clear_hw_msix_error(hdev, desc, true, mpf_bd_num); + if (ret) { + dev_err(dev, "fail(%d) to clear mpf msix int during init\n", + ret); + goto msi_error; + } + + memset(&desc[0].data[0], 0xFF, pf_bd_num * sizeof(struct hclge_desc) - + HCLGE_DESC_NO_DATA_LEN); + ret = hclge_clear_hw_msix_error(hdev, desc, false, pf_bd_num); + if (ret) { + dev_err(dev, "fail(%d) to clear pf msix int during init\n", + ret); + goto msi_error; + } + + /* Handle Non-fatal HNS RAS errors */ + if (status & HCLGE_RAS_REG_NFE_MASK) { + dev_warn(dev, "HNS hw error(RAS) identified during init\n"); + hclge_handle_all_ras_errors(hdev); + } + +msi_error: + kfree(desc); +} diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_err.h b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_err.h index be1186a..d821a76 100644 --- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_err.h +++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_err.h @@ -123,6 +123,7 @@ struct hclge_hw_error { int hclge_config_mac_tnl_int(struct hclge_dev *hdev, bool en); int hclge_config_nic_hw_error(struct hclge_dev *hdev, bool state); int hclge_config_rocee_ras_interrupt(struct hclge_dev *hdev, bool en); +void hclge_handle_all_hns_hw_errors(struct hnae3_ae_dev *ae_dev); pci_ers_result_t hclge_handle_hw_ras_error(struct hnae3_ae_dev *ae_dev); int hclge_handle_hw_msix_error(struct hclge_dev *hdev, unsigned long *reset_requests); diff --git 
a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
index f3e9030..d9863c30 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
@@ -8611,6 +8611,9 @@ static int hclge_init_ae_dev(struct hnae3_ae_dev *ae_dev)
 
 	hclge_clear_all_event_cause(hdev);
 
+	/* Log and clear the hw errors that have already occurred */
+	hclge_handle_all_hns_hw_errors(ae_dev);
+
 	/* Enable MISC vector(vector0) */
 	hclge_enable_vector(&hdev->misc_vector, true);
 