From patchwork Sun Sep 5 18:10:59 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Michael Chan X-Patchwork-Id: 507309 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-19.0 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_CR_TRAILER, INCLUDES_PATCH, MAILING_LIST_MULTI, MIME_HEADER_CTYPE_ONLY, SPF_HELO_NONE, SPF_PASS, T_TVD_MIME_NO_HEADERS, URIBL_BLOCKED, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id B11B4C433EF for ; Sun, 5 Sep 2021 18:11:40 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 95BD260F5E for ; Sun, 5 Sep 2021 18:11:40 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238134AbhIESMn (ORCPT ); Sun, 5 Sep 2021 14:12:43 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:42504 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238124AbhIESMm (ORCPT ); Sun, 5 Sep 2021 14:12:42 -0400 Received: from mail-pg1-x532.google.com (mail-pg1-x532.google.com [IPv6:2607:f8b0:4864:20::532]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 4854EC061575 for ; Sun, 5 Sep 2021 11:11:39 -0700 (PDT) Received: by mail-pg1-x532.google.com with SMTP id f129so4420119pgc.1 for ; Sun, 05 Sep 2021 11:11:39 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=broadcom.com; s=google; h=from:to:cc:subject:date:message-id:in-reply-to:references; bh=OPUF51LUrNgqFz089ayyIgiIPyx3bxOlE6TuCkJQbtc=; b=QZ8nW7XRowuxxeoiKinfslzSSbhxKf3/ooCykpizH+IxAwHCavShXEMFC5rRgqdVPM MpYiqZ9SXLnKvnWWbuxuDlbKnhsQZESVk7xuk9WRg4stByY5vaCK8HPGzzT58ZVAuDod hERP3gCd9yLu/1rmL4wvwg9gGnVq0Qw8S3r6o= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references; bh=OPUF51LUrNgqFz089ayyIgiIPyx3bxOlE6TuCkJQbtc=; b=DcrMik29cltMTMCENAhNRRrNpUSl0LeTzpUXp86OMBG9BEDGhz1M6du5WhGIuXkbHB 569Q1n8S9jXEw4nfw13YXx4s67hnKpRIVV6eIWcBAmJhdu3A0zYmDNVyoZzr5iboUtfH sd2xKBpVJPAsdY4JfN/nXuqy7Q17GcHIPsrTblkqkrYNKqGv4OtXVznna79WfzyH9Pom lnkCuEqoS4vflJscr9MhxCUJrjpYB1aU0Z+CtC0gzScIcQhm3Yhi+erPatadpCwGum+s D/BZe02xfi7plr8aiwD+eEPhksxBSGjidq82zC6DgwzwvZz03NrleE1fY8wokihCiLZS tsOg== X-Gm-Message-State: AOAM531X9HXUGXs5csq1HZXO6VNOMbsbCz5mcd7HBca5pCe8bnR0WeDV VdzSrLj1pLh/XbTb3TEnhG1edFeB2ybBBA== X-Google-Smtp-Source: ABdhPJysbVtzYwFTD4bWeGzARZCmZXtDhv1MZO4Ljkf2Ar0vheyZwy5IGipuwpEU5suVynVB7lR0HA== X-Received: by 2002:a63:78c5:: with SMTP id t188mr8508720pgc.386.1630865498553; Sun, 05 Sep 2021 11:11:38 -0700 (PDT) Received: from localhost.swdvt.lab.broadcom.net ([192.19.223.252]) by smtp.gmail.com with ESMTPSA id d13sm5058115pfn.114.2021.09.05.11.11.35 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Sun, 05 Sep 2021 11:11:38 -0700 (PDT) From: Michael Chan To: davem@davemloft.net Cc: netdev@vger.kernel.org, kuba@kernel.org, edwin.peer@broadcom.com, gospo@broadcom.com Subject: [PATCH net 5/5] bnxt_en: Fix possible unintended driver initiated error recovery Date: Sun, 5 Sep 2021 14:10:59 -0400 Message-Id: <1630865459-19146-6-git-send-email-michael.chan@broadcom.com> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1630865459-19146-1-git-send-email-michael.chan@broadcom.com> References: <1630865459-19146-1-git-send-email-michael.chan@broadcom.com> Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org If error recovery is already enabled, bnxt_timer() will periodically check the heartbeat register and the reset counter. If we get an error recovery async. notification from the firmware (e.g. change in primary/secondary role), we will immediately read and update the heartbeat register and the reset counter. If the timer for the next health check expires soon after this, we may read the heartbeat register again in quick succession and find that it hasn't changed. This will trigger error recovery unintentionally. The likelihood is small because we also reset fw_health->tmr_counter which will reset the interval for the next health check. But the update is not protected and bnxt_timer() can miss the update and perform the health check without waiting for the full interval. Fix it by only reading the heartbeat register and reset counter in bnxt_async_event_process() if error recovery is trasitioning to the enabled state. Also add proper memory barriers so that when enabling for the first time, bnxt_timer() will see the tmr_counter interval and perform the health check after the full interval has elapsed. Fixes: 7e914027f757 ("bnxt_en: Enable health monitoring.") Reviewed-by: Edwin Peer Signed-off-by: Michael Chan --- drivers/net/ethernet/broadcom/bnxt/bnxt.c | 25 ++++++++++++++++------- 1 file changed, 18 insertions(+), 7 deletions(-) diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c index 40a390652d8d..9b86516e59a1 100644 --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c @@ -2202,25 +2202,34 @@ static int bnxt_async_event_process(struct bnxt *bp, if (!fw_health) goto async_event_process_exit; - fw_health->enabled = EVENT_DATA1_RECOVERY_ENABLED(data1); - fw_health->master = EVENT_DATA1_RECOVERY_MASTER_FUNC(data1); - if (!fw_health->enabled) { + if (!EVENT_DATA1_RECOVERY_ENABLED(data1)) { + fw_health->enabled = false; netif_info(bp, drv, bp->dev, "Error recovery info: error recovery[0]\n"); break; } + fw_health->master = EVENT_DATA1_RECOVERY_MASTER_FUNC(data1); fw_health->tmr_multiplier = DIV_ROUND_UP(fw_health->polling_dsecs * HZ, bp->current_interval * 10); fw_health->tmr_counter = fw_health->tmr_multiplier; - fw_health->last_fw_heartbeat = - bnxt_fw_health_readl(bp, BNXT_FW_HEARTBEAT_REG); - fw_health->last_fw_reset_cnt = - bnxt_fw_health_readl(bp, BNXT_FW_RESET_CNT_REG); + if (!fw_health->enabled) { + fw_health->last_fw_heartbeat = + bnxt_fw_health_readl(bp, BNXT_FW_HEARTBEAT_REG); + fw_health->last_fw_reset_cnt = + bnxt_fw_health_readl(bp, BNXT_FW_RESET_CNT_REG); + } netif_info(bp, drv, bp->dev, "Error recovery info: error recovery[1], master[%d], reset count[%u], health status: 0x%x\n", fw_health->master, fw_health->last_fw_reset_cnt, bnxt_fw_health_readl(bp, BNXT_FW_HEALTH_REG)); + if (!fw_health->enabled) { + /* Make sure tmr_counter is set and visible to + * bnxt_health_check() before setting enabled to true. + */ + smp_wmb(); + fw_health->enabled = true; + } goto async_event_process_exit; } case ASYNC_EVENT_CMPL_EVENT_ID_DEBUG_NOTIFICATION: @@ -11258,6 +11267,8 @@ static void bnxt_fw_health_check(struct bnxt *bp) if (!fw_health->enabled || test_bit(BNXT_STATE_IN_FW_RESET, &bp->state)) return; + /* Make sure it is enabled before checking the tmr_counter. */ + smp_rmb(); if (fw_health->tmr_counter) { fw_health->tmr_counter--; return;