From patchwork Tue Sep 29 10:58:29 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Greg KH
X-Patchwork-Id: 291133
From: Greg Kroah-Hartman
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman, stable@vger.kernel.org, Oliver OHalloran,
 Sam Bobroff, Michael Ellerman, Sasha Levin
Subject: [PATCH 4.19 058/245] powerpc/eeh: Only dump stack once if an MMIO
 loop is detected
Date: Tue, 29 Sep 2020 12:58:29 +0200
Message-Id: <20200929105949.822351230@linuxfoundation.org>
X-Mailer: git-send-email 2.28.0
In-Reply-To: <20200929105946.978650816@linuxfoundation.org>
References: <20200929105946.978650816@linuxfoundation.org>
User-Agent: quilt/0.66
Precedence: bulk
X-Mailing-List: stable@vger.kernel.org

From: Oliver O'Halloran

[ Upstream commit 4e0942c0302b5ad76b228b1a7b8c09f658a1d58a ]

Many drivers don't check for errors when they get a 0xFFs response from
an MMIO load. As a result, after an EEH event occurs, a driver can get
stuck in a polling loop unless it has some kind of internal timeout
logic.

Currently EEH tries to detect and report stuck drivers by dumping a
stack trace after eeh_dev_check_failure() is called EEH_MAX_FAILS times
on an already frozen PE. The value of EEH_MAX_FAILS was chosen so that a
dump would occur every few seconds if the driver was spinning in a loop.
This results in a lot of spurious stack traces in the kernel log.

Fix this by limiting it to printing one stack trace for each PE freeze.
If the driver is truly stuck, the kernel's hung task detector is better
suited to reporting the problem anyway.

Signed-off-by: Oliver O'Halloran
Reviewed-by: Sam Bobroff
Tested-by: Sam Bobroff
Signed-off-by: Michael Ellerman
Link: https://lore.kernel.org/r/20191016012536.22588-1-oohall@gmail.com
Signed-off-by: Sasha Levin
---
 arch/powerpc/kernel/eeh.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
index fe3c6f3bd3b62..d123cba0992d0 100644
--- a/arch/powerpc/kernel/eeh.c
+++ b/arch/powerpc/kernel/eeh.c
@@ -502,7 +502,7 @@ int eeh_dev_check_failure(struct eeh_dev *edev)
 	rc = 1;
 	if (pe->state & EEH_PE_ISOLATED) {
 		pe->check_count++;
-		if (pe->check_count % EEH_MAX_FAILS == 0) {
+		if (pe->check_count == EEH_MAX_FAILS) {
 			dn = pci_device_to_OF_node(dev);
 			if (dn)
 				location = of_get_property(dn, "ibm,loc-code",
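
For illustration only, the difference between the old and new conditions
can be sketched in user space. This is not kernel code: DEMO_MAX_FAILS is
a hypothetical small stand-in for EEH_MAX_FAILS, both helper functions
are invented for this sketch, and the counter is assumed to start from
zero for each PE freeze.

```c
/* Hypothetical stand-in for EEH_MAX_FAILS; kept small for the demo. */
#define DEMO_MAX_FAILS 5

/*
 * Old behaviour: the condition is true every DEMO_MAX_FAILS checks, so a
 * driver spinning on a frozen PE keeps triggering new stack dumps.
 */
static unsigned int dumps_with_modulo(unsigned int checks)
{
	unsigned int check_count = 0, dumps = 0;

	for (unsigned int i = 0; i < checks; i++) {
		check_count++;
		if (check_count % DEMO_MAX_FAILS == 0)
			dumps++;
	}
	return dumps;
}

/*
 * New behaviour: the condition is true only when the counter first
 * reaches DEMO_MAX_FAILS, so at most one dump happens per freeze.
 */
static unsigned int dumps_with_equality(unsigned int checks)
{
	unsigned int check_count = 0, dumps = 0;

	for (unsigned int i = 0; i < checks; i++) {
		check_count++;
		if (check_count == DEMO_MAX_FAILS)
			dumps++;
	}
	return dumps;
}
```

With the demo threshold of 5, 23 failed checks give 4 dumps under the
modulo test and 1 under the equality test; fewer than 5 checks give none
in either case.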