From patchwork Mon Jun 24 15:09:35 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Robert Richter
X-Patchwork-Id: 167613
From: Robert Richter
To: Borislav Petkov, James Morse, Mauro Carvalho Chehab
CC: linux-edac@vger.kernel.org, linux-kernel@vger.kernel.org, Robert Richter
Subject: [PATCH v2 20/24] EDAC, ghes: Identify dimm by node, card, module and handle
Date: Mon, 24 Jun 2019 15:09:35 +0000
Message-ID: <20190624150758.6695-21-rrichter@marvell.com>
References: <20190624150758.6695-1-rrichter@marvell.com>
In-Reply-To: <20190624150758.6695-1-rrichter@marvell.com>
X-Mailing-List: linux-kernel@vger.kernel.org

According to SMBIOS Spec. 2.7 (N.2.5 Memory Error Section), a failing
DIMM (module or rank number) can be identified by its error location,
which consists of node, card and module. A module handle is used to
map the location to the DIMMs listed in the DMI table. Collect all of
this data from the error record and select the DIMM accordingly.
Inconsistent error records are reported, that is, records in which the
same DIMM handle reports errors with a different node, card or module.

This change makes it possible to enable per-layer reporting based on
node, card and module in the next patch.

Signed-off-by: Robert Richter
---
 drivers/edac/ghes_edac.c | 74 +++++++++++++++++++++++++++++++++-------
 1 file changed, 62 insertions(+), 12 deletions(-)

diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c
index 97f992006281..689841c5c84d 100644
--- a/drivers/edac/ghes_edac.c
+++ b/drivers/edac/ghes_edac.c
@@ -81,8 +81,11 @@ struct memarr_dmi_entry {
 
 struct ghes_dimm_info {
 	struct dimm_info dimm_info;
+	struct dimm_info *dimm;
 	int idx;
 	int numa_node;
+	int card;
+	int module;
 	phys_addr_t start;
 	phys_addr_t end;
 	u16 phys_handle;
@@ -128,6 +131,8 @@ static int ghes_dimm_info_init(int num)
 	for_each_dimm(dimm) {
 		dimm->idx = idx;
 		dimm->numa_node = NUMA_NO_NODE;
+		dimm->card = -1;
+		dimm->module = -1;
 		idx++;
 	}
 
@@ -395,6 +400,13 @@ static void mem_info_prepare_mci(struct mem_ctl_info *mci)
 
 		if (*dmi_dimm->label)
 			strcpy(mci_dimm->label, dmi_dimm->label);
+
+		/*
+		 * From here on do not use any longer &dimm.dimm_info.
+		 * Instead switch to the mci's dimm info which might
+		 * contain updated data, such as the label.
+		 */
+		dimm->dimm = mci_dimm;
 	}
 
 	if (index != mci->tot_dimms)
@@ -402,24 +414,46 @@ static void mem_info_prepare_mci(struct mem_ctl_info *mci)
 			index, mci->tot_dimms);
 }
 
-static struct mem_ctl_info *get_mc_by_node(int nid)
+/* Requires ghes_lock being set. */
+static struct ghes_dimm_info *
+get_and_prepare_dimm_info(int nid, int card, int module, int handle)
 {
-	struct mem_ctl_info *mci = edac_mc_find(nid);
+	static struct ghes_dimm_info *dimm;
+	struct dimm_info *di;
 
-	if (mci)
-		return mci;
+	/*
+	 * We require smbios_handle being set in the error report for
+	 * per layer reporting (SMBIOS handle for the Type 17 Memory
+	 * Device Structure that represents the Memory Module)
+	 */
+	for_each_dimm(dimm) {
+		di = dimm->dimm;
+		if (di->smbios_handle == handle)
+			goto found;
+	}
 
-	if (num_possible_nodes() > 1) {
-		edac_mc_printk(fallback, KERN_WARNING,
-			"Invalid or no node information, falling back to first node: %s",
-			fallback->dev_name);
+	return NULL;
+found:
+	if (dimm->card < 0 && card >= 0)
+		dimm->card = card;
+	if (dimm->module < 0 && module >= 0)
+		dimm->module = module;
+
+	if ((num_possible_nodes() > 1 && di->mci->mc_idx != nid) ||
+	    (card >= 0 && card != dimm->card) ||
+	    (module >= 0 && module != dimm->module)) {
+		edac_mc_printk(di->mci, KERN_WARNING,
+			"Inconsistent error report (nid/card/module): %d/%d/%d (dimm%d: %d/%d/%d)",
+			nid, card, module, di->idx,
+			di->mci->mc_idx, dimm->card, dimm->module);
 	}
 
-	return fallback;
+	return dimm;
 }
 
 void ghes_edac_report_mem_error(int sev, struct cper_sec_mem_err *mem_err)
 {
+	struct ghes_dimm_info *dimm;
 	struct dimm_info *dimm_info;
 	enum hw_event_mc_err_type type;
 	struct edac_raw_error_desc *e;
@@ -428,6 +462,9 @@ void ghes_edac_report_mem_error(int sev, struct cper_sec_mem_err *mem_err)
 	unsigned long flags;
 	char *p;
 	int nid = NUMA_NO_NODE;
+	int card = -1;
+	int module = -1;
+	int handle = -1;
 
 	/* We need at least one mc */
 	if (WARN_ON_ONCE(!fallback))
@@ -443,10 +480,23 @@ void ghes_edac_report_mem_error(int sev, struct cper_sec_mem_err *mem_err)
 
 	spin_lock_irqsave(&ghes_lock, flags);
 
-	/* select the node's mc device */
 	if (mem_err->validation_bits & CPER_MEM_VALID_NODE)
 		nid = mem_err->node;
-	mci = get_mc_by_node(nid);
+	if (mem_err->validation_bits & CPER_MEM_VALID_CARD)
+		card = mem_err->card;
+	if (mem_err->validation_bits & CPER_MEM_VALID_MODULE)
+		module = mem_err->module;
+	if (mem_err->validation_bits & CPER_MEM_VALID_MODULE_HANDLE)
+		handle = mem_err->mem_dev_handle;
+
+	dimm = get_and_prepare_dimm_info(nid, card, module, handle);
+	if (dimm)
+		mci = dimm->dimm->mci;
+	else
+		mci = edac_mc_find(nid);
+	if (!mci)
+		mci = fallback;
+
 	pvt = mci->pvt_info;
 	e = &mci->error_desc;
 
@@ -665,7 +715,7 @@ void ghes_edac_report_mem_error(int sev, struct cper_sec_mem_err *mem_err)
 	if (p > pvt->other_detail)
 		*(p - 1) = '\0';
 
-	dimm_info = edac_get_dimm_by_index(mci, e->top_layer);
+	dimm_info = dimm ? dimm->dimm : NULL;
 
 	edac_raw_mc_handle_error(type, mci, dimm_info, e, -1, -1);
 
-- 
2.20.1
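
For readers who want to experiment with the lookup outside the kernel, the following is a minimal user-space sketch of the logic in get_and_prepare_dimm_info(): find the DIMM by its SMBIOS handle, record card and module the first time they are reported, and flag a later report that disagrees. It is an illustration under stated assumptions, not the driver code. struct dimm_slot, lookup_slot(), the num_nodes parameter (standing in for num_possible_nodes()) and the inconsistent out-parameter are all hypothetical names. Unlike the patch, the sketch uses a plain local pointer for the result instead of the for_each_dimm() cursor.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for struct ghes_dimm_info; not the kernel type. */
struct dimm_slot {
	int handle;	/* SMBIOS Type 17 handle of the module */
	int node;	/* index of the memory controller that owns the DIMM */
	int card;	/* -1 until an error record reports a card */
	int module;	/* -1 until an error record reports a module */
};

/*
 * Find the slot whose handle matches, latch card/module on first
 * sight, and set *inconsistent when the record disagrees with what
 * is already known. A negative card or module means "not valid in
 * this record". Returns NULL if no slot carries the handle.
 */
struct dimm_slot *lookup_slot(struct dimm_slot *slots, size_t n,
			      int num_nodes, int nid, int card,
			      int module, int handle, int *inconsistent)
{
	struct dimm_slot *s = NULL;
	size_t i;

	*inconsistent = 0;

	for (i = 0; i < n; i++) {
		if (slots[i].handle == handle) {
			s = &slots[i];
			break;
		}
	}
	if (!s)
		return NULL;

	/* The first valid report for this DIMM defines its location. */
	if (s->card < 0 && card >= 0)
		s->card = card;
	if (s->module < 0 && module >= 0)
		s->module = module;

	/* Same three conditions as the warning in the patch. */
	if ((num_nodes > 1 && s->node != nid) ||
	    (card >= 0 && card != s->card) ||
	    (module >= 0 && module != s->module)) {
		*inconsistent = 1;
		fprintf(stderr,
			"Inconsistent error report (nid/card/module): %d/%d/%d (known: %d/%d/%d)\n",
			nid, card, module, s->node, s->card, s->module);
	}

	return s;
}
```

A caller would fill an array of struct dimm_slot from its own inventory, pass -1 for any field whose validation bit is clear, and fall back to a node-based lookup when lookup_slot() returns NULL, mirroring the edac_mc_find()/fallback path in ghes_edac_report_mem_error().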