From patchwork Mon Jun 24 15:09:29 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Robert Richter
X-Patchwork-Id: 167611
From: Robert Richter
To: Borislav Petkov, James Morse, "Mauro Carvalho Chehab"
CC: "linux-edac@vger.kernel.org", "linux-kernel@vger.kernel.org",
 Robert Richter
Subject: [PATCH v2 17/24] EDAC, ghes: Create one memory controller device per
 node
Date: Mon, 24 Jun 2019 15:09:29 +0000
Message-ID: <20190624150758.6695-18-rrichter@marvell.com>
References: <20190624150758.6695-1-rrichter@marvell.com>
In-Reply-To: <20190624150758.6695-1-rrichter@marvell.com>
x-mailer: git-send-email 2.20.1
Sender: linux-kernel-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org

Most systems have one EDAC memory controller device per node. This
patch implements the same for the ghes driver: it creates multiple mc
devices and maps the DIMMs to them based on the node id. The first
node's mc device serves as the fallback if no node information is
available in the error report.

Here is a complete and consistent error report from a ThunderX2 system
(zero counter values dropped):

 # find /sys/devices/system/edac/mc/ -name \*count | sort -V | xargs grep . | sed -e '/:0/d'
 /sys/devices/system/edac/mc/mc0/ce_count:11
 /sys/devices/system/edac/mc/mc0/ce_noinfo_count:1
 /sys/devices/system/edac/mc/mc0/csrow2/ce_count:5
 /sys/devices/system/edac/mc/mc0/csrow2/ch0_ce_count:5
 /sys/devices/system/edac/mc/mc0/csrow3/ce_count:3
 /sys/devices/system/edac/mc/mc0/csrow3/ch0_ce_count:3
 /sys/devices/system/edac/mc/mc0/csrow4/ce_count:2
 /sys/devices/system/edac/mc/mc0/csrow4/ch0_ce_count:2
 /sys/devices/system/edac/mc/mc0/dimm2/dimm_ce_count:5
 /sys/devices/system/edac/mc/mc0/dimm3/dimm_ce_count:3
 /sys/devices/system/edac/mc/mc0/dimm4/dimm_ce_count:2
 /sys/devices/system/edac/mc/mc1/ce_count:7
 /sys/devices/system/edac/mc/mc1/csrow2/ce_count:4
 /sys/devices/system/edac/mc/mc1/csrow2/ch0_ce_count:4
 /sys/devices/system/edac/mc/mc1/csrow3/ce_count:1
 /sys/devices/system/edac/mc/mc1/csrow3/ch0_ce_count:1
 /sys/devices/system/edac/mc/mc1/csrow6/ce_count:2
 /sys/devices/system/edac/mc/mc1/csrow6/ch0_ce_count:2
 /sys/devices/system/edac/mc/mc1/dimm2/dimm_ce_count:4
 /sys/devices/system/edac/mc/mc1/dimm3/dimm_ce_count:1
 /sys/devices/system/edac/mc/mc1/dimm6/dimm_ce_count:2

Signed-off-by: Robert Richter
---
 drivers/edac/ghes_edac.c | 126 ++++++++++++++++++++++++++++++++-------
 1 file changed, 104 insertions(+), 22 deletions(-)

-- 
2.20.1

diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c
index 13b74368ad81..63de11654649 100644
--- a/drivers/edac/ghes_edac.c
+++ b/drivers/edac/ghes_edac.c
@@ -16,6 +16,7 @@
 #include
 
 struct ghes_edac_pvt {
+	struct device dev;
 	struct list_head list;
 	struct ghes *ghes;
 	struct mem_ctl_info *mci;
@@ -26,7 +27,7 @@ struct ghes_edac_pvt {
 };
 
 static atomic_t ghes_init = ATOMIC_INIT(0);
-static struct ghes_edac_pvt *ghes_pvt;
+struct mem_ctl_info *fallback;
 
 /*
  * Sync with other, potentially concurrent callers of
@@ -172,15 +173,15 @@ static void ghes_edac_set_nid(const struct dmi_header *dh, void *arg)
 	}
 }
 
-static int get_dimm_smbios_index(u16 handle)
+static int get_dimm_smbios_index(struct mem_ctl_info *mci, u16 handle)
 {
-	struct mem_ctl_info *mci = ghes_pvt->mci;
 	struct dimm_info *dimm;
 
 	mci_for_each_dimm(mci, dimm) {
 		if (dimm->smbios_handle == handle)
 			return dimm->idx;
 	}
+
 	return -1;
 }
 
@@ -364,6 +365,9 @@ static void mem_info_prepare_mci(struct mem_ctl_info *mci)
 	int index = 0;
 
 	for_each_dimm(dimm) {
+		if (mci->mc_idx != dimm->numa_node)
+			continue;
+
 		dmi_dimm = &dimm->dimm_info;
 		mci_dimm = edac_get_dimm_by_index(mci, index);
 
@@ -384,17 +388,35 @@ static void mem_info_prepare_mci(struct mem_ctl_info *mci)
 			index, mci->tot_dimms);
 }
 
+static struct mem_ctl_info *get_mc_by_node(int nid)
+{
+	struct mem_ctl_info *mci = edac_mc_find(nid);
+
+	if (mci)
+		return mci;
+
+	if (num_possible_nodes() > 1) {
+		edac_mc_printk(fallback, KERN_WARNING,
+			"Invalid or no node information, falling back to first node: %s",
+			fallback->dev_name);
+	}
+
+	return fallback;
+}
+
 void ghes_edac_report_mem_error(int sev, struct cper_sec_mem_err *mem_err)
 {
 	struct dimm_info *dimm_info;
 	enum hw_event_mc_err_type type;
 	struct edac_raw_error_desc *e;
 	struct mem_ctl_info *mci;
-	struct ghes_edac_pvt *pvt = ghes_pvt;
+	struct ghes_edac_pvt *pvt;
 	unsigned long flags;
 	char *p;
+	int nid = NUMA_NO_NODE;
 
-	if (!pvt)
+	/* We need at least one mc */
+	if (WARN_ON_ONCE(!fallback))
 		return;
 
 	/*
@@ -407,7 +429,11 @@ void ghes_edac_report_mem_error(int sev, struct cper_sec_mem_err *mem_err)
 
 	spin_lock_irqsave(&ghes_lock, flags);
 
-	mci = pvt->mci;
+	/* select the node's mc device */
+	if (mem_err->validation_bits & CPER_MEM_VALID_NODE)
+		nid = mem_err->node;
+	mci = get_mc_by_node(nid);
+	pvt = mci->pvt_info;
 	e = &mci->error_desc;
 
 	/* Cleans the error report buffer */
@@ -541,7 +567,7 @@ void ghes_edac_report_mem_error(int sev, struct cper_sec_mem_err *mem_err)
 		p += sprintf(p, "DIMM DMI handle: 0x%.4x ",
 			     mem_err->mem_dev_handle);
 
-		index = get_dimm_smbios_index(mem_err->mem_dev_handle);
+		index = get_dimm_smbios_index(mci, mem_err->mem_dev_handle);
 		if (index >= 0)
 			e->top_layer = index;
 	}
@@ -640,15 +666,29 @@ static struct acpi_platform_list plat_list[] = {
 	{ } /* End */
 };
 
+void ghes_edac_release(struct device *dev)
+{
+	struct ghes_edac_pvt *ghes_pvt;
+	struct mem_ctl_info *mci;
+
+	ghes_pvt = container_of(dev, struct ghes_edac_pvt, dev);
+
+	mci = ghes_pvt->mci;
+	edac_mc_del_mc(mci->pdev);
+	edac_mc_free(mci);
+}
+
 static int
 ghes_edac_register_one(int nid, struct ghes *ghes, struct device *parent)
 {
+	struct device *dev;
+	struct ghes_edac_pvt *ghes_pvt;
 	int rc;
 	struct mem_ctl_info *mci;
 	struct edac_mc_layer layers[1];
 
 	layers[0].type = EDAC_MC_LAYER_ALL_MEM;
-	layers[0].size = mem_info.num_dimm;
+	layers[0].size = mem_info.dimms_per_node[nid];
 	layers[0].is_virt_csrow = true;
 
 	mci = edac_mc_alloc(nid, ARRAY_SIZE(layers), layers,
@@ -662,43 +702,69 @@ ghes_edac_register_one(int nid, struct ghes *ghes, struct device *parent)
 	ghes_pvt->ghes = ghes;
 	ghes_pvt->mci = mci;
 
-	mci->pdev = parent;
+	dev = &ghes_pvt->dev;
+	dev->parent = parent;
+	dev->release = ghes_edac_release;
+	dev_set_name(dev, "ghes_mc%d", nid);
+
+	rc = device_register(dev);
+	if (rc) {
+		pr_err("Can't create EDAC device (%d)\n", rc);
+		goto fail;
+	}
+
+	mci->pdev = dev;
 	mci->mtype_cap = MEM_FLAG_EMPTY;
 	mci->edac_ctl_cap = EDAC_FLAG_NONE;
 	mci->edac_cap = EDAC_FLAG_NONE;
 	mci->mod_name = "ghes_edac.c";
-	mci->ctl_name = "ghes_edac";
-	mci->dev_name = "ghes";
+	mci->ctl_name = "ghes_mc";
+	mci->dev_name = dev_name(dev);
 
 	mem_info_prepare_mci(mci);
 
 	rc = edac_mc_add_mc(mci);
 	if (rc < 0) {
-		pr_err("Can't register at EDAC core\n");
-		edac_mc_free(mci);
-		return -ENODEV;
+		pr_err("Can't register at EDAC core (%d)\n", rc);
+		goto fail;
 	}
+
 	return 0;
+fail:
+	put_device(dev);
+	return rc;
+}
+
+static void ghes_edac_unregister_one(struct mem_ctl_info *mci)
+{
+	struct ghes_edac_pvt *pvt = mci->pvt_info;
+
+	put_device(&pvt->dev);
 }
 
 void ghes_edac_unregister(struct ghes *ghes)
 {
 	struct mem_ctl_info *mci;
+	int nid;
 
-	if (!ghes_pvt)
-		return;
-
-	mci = ghes_pvt->mci;
-	edac_mc_del_mc(mci->pdev);
-	edac_mc_free(mci);
+	for_each_node(nid) {
+		mci = edac_mc_find(nid);
+		/* stop fallback at last */
+		if (mci && mci != fallback)
+			ghes_edac_unregister_one(mci);
+	}
+	ghes_edac_unregister_one(fallback);
+	fallback = NULL;
 
 	kfree(mem_info.dimms);
+	atomic_dec(&ghes_init);
 }
 
 int ghes_edac_register(struct ghes *ghes, struct device *dev)
 {
 	bool fake = false;
 	int rc;
+	int nid;
 	int idx = -1;
 
 	if (IS_ENABLED(CONFIG_X86)) {
@@ -738,7 +804,23 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
 		pr_info("This system has %d DIMM sockets.\n", mem_info.num_dimm);
 	}
 
-	rc = ghes_edac_register_one(0, ghes, dev);
+	for_each_node(nid) {
+		if (!mem_info.dimms_per_node[nid])
+			continue;
 
-	return rc;
+		rc = ghes_edac_register_one(nid, ghes, dev);
+		if (rc) {
+			ghes_edac_unregister(ghes);
+			return rc;
+		}
+
+		/*
+		 * use the first node's mc as fallback in case we can
+		 * not detect the node from the error information
+		 */
+		if (!fallback)
+			fallback = edac_mc_find(nid);
+	}
+
+	return 0;
 }
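
For reviewers who want to try the selection logic outside the kernel, the
lookup-with-fallback idea can be modelled in plain userspace C. This is only
a sketch under stated assumptions, not the driver code: struct mc,
mc_by_node[], MAX_NODES, NO_NODE, register_mc() and report_ce() are
hypothetical stand-ins for struct mem_ctl_info, edac_mc_find(),
NUMA_NO_NODE and the register/report paths in the patch above.

```c
/* Userspace sketch of the per-node mc lookup with fallback; not kernel code. */
#include <assert.h>
#include <stddef.h>

#define MAX_NODES 4     /* hypothetical upper bound used only by this sketch */
#define NO_NODE   (-1)  /* stands in for NUMA_NO_NODE */

struct mc {             /* stands in for struct mem_ctl_info */
	int nid;
	unsigned int ce_count;
};

static struct mc *mc_by_node[MAX_NODES]; /* stands in for edac_mc_find() */
static struct mc *fallback;              /* first registered mc */

/* Register one mc for a node; the first one registered becomes the fallback. */
static void register_mc(struct mc *mc, int nid)
{
	assert(nid >= 0 && nid < MAX_NODES);
	mc->nid = nid;
	mc->ce_count = 0;
	mc_by_node[nid] = mc;
	if (!fallback)
		fallback = mc;
}

/* Return the node's mc, or the fallback for an invalid or unknown node. */
static struct mc *get_mc_by_node(int nid)
{
	if (nid >= 0 && nid < MAX_NODES && mc_by_node[nid])
		return mc_by_node[nid];
	return fallback;
}

/*
 * Account one corrected error. node_valid mirrors the CPER_MEM_VALID_NODE
 * check: the node field is only trusted when the flag is set.
 */
static struct mc *report_ce(int node_valid, int node)
{
	struct mc *mc = get_mc_by_node(node_valid ? node : NO_NODE);

	if (mc)
		mc->ce_count++;
	return mc;
}
```

Registering two nodes and then calling report_ce() with a valid node, with
the valid flag cleared, and with a node that has no mc shows the last two
cases being accounted to the first registered mc, which matches the
ce_noinfo-style accounting on mc0 in the report above.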