Message ID | 20190305132147.3739133-1-arnd@arndb.de |
---|---|
State | New |
Headers | show |
Series | EDAC: i10nm, skx: fix randconfig builds | expand |
On Tue, Mar 05, 2019 at 02:21:30PM +0100, Arnd Bergmann wrote: > In a configuration where one of the two drivers is built-in and the > other one is a module, kbuild tries to add the modular file into > vmlinux, and fails with There must be something more to that .config of yours because both CONFIG_EDAC_SKX=m CONFIG_EDAC_I10NM=y and CONFIG_EDAC_SKX=y CONFIG_EDAC_I10NM=m work ok here... -- Regards/Gruss, Boris. Good mailing practices for 400: avoid top-posting and trim the reply.
On Tue, Mar 5, 2019 at 3:34 PM Borislav Petkov <bp@alien8.de> wrote: > > On Tue, Mar 05, 2019 at 02:21:30PM +0100, Arnd Bergmann wrote: > > In a configuration where one of the two drivers is built-in and the > > other one is a module, kbuild tries to add the modular file into > > vmlinux, and fails with > > There must be something more to that .config of yours because both > > CONFIG_EDAC_SKX=m > CONFIG_EDAC_I10NM=y > > and > > CONFIG_EDAC_SKX=y > CONFIG_EDAC_I10NM=m > > work ok here... Quite possible, here is the fill .config file that I used as attachment Arnd
On Wed, Mar 6, 2019 at 6:58 PM Luck, Tony <tony.luck@intel.com> wrote: > From: Qiuxu Zhuo <qiuxu.zhuo@intel.com> > > This seems cleaner than adding all the EXPORTs to skx_common.c > I also tried a build with the 0x8A152468-config.gz that Arnd > supplied. It's still a bit fragile since you do something that kbuild doesn't expect with having a file in both a module and built-in code in some configurations. I'm fairly sure this version works today, but it would break the next time that file gets changed to include a reference to THIS_MODULE, or anything else that is different between built-in and modular code. Another alternative would be to mark all symbols in this file 'static' and then include skx_common.c from the other two files. Of course that is also ugly and it increases the overall code size, so I don't see a way to win this. Arnd
On Wed, Mar 06, 2019 at 09:15:13PM +0100, Arnd Bergmann wrote: > On Wed, Mar 6, 2019 at 6:58 PM Luck, Tony <tony.luck@intel.com> wrote: > > From: Qiuxu Zhuo <qiuxu.zhuo@intel.com> > > > > This seems cleaner than adding all the EXPORTs to skx_common.c > > I also tried a build with the 0x8A152468-config.gz that Arnd > > supplied. > > It's still a bit fragile since you do something that kbuild doesn't > expect with having a file in both a module and built-in code > in some configurations. I'm fairly sure this version works today, > but it would break the next time that file gets changed to include > a reference to THIS_MODULE, or anything else that is different > between built-in and modular code. > > Another alternative would be to mark all symbols in this file > 'static' and then include skx_common.c from the other two files. > Of course that is also ugly and it increases the overall code size, > so I don't see a way to win this. Boris, So where should we go. Proposed solutions are piling up: 1) Make skx_common a module [downside: have to EXPORT everything in it] 2) Move module-ish bits out of skx_common [downside: perhaps fragile] 3) #include skx_common.c into {skx,i10nm}_edac.c [downside: no patch yet, bigger code size] -Tony
On Thu, Mar 14, 2019 at 12:01 AM Luck, Tony <tony.luck@intel.com> wrote: > > On Wed, Mar 06, 2019 at 09:15:13PM +0100, Arnd Bergmann wrote: > > On Wed, Mar 6, 2019 at 6:58 PM Luck, Tony <tony.luck@intel.com> wrote: > > > From: Qiuxu Zhuo <qiuxu.zhuo@intel.com> > > > > > > This seems cleaner than adding all the EXPORTs to skx_common.c > > > I also tried a build with the 0x8A152468-config.gz that Arnd > > > supplied. > > > > It's still a bit fragile since you do something that kbuild doesn't > > expect with having a file in both a module and built-in code > > in some configurations. I'm fairly sure this version works today, > > but it would break the next time that file gets changed to include > > a reference to THIS_MODULE, or anything else that is different > > between built-in and modular code. > > > > Another alternative would be to mark all symbols in this file > > 'static' and then include skx_common.c from the other two files. > > Of course that is also ugly and it increases the overall code size, > > so I don't see a way to win this. > > Boris, > > So where should we go. Proposed solutions are piling up: > > 1) Make skx_common a module > [downside: have to EXPORT everything in it] > 2) Move module-ish bits out of skx_common > [downside: perhaps fragile] > 3) #include skx_common.c into {skx,i10nm}_edac.c > [downside: no patch yet, bigger code size] Sorry to add on to the already long list, but one more idea after looking at the two files: 4) Have a single driver module, with the skx_common.c containing the top-level module_init/module_exit functions that call into the hardware specific versions. This is the opposite of the usual recommendation (the driver is already structured nicely otherwise), but it would be cheap way out here. Arnd
On Thu, Mar 14, 2019 at 08:09:06AM +0100, Arnd Bergmann wrote: > > So where should we go. Proposed solutions are piling up: > > > > 1) Make skx_common a module > > [downside: have to EXPORT everything in it] > > 2) Move module-ish bits out of skx_common > > [downside: perhaps fragile] > > 3) #include skx_common.c into {skx,i10nm}_edac.c > > [downside: no patch yet, bigger code size] > > Sorry to add on to the already long list, but one more idea after > looking at the two files: > > 4) Have a single driver module, with the skx_common.c containing > the top-level module_init/module_exit functions that call into the > hardware specific versions. > > This is the opposite of the usual recommendation (the driver > is already structured nicely otherwise), but it would be cheap > way out here. This is what I think right now: if this is going to become such a pain, then we better keep those two drivers completely separate. Who cares if there's some code duplication?! Better that than some obscure randconfig build breakages and fixes introducing more ugliness. IOW, we should keep it simple. Unless, Tony, you want to be able to make changes to the common code in one place and don't want to patch both. Then I'd librarize the common functionality, i.e., do something along the lines of 2). But having two completely separate drivers is the most robust variant, IMO. -- Regards/Gruss, Boris. Good mailing practices for 400: avoid top-posting and trim the reply.
On Thu, Mar 14, 2019 at 12:04:13PM +0100, Borislav Petkov wrote: > On Thu, Mar 14, 2019 at 08:09:06AM +0100, Arnd Bergmann wrote: > > > So where should we go. Proposed solutions are piling up: > > > > > > 1) Make skx_common a module > > > [downside: have to EXPORT everything in it] > > > 2) Move module-ish bits out of skx_common > > > [downside: perhaps fragile] > > > 3) #include skx_common.c into {skx,i10nm}_edac.c > > > [downside: no patch yet, bigger code size] > > > > Sorry to add on to the already long list, but one more idea after > > looking at the two files: > > > > 4) Have a single driver module, with the skx_common.c containing > > the top-level module_init/module_exit functions that call into the > > hardware specific versions. > > > > This is the opposite of the usual recommendation (the driver > > is already structured nicely otherwise), but it would be cheap > > way out here. > > This is what I think right now: if this is going to become such a pain, > then we better keep those two drivers completely separate. Who cares if > there's some code duplication?! Better that than some obscure randconfig > build breakages and fixes introducing more ugliness. IOW, we should keep > it simple. > > Unless, Tony, you want to be able to make changes to the common code in > one place and don't want to patch both. Then I'd librarize the common > functionality, i.e., do something along the lines of 2). There's a lot of common code. Patching the same thing in two different files seems like make-work (and a recipe for things to be forgotten/missed). I made a patch based on option #3. Rough steps were: $ cat skx_common.c >> skx_common.h $ git rm skx_common.c $ git mv skx_base.c skx_edac.c $ git mv i10nm_base.c i10nm_edac.c $ edit Makefile for changes of names $ edit skx_common.h to add "static" everywhere $ edit to fix the build issues Not entirely happy with some of the bits in that last step. Patch looks like this: From b9e9723c95b667d28d81ea53b35447e1ce8f3244 Mon Sep 17 00:00:00 2001 From: Tony Luck <tony.luck@intel.com> Date: Thu, 14 Mar 2019 10:45:34 -0700 Subject: [PATCH] fix corner cases for randconfig --- drivers/edac/Makefile | 2 - drivers/edac/{i10nm_base.c => i10nm_edac.c} | 5 + drivers/edac/skx_common.c | 691 -------------------- drivers/edac/skx_common.h | 685 ++++++++++++++++++- drivers/edac/{skx_base.c => skx_edac.c} | 9 +- 5 files changed, 676 insertions(+), 716 deletions(-) rename drivers/edac/{i10nm_base.c => i10nm_edac.c} (98%) delete mode 100644 drivers/edac/skx_common.c rename drivers/edac/{skx_base.c => skx_edac.c} (98%) diff --git a/drivers/edac/Makefile b/drivers/edac/Makefile index 89ad4a84a0f6..5d8d88574d3c 100644 --- a/drivers/edac/Makefile +++ b/drivers/edac/Makefile @@ -57,10 +57,8 @@ obj-$(CONFIG_EDAC_MPC85XX) += mpc85xx_edac_mod.o layerscape_edac_mod-y := fsl_ddr_edac.o layerscape_edac.o obj-$(CONFIG_EDAC_LAYERSCAPE) += layerscape_edac_mod.o -skx_edac-y := skx_common.o skx_base.o obj-$(CONFIG_EDAC_SKX) += skx_edac.o -i10nm_edac-y := skx_common.o i10nm_base.o obj-$(CONFIG_EDAC_I10NM) += i10nm_edac.o obj-$(CONFIG_EDAC_MV64X60) += mv64x60_edac.o diff --git a/drivers/edac/i10nm_base.c b/drivers/edac/i10nm_edac.c similarity index 98% rename from drivers/edac/i10nm_base.c rename to drivers/edac/i10nm_edac.c index c334fb7c63df..45c6497f35c5 100644 --- a/drivers/edac/i10nm_base.c +++ b/drivers/edac/i10nm_edac.c @@ -10,6 +10,11 @@ #include <asm/intel-family.h> #include <asm/mce.h> #include "edac_module.h" + +struct decoded_addr; +typedef bool (*skx_decode_f)(struct decoded_addr *res); +static skx_decode_f skx_decode; + #include "skx_common.h" #define I10NM_REVISION "v0.0.3" diff --git a/drivers/edac/skx_common.c b/drivers/edac/skx_common.c deleted file mode 100644 index 0e96e7b5b0a7..000000000000 --- a/drivers/edac/skx_common.c +++ /dev/null @@ -1,691 +0,0 @@ -// SPDX-License-Identifier: GPL-2.0 -/* - * Common codes for both the skx_edac driver and Intel 10nm server EDAC driver. - * Originally split out from the skx_edac driver. - * - * Copyright (c) 2018, Intel Corporation. - */ - -#include <linux/acpi.h> -#include <linux/dmi.h> -#include <linux/adxl.h> -#include <acpi/nfit.h> -#include <asm/mce.h> -#include "edac_module.h" -#include "skx_common.h" - -static const char * const component_names[] = { - [INDEX_SOCKET] = "ProcessorSocketId", - [INDEX_MEMCTRL] = "MemoryControllerId", - [INDEX_CHANNEL] = "ChannelId", - [INDEX_DIMM] = "DimmSlotId", -}; - -static int component_indices[ARRAY_SIZE(component_names)]; -static int adxl_component_count; -static const char * const *adxl_component_names; -static u64 *adxl_values; -static char *adxl_msg; - -static char skx_msg[MSG_SIZE]; -static skx_decode_f skx_decode; -static u64 skx_tolm, skx_tohm; -static LIST_HEAD(dev_edac_list); - -int __init skx_adxl_get(void) -{ - const char * const *names; - int i, j; - - names = adxl_get_component_names(); - if (!names) { - skx_printk(KERN_NOTICE, "No firmware support for address translation.\n"); - return -ENODEV; - } - - for (i = 0; i < INDEX_MAX; i++) { - for (j = 0; names[j]; j++) { - if (!strcmp(component_names[i], names[j])) { - component_indices[i] = j; - break; - } - } - - if (!names[j]) - goto err; - } - - adxl_component_names = names; - while (*names++) - adxl_component_count++; - - adxl_values = kcalloc(adxl_component_count, sizeof(*adxl_values), - GFP_KERNEL); - if (!adxl_values) { - adxl_component_count = 0; - return -ENOMEM; - } - - adxl_msg = kzalloc(MSG_SIZE, GFP_KERNEL); - if (!adxl_msg) { - adxl_component_count = 0; - kfree(adxl_values); - return -ENOMEM; - } - - return 0; -err: - skx_printk(KERN_ERR, "'%s' is not matched from DSM parameters: ", - component_names[i]); - for (j = 0; names[j]; j++) - skx_printk(KERN_CONT, "%s ", names[j]); - skx_printk(KERN_CONT, "\n"); - - return -ENODEV; -} - -void __exit skx_adxl_put(void) -{ - kfree(adxl_values); - kfree(adxl_msg); -} - -static bool skx_adxl_decode(struct decoded_addr *res) -{ - int i, len = 0; - - if (res->addr >= skx_tohm || (res->addr >= skx_tolm && - res->addr < BIT_ULL(32))) { - edac_dbg(0, "Address 0x%llx out of range\n", res->addr); - return false; - } - - if (adxl_decode(res->addr, adxl_values)) { - edac_dbg(0, "Failed to decode 0x%llx\n", res->addr); - return false; - } - - res->socket = (int)adxl_values[component_indices[INDEX_SOCKET]]; - res->imc = (int)adxl_values[component_indices[INDEX_MEMCTRL]]; - res->channel = (int)adxl_values[component_indices[INDEX_CHANNEL]]; - res->dimm = (int)adxl_values[component_indices[INDEX_DIMM]]; - - for (i = 0; i < adxl_component_count; i++) { - if (adxl_values[i] == ~0x0ull) - continue; - - len += snprintf(adxl_msg + len, MSG_SIZE - len, " %s:0x%llx", - adxl_component_names[i], adxl_values[i]); - if (MSG_SIZE - len <= 0) - break; - } - - return true; -} - -void skx_set_decode(skx_decode_f decode) -{ - skx_decode = decode; -} - -int skx_get_src_id(struct skx_dev *d, u8 *id) -{ - u32 reg; - - if (pci_read_config_dword(d->util_all, 0xf0, ®)) { - skx_printk(KERN_ERR, "Failed to read src id\n"); - return -ENODEV; - } - - *id = GET_BITFIELD(reg, 12, 14); - return 0; -} - -int skx_get_node_id(struct skx_dev *d, u8 *id) -{ - u32 reg; - - if (pci_read_config_dword(d->util_all, 0xf4, ®)) { - skx_printk(KERN_ERR, "Failed to read node id\n"); - return -ENODEV; - } - - *id = GET_BITFIELD(reg, 0, 2); - return 0; -} - -static int get_width(u32 mtr) -{ - switch (GET_BITFIELD(mtr, 8, 9)) { - case 0: - return DEV_X4; - case 1: - return DEV_X8; - case 2: - return DEV_X16; - } - return DEV_UNKNOWN; -} - -/* - * We use the per-socket device @did to count how many sockets are present, - * and to detemine which PCI buses are associated with each socket. Allocate - * and build the full list of all the skx_dev structures that we need here. - */ -int skx_get_all_bus_mappings(unsigned int did, int off, enum type type, - struct list_head **list) -{ - struct pci_dev *pdev, *prev; - struct skx_dev *d; - u32 reg; - int ndev = 0; - - prev = NULL; - for (;;) { - pdev = pci_get_device(PCI_VENDOR_ID_INTEL, did, prev); - if (!pdev) - break; - ndev++; - d = kzalloc(sizeof(*d), GFP_KERNEL); - if (!d) { - pci_dev_put(pdev); - return -ENOMEM; - } - - if (pci_read_config_dword(pdev, off, ®)) { - kfree(d); - pci_dev_put(pdev); - skx_printk(KERN_ERR, "Failed to read bus idx\n"); - return -ENODEV; - } - - d->bus[0] = GET_BITFIELD(reg, 0, 7); - d->bus[1] = GET_BITFIELD(reg, 8, 15); - if (type == SKX) { - d->seg = pci_domain_nr(pdev->bus); - d->bus[2] = GET_BITFIELD(reg, 16, 23); - d->bus[3] = GET_BITFIELD(reg, 24, 31); - } else { - d->seg = GET_BITFIELD(reg, 16, 23); - } - - edac_dbg(2, "busses: 0x%x, 0x%x, 0x%x, 0x%x\n", - d->bus[0], d->bus[1], d->bus[2], d->bus[3]); - list_add_tail(&d->list, &dev_edac_list); - prev = pdev; - } - - if (list) - *list = &dev_edac_list; - return ndev; -} - -int skx_get_hi_lo(unsigned int did, int off[], u64 *tolm, u64 *tohm) -{ - struct pci_dev *pdev; - u32 reg; - - pdev = pci_get_device(PCI_VENDOR_ID_INTEL, did, NULL); - if (!pdev) { - skx_printk(KERN_ERR, "Can't get tolm/tohm\n"); - return -ENODEV; - } - - if (pci_read_config_dword(pdev, off[0], ®)) { - skx_printk(KERN_ERR, "Failed to read tolm\n"); - goto fail; - } - skx_tolm = reg; - - if (pci_read_config_dword(pdev, off[1], ®)) { - skx_printk(KERN_ERR, "Failed to read lower tohm\n"); - goto fail; - } - skx_tohm = reg; - - if (pci_read_config_dword(pdev, off[2], ®)) { - skx_printk(KERN_ERR, "Failed to read upper tohm\n"); - goto fail; - } - skx_tohm |= (u64)reg << 32; - - pci_dev_put(pdev); - *tolm = skx_tolm; - *tohm = skx_tohm; - edac_dbg(2, "tolm = 0x%llx tohm = 0x%llx\n", skx_tolm, skx_tohm); - return 0; -fail: - pci_dev_put(pdev); - return -ENODEV; -} - -static int skx_get_dimm_attr(u32 reg, int lobit, int hibit, int add, - int minval, int maxval, const char *name) -{ - u32 val = GET_BITFIELD(reg, lobit, hibit); - - if (val < minval || val > maxval) { - edac_dbg(2, "bad %s = %d (raw=0x%x)\n", name, val, reg); - return -EINVAL; - } - return val + add; -} - -#define numrank(reg) skx_get_dimm_attr(reg, 12, 13, 0, 0, 2, "ranks") -#define numrow(reg) skx_get_dimm_attr(reg, 2, 4, 12, 1, 6, "rows") -#define numcol(reg) skx_get_dimm_attr(reg, 0, 1, 10, 0, 2, "cols") - -int skx_get_dimm_info(u32 mtr, u32 amap, struct dimm_info *dimm, - struct skx_imc *imc, int chan, int dimmno) -{ - int banks = 16, ranks, rows, cols, npages; - u64 size; - - ranks = numrank(mtr); - rows = numrow(mtr); - cols = numcol(mtr); - - /* - * Compute size in 8-byte (2^3) words, then shift to MiB (2^20) - */ - size = ((1ull << (rows + cols + ranks)) * banks) >> (20 - 3); - npages = MiB_TO_PAGES(size); - - edac_dbg(0, "mc#%d: channel %d, dimm %d, %lld MiB (%d pages) bank: %d, rank: %d, row: 0x%x, col: 0x%x\n", - imc->mc, chan, dimmno, size, npages, - banks, 1 << ranks, rows, cols); - - imc->chan[chan].dimms[dimmno].close_pg = GET_BITFIELD(mtr, 0, 0); - imc->chan[chan].dimms[dimmno].bank_xor_enable = GET_BITFIELD(mtr, 9, 9); - imc->chan[chan].dimms[dimmno].fine_grain_bank = GET_BITFIELD(amap, 0, 0); - imc->chan[chan].dimms[dimmno].rowbits = rows; - imc->chan[chan].dimms[dimmno].colbits = cols; - - dimm->nr_pages = npages; - dimm->grain = 32; - dimm->dtype = get_width(mtr); - dimm->mtype = MEM_DDR4; - dimm->edac_mode = EDAC_SECDED; /* likely better than this */ - snprintf(dimm->label, sizeof(dimm->label), "CPU_SrcID#%u_MC#%u_Chan#%u_DIMM#%u", - imc->src_id, imc->lmc, chan, dimmno); - - return 1; -} - -int skx_get_nvdimm_info(struct dimm_info *dimm, struct skx_imc *imc, - int chan, int dimmno, const char *mod_str) -{ - int smbios_handle; - u32 dev_handle; - u16 flags; - u64 size = 0; - - dev_handle = ACPI_NFIT_BUILD_DEVICE_HANDLE(dimmno, chan, imc->lmc, - imc->src_id, 0); - - smbios_handle = nfit_get_smbios_id(dev_handle, &flags); - if (smbios_handle == -EOPNOTSUPP) { - pr_warn_once("%s: Can't find size of NVDIMM. Try enabling CONFIG_ACPI_NFIT\n", mod_str); - goto unknown_size; - } - - if (smbios_handle < 0) { - skx_printk(KERN_ERR, "Can't find handle for NVDIMM ADR=0x%x\n", dev_handle); - goto unknown_size; - } - - if (flags & ACPI_NFIT_MEM_MAP_FAILED) { - skx_printk(KERN_ERR, "NVDIMM ADR=0x%x is not mapped\n", dev_handle); - goto unknown_size; - } - - size = dmi_memdev_size(smbios_handle); - if (size == ~0ull) - skx_printk(KERN_ERR, "Can't find size for NVDIMM ADR=0x%x/SMBIOS=0x%x\n", - dev_handle, smbios_handle); - -unknown_size: - dimm->nr_pages = size >> PAGE_SHIFT; - dimm->grain = 32; - dimm->dtype = DEV_UNKNOWN; - dimm->mtype = MEM_NVDIMM; - dimm->edac_mode = EDAC_SECDED; /* likely better than this */ - - edac_dbg(0, "mc#%d: channel %d, dimm %d, %llu MiB (%u pages)\n", - imc->mc, chan, dimmno, size >> 20, dimm->nr_pages); - - snprintf(dimm->label, sizeof(dimm->label), "CPU_SrcID#%u_MC#%u_Chan#%u_DIMM#%u", - imc->src_id, imc->lmc, chan, dimmno); - - return (size == 0 || size == ~0ull) ? 0 : 1; -} - -int skx_register_mci(struct skx_imc *imc, struct pci_dev *pdev, - const char *ctl_name, const char *mod_str, - get_dimm_config_f get_dimm_config) -{ - struct mem_ctl_info *mci; - struct edac_mc_layer layers[2]; - struct skx_pvt *pvt; - int rc; - - /* Allocate a new MC control structure */ - layers[0].type = EDAC_MC_LAYER_CHANNEL; - layers[0].size = NUM_CHANNELS; - layers[0].is_virt_csrow = false; - layers[1].type = EDAC_MC_LAYER_SLOT; - layers[1].size = NUM_DIMMS; - layers[1].is_virt_csrow = true; - mci = edac_mc_alloc(imc->mc, ARRAY_SIZE(layers), layers, - sizeof(struct skx_pvt)); - - if (unlikely(!mci)) - return -ENOMEM; - - edac_dbg(0, "MC#%d: mci = %p\n", imc->mc, mci); - - /* Associate skx_dev and mci for future usage */ - imc->mci = mci; - pvt = mci->pvt_info; - pvt->imc = imc; - - mci->ctl_name = kasprintf(GFP_KERNEL, "%s#%d IMC#%d", ctl_name, - imc->node_id, imc->lmc); - if (!mci->ctl_name) { - rc = -ENOMEM; - goto fail0; - } - - mci->mtype_cap = MEM_FLAG_DDR4 | MEM_FLAG_NVDIMM; - mci->edac_ctl_cap = EDAC_FLAG_NONE; - mci->edac_cap = EDAC_FLAG_NONE; - mci->mod_name = mod_str; - mci->dev_name = pci_name(pdev); - mci->ctl_page_to_phys = NULL; - - rc = get_dimm_config(mci); - if (rc < 0) - goto fail; - - /* Record ptr to the generic device */ - mci->pdev = &pdev->dev; - - /* Add this new MC control structure to EDAC's list of MCs */ - if (unlikely(edac_mc_add_mc(mci))) { - edac_dbg(0, "MC: failed edac_mc_add_mc()\n"); - rc = -EINVAL; - goto fail; - } - - return 0; - -fail: - kfree(mci->ctl_name); -fail0: - edac_mc_free(mci); - imc->mci = NULL; - return rc; -} - -static void skx_unregister_mci(struct skx_imc *imc) -{ - struct mem_ctl_info *mci = imc->mci; - - if (!mci) - return; - - edac_dbg(0, "MC%d: mci = %p\n", imc->mc, mci); - - /* Remove MC sysfs nodes */ - edac_mc_del_mc(mci->pdev); - - edac_dbg(1, "%s: free mci struct\n", mci->ctl_name); - kfree(mci->ctl_name); - edac_mc_free(mci); -} - -static struct mem_ctl_info *get_mci(int src_id, int lmc) -{ - struct skx_dev *d; - - if (lmc > NUM_IMC - 1) { - skx_printk(KERN_ERR, "Bad lmc %d\n", lmc); - return NULL; - } - - list_for_each_entry(d, &dev_edac_list, list) { - if (d->imc[0].src_id == src_id) - return d->imc[lmc].mci; - } - - skx_printk(KERN_ERR, "No mci for src_id %d lmc %d\n", src_id, lmc); - return NULL; -} - -static void skx_mce_output_error(struct mem_ctl_info *mci, - const struct mce *m, - struct decoded_addr *res) -{ - enum hw_event_mc_err_type tp_event; - char *type, *optype; - bool ripv = GET_BITFIELD(m->mcgstatus, 0, 0); - bool overflow = GET_BITFIELD(m->status, 62, 62); - bool uncorrected_error = GET_BITFIELD(m->status, 61, 61); - bool recoverable; - u32 core_err_cnt = GET_BITFIELD(m->status, 38, 52); - u32 mscod = GET_BITFIELD(m->status, 16, 31); - u32 errcode = GET_BITFIELD(m->status, 0, 15); - u32 optypenum = GET_BITFIELD(m->status, 4, 6); - - recoverable = GET_BITFIELD(m->status, 56, 56); - - if (uncorrected_error) { - core_err_cnt = 1; - if (ripv) { - type = "FATAL"; - tp_event = HW_EVENT_ERR_FATAL; - } else { - type = "NON_FATAL"; - tp_event = HW_EVENT_ERR_UNCORRECTED; - } - } else { - type = "CORRECTED"; - tp_event = HW_EVENT_ERR_CORRECTED; - } - - /* - * According to Intel Architecture spec vol 3B, - * Table 15-10 "IA32_MCi_Status [15:0] Compound Error Code Encoding" - * memory errors should fit one of these masks: - * 000f 0000 1mmm cccc (binary) - * 000f 0010 1mmm cccc (binary) [RAM used as cache] - * where: - * f = Correction Report Filtering Bit. If 1, subsequent errors - * won't be shown - * mmm = error type - * cccc = channel - * If the mask doesn't match, report an error to the parsing logic - */ - if (!((errcode & 0xef80) == 0x80 || (errcode & 0xef80) == 0x280)) { - optype = "Can't parse: it is not a mem"; - } else { - switch (optypenum) { - case 0: - optype = "generic undef request error"; - break; - case 1: - optype = "memory read error"; - break; - case 2: - optype = "memory write error"; - break; - case 3: - optype = "addr/cmd error"; - break; - case 4: - optype = "memory scrubbing error"; - break; - default: - optype = "reserved"; - break; - } - } - if (adxl_component_count) { - snprintf(skx_msg, MSG_SIZE, "%s%s err_code:0x%04x:0x%04x %s", - overflow ? " OVERFLOW" : "", - (uncorrected_error && recoverable) ? " recoverable" : "", - mscod, errcode, adxl_msg); - } else { - snprintf(skx_msg, MSG_SIZE, - "%s%s err_code:0x%04x:0x%04x socket:%d imc:%d rank:%d bg:%d ba:%d row:0x%x col:0x%x", - overflow ? " OVERFLOW" : "", - (uncorrected_error && recoverable) ? " recoverable" : "", - mscod, errcode, - res->socket, res->imc, res->rank, - res->bank_group, res->bank_address, res->row, res->column); - } - - edac_dbg(0, "%s\n", skx_msg); - - /* Call the helper to output message */ - edac_mc_handle_error(tp_event, mci, core_err_cnt, - m->addr >> PAGE_SHIFT, m->addr & ~PAGE_MASK, 0, - res->channel, res->dimm, -1, - optype, skx_msg); -} - -int skx_mce_check_error(struct notifier_block *nb, unsigned long val, - void *data) -{ - struct mce *mce = (struct mce *)data; - struct decoded_addr res; - struct mem_ctl_info *mci; - char *type; - - if (edac_get_report_status() == EDAC_REPORTING_DISABLED) - return NOTIFY_DONE; - - /* ignore unless this is memory related with an address */ - if ((mce->status & 0xefff) >> 7 != 1 || !(mce->status & MCI_STATUS_ADDRV)) - return NOTIFY_DONE; - - memset(&res, 0, sizeof(res)); - res.addr = mce->addr; - - if (adxl_component_count) { - if (!skx_adxl_decode(&res)) - return NOTIFY_DONE; - - mci = get_mci(res.socket, res.imc); - } else { - if (!skx_decode || !skx_decode(&res)) - return NOTIFY_DONE; - - mci = res.dev->imc[res.imc].mci; - } - - if (!mci) - return NOTIFY_DONE; - - if (mce->mcgstatus & MCG_STATUS_MCIP) - type = "Exception"; - else - type = "Event"; - - skx_mc_printk(mci, KERN_DEBUG, "HANDLING MCE MEMORY ERROR\n"); - - skx_mc_printk(mci, KERN_DEBUG, "CPU %d: Machine Check %s: 0x%llx " - "Bank %d: 0x%llx\n", mce->extcpu, type, - mce->mcgstatus, mce->bank, mce->status); - skx_mc_printk(mci, KERN_DEBUG, "TSC 0x%llx ", mce->tsc); - skx_mc_printk(mci, KERN_DEBUG, "ADDR 0x%llx ", mce->addr); - skx_mc_printk(mci, KERN_DEBUG, "MISC 0x%llx ", mce->misc); - - skx_mc_printk(mci, KERN_DEBUG, "PROCESSOR %u:0x%x TIME %llu SOCKET " - "%u APIC 0x%x\n", mce->cpuvendor, mce->cpuid, - mce->time, mce->socketid, mce->apicid); - - skx_mce_output_error(mci, mce, &res); - - return NOTIFY_DONE; -} - -void skx_remove(void) -{ - int i, j; - struct skx_dev *d, *tmp; - - edac_dbg(0, "\n"); - - list_for_each_entry_safe(d, tmp, &dev_edac_list, list) { - list_del(&d->list); - for (i = 0; i < NUM_IMC; i++) { - if (d->imc[i].mci) - skx_unregister_mci(&d->imc[i]); - - if (d->imc[i].mdev) - pci_dev_put(d->imc[i].mdev); - - if (d->imc[i].mbase) - iounmap(d->imc[i].mbase); - - for (j = 0; j < NUM_CHANNELS; j++) { - if (d->imc[i].chan[j].cdev) - pci_dev_put(d->imc[i].chan[j].cdev); - } - } - if (d->util_all) - pci_dev_put(d->util_all); - if (d->sad_all) - pci_dev_put(d->sad_all); - if (d->uracu) - pci_dev_put(d->uracu); - - kfree(d); - } -} - -#ifdef CONFIG_EDAC_DEBUG -/* - * Debug feature. - * Exercise the address decode logic by writing an address to - * /sys/kernel/debug/edac/dirname/addr. - */ -static struct dentry *skx_test; - -static int debugfs_u64_set(void *data, u64 val) -{ - struct mce m; - - pr_warn_once("Fake error to 0x%llx injected via debugfs\n", val); - - memset(&m, 0, sizeof(m)); - /* ADDRV + MemRd + Unknown channel */ - m.status = MCI_STATUS_ADDRV + 0x90; - /* One corrected error */ - m.status |= BIT_ULL(MCI_STATUS_CEC_SHIFT); - m.addr = val; - skx_mce_check_error(NULL, 0, &m); - - return 0; -} -DEFINE_SIMPLE_ATTRIBUTE(fops_u64_wo, NULL, debugfs_u64_set, "%llu\n"); - -void setup_skx_debug(const char *dirname) -{ - skx_test = edac_debugfs_create_dir(dirname); - if (!skx_test) - return; - - if (!edac_debugfs_create_file("addr", 0200, skx_test, - NULL, &fops_u64_wo)) { - debugfs_remove(skx_test); - skx_test = NULL; - } -} - -void teardown_skx_debug(void) -{ - debugfs_remove_recursive(skx_test); -} -#endif /*CONFIG_EDAC_DEBUG*/ diff --git a/drivers/edac/skx_common.h b/drivers/edac/skx_common.h index d25374e34d4f..6da3def09ea1 100644 --- a/drivers/edac/skx_common.h +++ b/drivers/edac/skx_common.h @@ -9,6 +9,13 @@ #ifndef _SKX_COMM_EDAC_H #define _SKX_COMM_EDAC_H +#include <linux/acpi.h> +#include <linux/dmi.h> +#include <linux/adxl.h> +#include <acpi/nfit.h> +#include <asm/mce.h> +#include "edac_module.h" + #define MSG_SIZE 1024 /* @@ -112,41 +119,677 @@ struct decoded_addr { }; typedef int (*get_dimm_config_f)(struct mem_ctl_info *mci); -typedef bool (*skx_decode_f)(struct decoded_addr *res); -int __init skx_adxl_get(void); -void __exit skx_adxl_put(void); -void skx_set_decode(skx_decode_f decode); +static const char * const component_names[] = { + [INDEX_SOCKET] = "ProcessorSocketId", + [INDEX_MEMCTRL] = "MemoryControllerId", + [INDEX_CHANNEL] = "ChannelId", + [INDEX_DIMM] = "DimmSlotId", +}; + +static int component_indices[ARRAY_SIZE(component_names)]; +static int adxl_component_count; +static const char * const *adxl_component_names; +static u64 *adxl_values; +static char *adxl_msg; + +static char skx_msg[MSG_SIZE]; +static u64 skx_tolm, skx_tohm; +static LIST_HEAD(dev_edac_list); + +static int __init skx_adxl_get(void) +{ + const char * const *names; + int i, j; + + names = adxl_get_component_names(); + if (!names) { + skx_printk(KERN_NOTICE, "No firmware support for address translation.\n"); + return -ENODEV; + } + + for (i = 0; i < INDEX_MAX; i++) { + for (j = 0; names[j]; j++) { + if (!strcmp(component_names[i], names[j])) { + component_indices[i] = j; + break; + } + } + + if (!names[j]) + goto err; + } + + adxl_component_names = names; + while (*names++) + adxl_component_count++; + + adxl_values = kcalloc(adxl_component_count, sizeof(*adxl_values), + GFP_KERNEL); + if (!adxl_values) { + adxl_component_count = 0; + return -ENOMEM; + } + + adxl_msg = kzalloc(MSG_SIZE, GFP_KERNEL); + if (!adxl_msg) { + adxl_component_count = 0; + kfree(adxl_values); + return -ENOMEM; + } + + return 0; +err: + skx_printk(KERN_ERR, "'%s' is not matched from DSM parameters: ", + component_names[i]); + for (j = 0; names[j]; j++) + skx_printk(KERN_CONT, "%s ", names[j]); + skx_printk(KERN_CONT, "\n"); + + return -ENODEV; +} + +static void __exit skx_adxl_put(void) +{ + kfree(adxl_values); + kfree(adxl_msg); +} + +static bool skx_adxl_decode(struct decoded_addr *res) +{ + int i, len = 0; + + if (res->addr >= skx_tohm || (res->addr >= skx_tolm && + res->addr < BIT_ULL(32))) { + edac_dbg(0, "Address 0x%llx out of range\n", res->addr); + return false; + } + + if (adxl_decode(res->addr, adxl_values)) { + edac_dbg(0, "Failed to decode 0x%llx\n", res->addr); + return false; + } + + res->socket = (int)adxl_values[component_indices[INDEX_SOCKET]]; + res->imc = (int)adxl_values[component_indices[INDEX_MEMCTRL]]; + res->channel = (int)adxl_values[component_indices[INDEX_CHANNEL]]; + res->dimm = (int)adxl_values[component_indices[INDEX_DIMM]]; + + for (i = 0; i < adxl_component_count; i++) { + if (adxl_values[i] == ~0x0ull) + continue; + + len += snprintf(adxl_msg + len, MSG_SIZE - len, " %s:0x%llx", + adxl_component_names[i], adxl_values[i]); + if (MSG_SIZE - len <= 0) + break; + } + + return true; +} + +static int skx_get_src_id(struct skx_dev *d, u8 *id) +{ + u32 reg; + + if (pci_read_config_dword(d->util_all, 0xf0, ®)) { + skx_printk(KERN_ERR, "Failed to read src id\n"); + return -ENODEV; + } + + *id = GET_BITFIELD(reg, 12, 14); + return 0; +} + +static int skx_get_node_id(struct skx_dev *d, u8 *id) +{ + u32 reg; + + if (pci_read_config_dword(d->util_all, 0xf4, ®)) { + skx_printk(KERN_ERR, "Failed to read node id\n"); + return -ENODEV; + } + + *id = GET_BITFIELD(reg, 0, 2); + return 0; +} + +static int get_width(u32 mtr) +{ + switch (GET_BITFIELD(mtr, 8, 9)) { + case 0: + return DEV_X4; + case 1: + return DEV_X8; + case 2: + return DEV_X16; + } + return DEV_UNKNOWN; +} + +/* + * We use the per-socket device @did to count how many sockets are present, + * and to detemine which PCI buses are associated with each socket. Allocate + * and build the full list of all the skx_dev structures that we need here. + */ +static int skx_get_all_bus_mappings(unsigned int did, int off, enum type type, + struct list_head **list) +{ + struct pci_dev *pdev, *prev; + struct skx_dev *d; + u32 reg; + int ndev = 0; + + prev = NULL; + for (;;) { + pdev = pci_get_device(PCI_VENDOR_ID_INTEL, did, prev); + if (!pdev) + break; + ndev++; + d = kzalloc(sizeof(*d), GFP_KERNEL); + if (!d) { + pci_dev_put(pdev); + return -ENOMEM; + } + + if (pci_read_config_dword(pdev, off, ®)) { + kfree(d); + pci_dev_put(pdev); + skx_printk(KERN_ERR, "Failed to read bus idx\n"); + return -ENODEV; + } + + d->bus[0] = GET_BITFIELD(reg, 0, 7); + d->bus[1] = GET_BITFIELD(reg, 8, 15); + if (type == SKX) { + d->seg = pci_domain_nr(pdev->bus); + d->bus[2] = GET_BITFIELD(reg, 16, 23); + d->bus[3] = GET_BITFIELD(reg, 24, 31); + } else { + d->seg = GET_BITFIELD(reg, 16, 23); + } + + edac_dbg(2, "busses: 0x%x, 0x%x, 0x%x, 0x%x\n", + d->bus[0], d->bus[1], d->bus[2], d->bus[3]); + list_add_tail(&d->list, &dev_edac_list); + prev = pdev; + } + + if (list) + *list = &dev_edac_list; + return ndev; +} + +static int skx_get_hi_lo(unsigned int did, int off[], u64 *tolm, u64 *tohm) +{ + struct pci_dev *pdev; + u32 reg; + + pdev = pci_get_device(PCI_VENDOR_ID_INTEL, did, NULL); + if (!pdev) { + skx_printk(KERN_ERR, "Can't get tolm/tohm\n"); + return -ENODEV; + } + + if (pci_read_config_dword(pdev, off[0], ®)) { + skx_printk(KERN_ERR, "Failed to read tolm\n"); + goto fail; + } + skx_tolm = reg; + + if (pci_read_config_dword(pdev, off[1], ®)) { + skx_printk(KERN_ERR, "Failed to read lower tohm\n"); + goto fail; + } + skx_tohm = reg; + + if (pci_read_config_dword(pdev, off[2], ®)) { + skx_printk(KERN_ERR, "Failed to read upper tohm\n"); + goto fail; + } + skx_tohm |= (u64)reg << 32; + + pci_dev_put(pdev); + *tolm = skx_tolm; + *tohm = skx_tohm; + edac_dbg(2, "tolm = 0x%llx tohm = 0x%llx\n", skx_tolm, skx_tohm); + return 0; +fail: + pci_dev_put(pdev); + return -ENODEV; +} + +static int skx_get_dimm_attr(u32 reg, int lobit, int hibit, int add, + int minval, int maxval, const char *name) +{ + u32 val = GET_BITFIELD(reg, lobit, hibit); + + if (val < minval || val > maxval) { + edac_dbg(2, "bad %s = %d (raw=0x%x)\n", name, val, reg); + return -EINVAL; + } + return val + add; +} + +#define numrank(reg) skx_get_dimm_attr(reg, 12, 13, 0, 0, 2, "ranks") +#define numrow(reg) skx_get_dimm_attr(reg, 2, 4, 12, 1, 6, "rows") +#define numcol(reg) skx_get_dimm_attr(reg, 0, 1, 10, 0, 2, "cols") + +static int skx_get_dimm_info(u32 mtr, u32 amap, struct dimm_info *dimm, + struct skx_imc *imc, int chan, int dimmno) +{ + int banks = 16, ranks, rows, cols, npages; + u64 size; + + ranks = numrank(mtr); + rows = numrow(mtr); + cols = numcol(mtr); + + /* + * Compute size in 8-byte (2^3) words, then shift to MiB (2^20) + */ + size = ((1ull << (rows + cols + ranks)) * banks) >> (20 - 3); + npages = MiB_TO_PAGES(size); + + edac_dbg(0, "mc#%d: channel %d, dimm %d, %lld MiB (%d pages) bank: %d, rank: %d, row: 0x%x, col: 0x%x\n", + imc->mc, chan, dimmno, size, npages, + banks, 1 << ranks, rows, cols); + + imc->chan[chan].dimms[dimmno].close_pg = GET_BITFIELD(mtr, 0, 0); + imc->chan[chan].dimms[dimmno].bank_xor_enable = GET_BITFIELD(mtr, 9, 9); + imc->chan[chan].dimms[dimmno].fine_grain_bank = GET_BITFIELD(amap, 0, 0); + imc->chan[chan].dimms[dimmno].rowbits = rows; + imc->chan[chan].dimms[dimmno].colbits = cols; + + dimm->nr_pages = npages; + dimm->grain = 32; + dimm->dtype = get_width(mtr); + dimm->mtype = MEM_DDR4; + dimm->edac_mode = EDAC_SECDED; /* likely better than this */ + snprintf(dimm->label, sizeof(dimm->label), "CPU_SrcID#%u_MC#%u_Chan#%u_DIMM#%u", + imc->src_id, imc->lmc, chan, dimmno); + + return 1; +} + +static int skx_get_nvdimm_info(struct dimm_info *dimm, struct skx_imc *imc, + int chan, int dimmno, const char *mod_str) +{ + int smbios_handle; + u32 dev_handle; + u16 flags; + u64 size = 0; + + dev_handle = ACPI_NFIT_BUILD_DEVICE_HANDLE(dimmno, chan, imc->lmc, + imc->src_id, 0); + + smbios_handle = nfit_get_smbios_id(dev_handle, &flags); + if (smbios_handle == -EOPNOTSUPP) { + pr_warn_once("%s: Can't find size of NVDIMM. Try enabling CONFIG_ACPI_NFIT\n", mod_str); + goto unknown_size; + } + + if (smbios_handle < 0) { + skx_printk(KERN_ERR, "Can't find handle for NVDIMM ADR=0x%x\n", dev_handle); + goto unknown_size; + } -int skx_get_src_id(struct skx_dev *d, u8 *id); -int skx_get_node_id(struct skx_dev *d, u8 *id); + if (flags & ACPI_NFIT_MEM_MAP_FAILED) { + skx_printk(KERN_ERR, "NVDIMM ADR=0x%x is not mapped\n", dev_handle); + goto unknown_size; + } -int skx_get_all_bus_mappings(unsigned int did, int off, enum type, - struct list_head **list); + size = dmi_memdev_size(smbios_handle); + if (size == ~0ull) + skx_printk(KERN_ERR, "Can't find size for NVDIMM ADR=0x%x/SMBIOS=0x%x\n", + dev_handle, smbios_handle); -int skx_get_hi_lo(unsigned int did, int off[], u64 *tolm, u64 *tohm); +unknown_size: + dimm->nr_pages = size >> PAGE_SHIFT; + dimm->grain = 32; + dimm->dtype = DEV_UNKNOWN; + dimm->mtype = MEM_NVDIMM; + dimm->edac_mode = EDAC_SECDED; /* likely better than this */ -int skx_get_dimm_info(u32 mtr, u32 amap, struct dimm_info *dimm, - struct skx_imc *imc, int chan, int dimmno); + edac_dbg(0, "mc#%d: channel %d, dimm %d, %llu MiB (%u pages)\n", + imc->mc, chan, dimmno, size >> 20, dimm->nr_pages); -int skx_get_nvdimm_info(struct dimm_info *dimm, struct skx_imc *imc, - int chan, int dimmno, const char *mod_str); + snprintf(dimm->label, sizeof(dimm->label), "CPU_SrcID#%u_MC#%u_Chan#%u_DIMM#%u", + imc->src_id, imc->lmc, chan, dimmno); -int skx_register_mci(struct skx_imc *imc, struct pci_dev *pdev, + return (size == 0 || size == ~0ull) ? 0 : 1; +} + +static int skx_register_mci(struct skx_imc *imc, struct pci_dev *pdev, const char *ctl_name, const char *mod_str, - get_dimm_config_f get_dimm_config); + get_dimm_config_f get_dimm_config) +{ + struct mem_ctl_info *mci; + struct edac_mc_layer layers[2]; + struct skx_pvt *pvt; + int rc; + + /* Allocate a new MC control structure */ + layers[0].type = EDAC_MC_LAYER_CHANNEL; + layers[0].size = NUM_CHANNELS; + layers[0].is_virt_csrow = false; + layers[1].type = EDAC_MC_LAYER_SLOT; + layers[1].size = NUM_DIMMS; + layers[1].is_virt_csrow = true; + mci = edac_mc_alloc(imc->mc, ARRAY_SIZE(layers), layers, + sizeof(struct skx_pvt)); + + if (unlikely(!mci)) + return -ENOMEM; + + edac_dbg(0, "MC#%d: mci = %p\n", imc->mc, mci); + + /* Associate skx_dev and mci for future usage */ + imc->mci = mci; + pvt = mci->pvt_info; + pvt->imc = imc; + + mci->ctl_name = kasprintf(GFP_KERNEL, "%s#%d IMC#%d", ctl_name, + imc->node_id, imc->lmc); + if (!mci->ctl_name) { + rc = -ENOMEM; + goto fail0; + } + + mci->mtype_cap = MEM_FLAG_DDR4 | MEM_FLAG_NVDIMM; + mci->edac_ctl_cap = EDAC_FLAG_NONE; + mci->edac_cap = EDAC_FLAG_NONE; + mci->mod_name = mod_str; + mci->dev_name = pci_name(pdev); + mci->ctl_page_to_phys = NULL; + + rc = get_dimm_config(mci); + if (rc < 0) + goto fail; + + /* Record ptr to the generic device */ + mci->pdev = &pdev->dev; + + /* Add this new MC control structure to EDAC's list of MCs */ + if (unlikely(edac_mc_add_mc(mci))) { + edac_dbg(0, "MC: failed edac_mc_add_mc()\n"); + rc = -EINVAL; + goto fail; + } + + return 0; + +fail: + kfree(mci->ctl_name); +fail0: + edac_mc_free(mci); + imc->mci = NULL; + return rc; +} + +static void skx_unregister_mci(struct skx_imc *imc) +{ + struct mem_ctl_info *mci = imc->mci; + + if (!mci) + return; + + edac_dbg(0, "MC%d: mci = %p\n", imc->mc, mci); + + /* Remove MC sysfs nodes */ + edac_mc_del_mc(mci->pdev); + + edac_dbg(1, "%s: free mci struct\n", mci->ctl_name); + kfree(mci->ctl_name); + edac_mc_free(mci); +} + +static struct mem_ctl_info *get_mci(int src_id, int lmc) +{ + struct skx_dev *d; + + if (lmc > NUM_IMC - 1) { + skx_printk(KERN_ERR, "Bad lmc %d\n", lmc); + return NULL; + } + + list_for_each_entry(d, &dev_edac_list, list) { + if (d->imc[0].src_id == src_id) + return d->imc[lmc].mci; + } + + skx_printk(KERN_ERR, "No mci for src_id %d lmc %d\n", src_id, lmc); + return NULL; +} + +static void skx_mce_output_error(struct mem_ctl_info *mci, + const struct mce *m, + struct decoded_addr *res) +{ + enum hw_event_mc_err_type tp_event; + char *type, *optype; + bool ripv = GET_BITFIELD(m->mcgstatus, 0, 0); + bool overflow = GET_BITFIELD(m->status, 62, 62); + bool uncorrected_error = GET_BITFIELD(m->status, 61, 61); + bool recoverable; + u32 core_err_cnt = GET_BITFIELD(m->status, 38, 52); + u32 mscod = GET_BITFIELD(m->status, 16, 31); + u32 errcode = GET_BITFIELD(m->status, 0, 15); + u32 optypenum = GET_BITFIELD(m->status, 4, 6); + + recoverable = GET_BITFIELD(m->status, 56, 56); + + if (uncorrected_error) { + core_err_cnt = 1; + if (ripv) { + type = "FATAL"; + tp_event = HW_EVENT_ERR_FATAL; + } else { + type = "NON_FATAL"; + tp_event = HW_EVENT_ERR_UNCORRECTED; + } + } else { + type = "CORRECTED"; + tp_event = HW_EVENT_ERR_CORRECTED; + } + + /* + * According to Intel Architecture spec vol 3B, + * Table 15-10 "IA32_MCi_Status [15:0] Compound Error Code Encoding" + * memory errors should fit one of these masks: + * 000f 0000 1mmm cccc (binary) + * 000f 0010 1mmm cccc (binary) [RAM used as cache] + * where: + * f = Correction Report Filtering Bit. If 1, subsequent errors + * won't be shown + * mmm = error type + * cccc = channel + * If the mask doesn't match, report an error to the parsing logic + */ + if (!((errcode & 0xef80) == 0x80 || (errcode & 0xef80) == 0x280)) { + optype = "Can't parse: it is not a mem"; + } else { + switch (optypenum) { + case 0: + optype = "generic undef request error"; + break; + case 1: + optype = "memory read error"; + break; + case 2: + optype = "memory write error"; + break; + case 3: + optype = "addr/cmd error"; + break; + case 4: + optype = "memory scrubbing error"; + break; + default: + optype = "reserved"; + break; + } + } + if (adxl_component_count) { + snprintf(skx_msg, MSG_SIZE, "%s%s err_code:0x%04x:0x%04x %s", + overflow ? " OVERFLOW" : "", + (uncorrected_error && recoverable) ? " recoverable" : "", + mscod, errcode, adxl_msg); + } else { + snprintf(skx_msg, MSG_SIZE, + "%s%s err_code:0x%04x:0x%04x socket:%d imc:%d rank:%d bg:%d ba:%d row:0x%x col:0x%x", + overflow ? " OVERFLOW" : "", + (uncorrected_error && recoverable) ? " recoverable" : "", + mscod, errcode, + res->socket, res->imc, res->rank, + res->bank_group, res->bank_address, res->row, res->column); + } + + edac_dbg(0, "%s\n", skx_msg); + + /* Call the helper to output message */ + edac_mc_handle_error(tp_event, mci, core_err_cnt, + m->addr >> PAGE_SHIFT, m->addr & ~PAGE_MASK, 0, + res->channel, res->dimm, -1, + optype, skx_msg); +} + +static int skx_mce_check_error(struct notifier_block *nb, unsigned long val, + void *data) +{ + struct mce *mce = (struct mce *)data; + struct decoded_addr res; + struct mem_ctl_info *mci; + char *type; -int skx_mce_check_error(struct notifier_block *nb, unsigned long val, - void *data); + if (edac_get_report_status() == EDAC_REPORTING_DISABLED) + return NOTIFY_DONE; -void skx_remove(void); + /* ignore unless this is memory related with an address */ + if ((mce->status & 0xefff) >> 7 != 1 || !(mce->status & MCI_STATUS_ADDRV)) + return NOTIFY_DONE; + + memset(&res, 0, sizeof(res)); + res.addr = mce->addr; + + if (adxl_component_count) { + if (!skx_adxl_decode(&res)) + return NOTIFY_DONE; + + mci = get_mci(res.socket, res.imc); + } else { + if (!skx_decode || !skx_decode(&res)) + return NOTIFY_DONE; + + mci = res.dev->imc[res.imc].mci; + } + + if (!mci) + return NOTIFY_DONE; + + if (mce->mcgstatus & MCG_STATUS_MCIP) + type = "Exception"; + else + type = "Event"; + + skx_mc_printk(mci, KERN_DEBUG, "HANDLING MCE MEMORY ERROR\n"); + + skx_mc_printk(mci, KERN_DEBUG, "CPU %d: Machine Check %s: 0x%llx " + "Bank %d: 0x%llx\n", mce->extcpu, type, + mce->mcgstatus, mce->bank, mce->status); + skx_mc_printk(mci, KERN_DEBUG, "TSC 0x%llx ", mce->tsc); + skx_mc_printk(mci, KERN_DEBUG, "ADDR 0x%llx ", mce->addr); + skx_mc_printk(mci, KERN_DEBUG, "MISC 0x%llx ", mce->misc); + + skx_mc_printk(mci, KERN_DEBUG, "PROCESSOR %u:0x%x TIME %llu SOCKET " + "%u APIC 0x%x\n", mce->cpuvendor, mce->cpuid, + mce->time, mce->socketid, mce->apicid); + + skx_mce_output_error(mci, mce, &res); + + return NOTIFY_DONE; +} + +static void skx_remove(void) +{ + int i, j; + struct skx_dev *d, *tmp; + + edac_dbg(0, "\n"); + + list_for_each_entry_safe(d, tmp, &dev_edac_list, list) { + list_del(&d->list); + for (i = 0; i < NUM_IMC; i++) { + if (d->imc[i].mci) + skx_unregister_mci(&d->imc[i]); + + if (d->imc[i].mdev) + pci_dev_put(d->imc[i].mdev); + + if (d->imc[i].mbase) + iounmap(d->imc[i].mbase); + + for (j = 0; j < NUM_CHANNELS; j++) { + if (d->imc[i].chan[j].cdev) + pci_dev_put(d->imc[i].chan[j].cdev); + } + } + if (d->util_all) + pci_dev_put(d->util_all); + if (d->sad_all) + pci_dev_put(d->sad_all); + if (d->uracu) + pci_dev_put(d->uracu); + + kfree(d); + } +} #ifdef CONFIG_EDAC_DEBUG -void setup_skx_debug(const char *dirname); -void teardown_skx_debug(void); +/* + * Debug feature. + * Exercise the address decode logic by writing an address to + * /sys/kernel/debug/edac/dirname/addr. + */ +static struct dentry *skx_test; + +static int debugfs_u64_set(void *data, u64 val) +{ + struct mce m; + + pr_warn_once("Fake error to 0x%llx injected via debugfs\n", val); + + memset(&m, 0, sizeof(m)); + /* ADDRV + MemRd + Unknown channel */ + m.status = MCI_STATUS_ADDRV + 0x90; + /* One corrected error */ + m.status |= BIT_ULL(MCI_STATUS_CEC_SHIFT); + m.addr = val; + skx_mce_check_error(NULL, 0, &m); + + return 0; +} +DEFINE_SIMPLE_ATTRIBUTE(fops_u64_wo, NULL, debugfs_u64_set, "%llu\n"); + +static void setup_skx_debug(const char *dirname) +{ + skx_test = edac_debugfs_create_dir(dirname); + if (!skx_test) + return; + + if (!edac_debugfs_create_file("addr", 0200, skx_test, + NULL, &fops_u64_wo)) { + debugfs_remove(skx_test); + skx_test = NULL; + } +} + +static void teardown_skx_debug(void) +{ + debugfs_remove_recursive(skx_test); +} #else static inline void setup_skx_debug(const char *dirname) {} static inline void teardown_skx_debug(void) {} #endif /*CONFIG_EDAC_DEBUG*/ - #endif /* _SKX_COMM_EDAC_H */ diff --git a/drivers/edac/skx_base.c b/drivers/edac/skx_edac.c similarity index 98% rename from drivers/edac/skx_base.c rename to drivers/edac/skx_edac.c index adae4c848ca1..2769d9d340c4 100644 --- a/drivers/edac/skx_base.c +++ b/drivers/edac/skx_edac.c @@ -11,6 +11,11 @@ #include <asm/mce.h> #include "edac_module.h" + +struct decoded_addr; +typedef bool (*skx_decode_f)(struct decoded_addr *res); +static skx_decode_f skx_decode; + #include "skx_common.h" #define EDAC_MOD_STR "skx_edac" @@ -529,7 +534,7 @@ static bool skx_mad_decode(struct decoded_addr *r) return true; } -static bool skx_decode(struct decoded_addr *res) +static bool skx_imc_decode(struct decoded_addr *res) { return skx_sad_decode(res) && skx_tad_decode(res) && skx_rir_decode(res) && skx_mad_decode(res); @@ -611,7 +616,7 @@ static int __init skx_init(void) } } - skx_set_decode(skx_decode); + skx_decode = skx_imc_decode; if (nvdimm_count && skx_adxl_get() == -ENODEV) skx_printk(KERN_NOTICE, "Only decoding DDR4 address!\n"); -- 2.19.1
On Thu, Mar 14, 2019 at 02:59:52PM -0700, Luck, Tony wrote: > I made a patch based on option #3. Rough steps were: > > $ cat skx_common.c >> skx_common.h That doesn't look real clean to me. So we have fsl_ddr_edac.c which gets linked in in two drivers and I think you could librarize that skx_common.c the same way and have the function exports in the wrapper drivers skx_edac and i10nm_edac calling those "library" functions in skx_common.c. IMNSVHO. Thx. -- Regards/Gruss, Boris. Good mailing practices for 400: avoid top-posting and trim the reply.
On Fri, Mar 15, 2019 at 10:43:42AM +0100, Borislav Petkov wrote: > On Thu, Mar 14, 2019 at 02:59:52PM -0700, Luck, Tony wrote: > > I made a patch based on option #3. Rough steps were: > > > > $ cat skx_common.c >> skx_common.h > > That doesn't look real clean to me. So we have fsl_ddr_edac.c which > gets linked in in two drivers and I think you could librarize that > skx_common.c the same way and have the function exports in the wrapper > drivers skx_edac and i10nm_edac calling those "library" functions in > skx_common.c. IMNSVHO. fsl_ddr_edac.c looks to be doing exactly what we are doing with skx_common.c. They just get away with it for now because they don't have a reference to THIS_MODULE since they don't set up anything in sysfs. If this is your goal, then Qiuxu's patch that moves the problem piece out of skx_common.c does what you are asking for. -Tony
On Fri, Mar 15, 2019 at 08:57:18AM -0700, Luck, Tony wrote: > fsl_ddr_edac.c looks to be doing exactly what we are doing with > skx_common.c. They just get away with it for now because they don't > have a reference to THIS_MODULE since they don't set up anything > in sysfs. > > If this is your goal, then Qiuxu's patch that moves the problem > piece out of skx_common.c does what you are asking for. My goal is to not make it too complex and this way burden future maintainability, without introducing the craziest randconfig build fixes. I think the shared code should not have any reference to modules because it is supposed to be a library. Can you move the THIS_MODULE out of the common file and into the actual module? -- Regards/Gruss, Boris. Good mailing practices for 400: avoid top-posting and trim the reply.
On Fri, Mar 15, 2019 at 06:37:20PM +0100, Borislav Petkov wrote: > I think the shared code should not have any reference to modules because > it is supposed to be a library. Can you move the THIS_MODULE out of the > common file and into the actual module? Yes - Qiuxu did that already ... patch reposted below. -Tony From f9c656177e07fe5e978b09e5795a2a84a27bd54c Mon Sep 17 00:00:00 2001 From: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Date: Wed, 6 Mar 2019 09:59:14 +0800 Subject: [PATCH] EDAC, {skx|i10nm}_edac: Fix randconfig build error Kbuild failed on the kernel configurations below: CONFIG_ACPI_NFIT=y CONFIG_EDAC_DEBUG=y CONFIG_EDAC_SKX=m CONFIG_EDAC_I10NM=y or CONFIG_ACPI_NFIT=y CONFIG_EDAC_DEBUG=y CONFIG_EDAC_SKX=y CONFIG_EDAC_I10NM=m Failed log: ... CC [M] drivers/edac/skx_common.o ... .../skx_common.o:.../skx_common.c:672: undefined reference to `__this_module' That is because if one of the two drivers {skx|i10nm}_edac is built-in and the other one is built as a module, the shared file skx_common.c is always built to an object in module style by kbuild. Therefore, when linking for vmlinux, the '__this_module' symbol isn't defined. Fix it by moving the DEFINE_SIMPLE_ATTRIBUTE() from skx_common.c to skx_base.c and i10nm_base.c, where the '__this_module' is always defined whatever it's built-in or built as a module. Test the patch with following configurations: CONFIG_ACPI_NFIT=y CONFIG_EDAC_DEBUG=[y|n] +------------------------------------+ | skx_edac | i10nm_edac | Build | |------------|--------------|--------| | m | m | ok | |------------|--------------|--------| | m | y | ok | |------------|--------------|--------| | y | m | ok | |------------|--------------|--------| | y | y | ok | +------------------------------------+ Fixes: d4dc89d069aa ("EDAC, i10nm: Add a driver for Intel 10nm server processors") Reported-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> --- drivers/edac/i10nm_base.c | 4 +++- drivers/edac/skx_base.c | 4 +++- drivers/edac/skx_common.c | 7 +++---- drivers/edac/skx_common.h | 7 +++++-- 4 files changed, 14 insertions(+), 8 deletions(-) diff --git a/drivers/edac/i10nm_base.c b/drivers/edac/i10nm_base.c index c334fb7c63df..57ae2c6d5958 100644 --- a/drivers/edac/i10nm_base.c +++ b/drivers/edac/i10nm_base.c @@ -181,6 +181,8 @@ static struct notifier_block i10nm_mce_dec = { .priority = MCE_PRIO_EDAC, }; +DEFINE_SIMPLE_ATTRIBUTE(fops_u64_wo, NULL, debugfs_u64_set, "%llu\n"); + static int __init i10nm_init(void) { u8 mc = 0, src_id = 0, node_id = 0; @@ -249,7 +251,7 @@ static int __init i10nm_init(void) opstate_init(); mce_register_decode_chain(&i10nm_mce_dec); - setup_skx_debug("i10nm_test"); + setup_skx_debug("i10nm_test", &fops_u64_wo); i10nm_printk(KERN_INFO, "%s\n", I10NM_REVISION); diff --git a/drivers/edac/skx_base.c b/drivers/edac/skx_base.c index adae4c848ca1..1748f627ca6c 100644 --- a/drivers/edac/skx_base.c +++ b/drivers/edac/skx_base.c @@ -540,6 +540,8 @@ static struct notifier_block skx_mce_dec = { .priority = MCE_PRIO_EDAC, }; +DEFINE_SIMPLE_ATTRIBUTE(fops_u64_wo, NULL, debugfs_u64_set, "%llu\n"); + /* * skx_init: * make sure we are running on the correct cpu model @@ -619,7 +621,7 @@ static int __init skx_init(void) /* Ensure that the OPSTATE is set correctly for POLL or NMI */ opstate_init(); - setup_skx_debug("skx_test"); + setup_skx_debug("skx_test", &fops_u64_wo); mce_register_decode_chain(&skx_mce_dec); diff --git a/drivers/edac/skx_common.c b/drivers/edac/skx_common.c index 0e96e7b5b0a7..f75af7ff5515 100644 --- a/drivers/edac/skx_common.c +++ b/drivers/edac/skx_common.c @@ -653,7 +653,7 @@ void skx_remove(void) */ static struct dentry *skx_test; -static int debugfs_u64_set(void *data, u64 val) +int debugfs_u64_set(void *data, u64 val) { struct mce m; @@ -669,16 +669,15 @@ static int debugfs_u64_set(void *data, u64 val) return 0; } -DEFINE_SIMPLE_ATTRIBUTE(fops_u64_wo, NULL, debugfs_u64_set, "%llu\n"); -void setup_skx_debug(const char *dirname) +void setup_skx_debug(const char *dirname, const struct file_operations *fops) { skx_test = edac_debugfs_create_dir(dirname); if (!skx_test) return; if (!edac_debugfs_create_file("addr", 0200, skx_test, - NULL, &fops_u64_wo)) { + NULL, fops)) { debugfs_remove(skx_test); skx_test = NULL; } diff --git a/drivers/edac/skx_common.h b/drivers/edac/skx_common.h index d25374e34d4f..637867e0952c 100644 --- a/drivers/edac/skx_common.h +++ b/drivers/edac/skx_common.h @@ -142,10 +142,13 @@ int skx_mce_check_error(struct notifier_block *nb, unsigned long val, void skx_remove(void); #ifdef CONFIG_EDAC_DEBUG -void setup_skx_debug(const char *dirname); +int debugfs_u64_set(void *data, u64 val); +void setup_skx_debug(const char *dirname, const struct file_operations *fops); void teardown_skx_debug(void); #else -static inline void setup_skx_debug(const char *dirname) {} +static inline int debugfs_u64_set(void *data, u64 val) { return -ENOENT; } +static inline void setup_skx_debug(const char *dirname, + const struct file_operations *fops) {} static inline void teardown_skx_debug(void) {} #endif /*CONFIG_EDAC_DEBUG*/ -- 2.19.1
On Fri, Mar 15, 2019 at 10:49:56AM -0700, Luck, Tony wrote:
> Yes - Qiuxu did that already ... patch reposted below.
... to which Arnd said that it were fragile because it might break if it
includes a THIS_MODULE reference. So I'd say we won't do that then. And
keep it strictly a library.
Right?
--
Regards/Gruss,
Boris.
Good mailing practices for 400: avoid top-posting and trim the reply.
On Fri, Mar 15, 2019 at 07:02:06PM +0100, Borislav Petkov wrote: > On Fri, Mar 15, 2019 at 10:49:56AM -0700, Luck, Tony wrote: > > Yes - Qiuxu did that already ... patch reposted below. > > ... to which Arnd said that it were fragile because it might break if it > includes a THIS_MODULE reference. So I'd say we won't do that then. And > keep it strictly a library. > > Right? What is your definition of "library"? fsl_ddr_edac.c seems to have the same potential for breakage if someone makes a change to that, then it will hit the same problem. -Tony
On Fri, Mar 15, 2019 at 7:11 PM Luck, Tony <tony.luck@intel.com> wrote: > > On Fri, Mar 15, 2019 at 07:02:06PM +0100, Borislav Petkov wrote: > > On Fri, Mar 15, 2019 at 10:49:56AM -0700, Luck, Tony wrote: > > > Yes - Qiuxu did that already ... patch reposted below. > > > > ... to which Arnd said that it were fragile because it might break if it > > includes a THIS_MODULE reference. So I'd say we won't do that then. And > > keep it strictly a library. > > > > Right? > > What is your definition of "library"? > > fsl_ddr_edac.c seems to have the same potential for breakage > if someone makes a change to that, then it will hit the same > problem. I think they are a bit safer because CONFIG_EDAC_LAYERSCAPE and CONFIG_EDAC_MPC85XX are mutually exclusive (one is only on powerpc, the other is only on ARM). It would break though if one were to make them build with CONFIG_COMPILE_TEST, or if another driver gets added that for ARM. I just thought about a possible Kconfig solution some more, and had this idea: diff --git a/drivers/edac/Kconfig b/drivers/edac/Kconfig index 47eb4d13ed5f..70080926329f 100644 --- a/drivers/edac/Kconfig +++ b/drivers/edac/Kconfig @@ -235,6 +235,7 @@ config EDAC_SKX depends on ACPI_NFIT || !ACPI_NFIT # if ACPI_NFIT=m, EDAC_SKX can't be y select DMI select ACPI_ADXL + select EDAC_SKX_COMMON help Support for error detection and correction the Intel Skylake server Integrated Memory Controllers. If your @@ -247,12 +248,20 @@ config EDAC_I10NM depends on ACPI_NFIT || !ACPI_NFIT # if ACPI_NFIT=m, EDAC_I10NM can't be y select DMI select ACPI_ADXL + select EDAC_SKX_COMMON help Support for error detection and correction the Intel 10nm server Integrated Memory Controllers. If your system has non-volatile DIMMs you should also manually select CONFIG_ACPI_NFIT. +config EDAC_SKX_COMMON + tristate + help + This is an internal helper symbol to ensure that all variants + of the EDAC_SKX driver are either built-in or modular, as mixing + the two causes link time problems. + config EDAC_PND2 tristate "Intel Pondicherry2" depends on PCI && X86_64 && X86_MCE_INTEL diff --git a/drivers/edac/Makefile b/drivers/edac/Makefile index 89ad4a84a0f6..01134051f5bf 100644 --- a/drivers/edac/Makefile +++ b/drivers/edac/Makefile @@ -58,10 +58,14 @@ layerscape_edac_mod-y := fsl_ddr_edac.o layerscape_edac.o obj-$(CONFIG_EDAC_LAYERSCAPE) += layerscape_edac_mod.o skx_edac-y := skx_common.o skx_base.o -obj-$(CONFIG_EDAC_SKX) += skx_edac.o +ifdef CONFIG_EDAC_SKX +obj-$(CONFIG_EDAC_SKX_COMMON) += skx_edac.o +endif i10nm_edac-y := skx_common.o i10nm_base.o +ifdef CONFIG_EDAC_I10NM obj-$(CONFIG_EDAC_SKX_COMMON) += i10nm_edac.o +endif obj-$(CONFIG_EDAC_MV64X60) += mv64x60_edac.o obj-$(CONFIG_EDAC_CELL) += cell_edac.o Basically I cheat Kconfig, so if one driver is built-in and the other is a loadable module, we compile both as built-in. Arnd
> Basically I cheat Kconfig, so if one driver is built-in and > the other is a loadable module, we compile both as built-in. My mail client may have munged that path. Both "git am" and "patch -p1" barf when I try to apply it. Since it was small I tried to replicate manually, but I must have messed up something because Kconfig still seems content to let me have CONFIG_EDAC_SKX=y with CONFIG_EDAC_I10NM=m :-( -Tony
On Fri, Mar 15, 2019 at 10:28 PM Luck, Tony <tony.luck@intel.com> wrote: > > > Basically I cheat Kconfig, so if one driver is built-in and > > the other is a loadable module, we compile both as built-in. > > My mail client may have munged that path. Both "git am" and "patch -p1" > barf when I try to apply it. Since it was small I tried to replicate manually, > but I must have messed up something because Kconfig still seems content > to let me have CONFIG_EDAC_SKX=y with CONFIG_EDAC_I10NM=m > > :-( Sorry, this was my mistake, my email was garbled. The patch was correct though: the idea here is not to change the Kconfig symbols but to change the Makefile to do the right thing even when Kconfig is set wrong. Arnd
On Fri, Mar 22, 2019 at 03:00:25PM +0100, Arnd Bergmann wrote: > Sorry, this was my mistake, my email was garbled. The patch was > correct though: the idea here is not to change the Kconfig symbols > but to change the Makefile to do the right thing even when Kconfig > is set wrong. Well this does seem like a "clever" way out of the randconfig problem. New patch applies and when I set .config to have: CONFIG_EDAC_DEBUG=y CONFIG_EDAC_SKX=y CONFIG_EDAC_I10NM=m CONFIG_EDAC_SKX_COMMON=y I don't see any build errors. There are lots of "skx_" symbols in System.map But I'm not at all sure what happened to the I10NM driver. I don't see it mentioned in the output from "make". It didn't get built as a module (no ".ko" file for it). It doesn't seem to be built in (no ".o" in drivers/edac/built-in.a) So I added a #error line to the start of i10nm_edac.c and ran "make" again. Nothing. So, I don't think this is doing what you think it should do. Even if it did, it would seem very confusing to a user that asked for "one module, one built-in" in Kconfig, but found both built-in. Boris: I'm voting for Qiuxu's most recent solution (moving all the EDAC_DEBUG bits out of skx_common.c). -Tony
On Fri, Mar 22, 2019 at 6:55 PM Luck, Tony <tony.luck@intel.com> wrote: > > On Fri, Mar 22, 2019 at 03:00:25PM +0100, Arnd Bergmann wrote: > > Sorry, this was my mistake, my email was garbled. The patch was > > correct though: the idea here is not to change the Kconfig symbols > > but to change the Makefile to do the right thing even when Kconfig > > is set wrong. > > Well this does seem like a "clever" way out of the randconfig > problem. New patch applies and when I set .config to have: > > CONFIG_EDAC_DEBUG=y > CONFIG_EDAC_SKX=y > CONFIG_EDAC_I10NM=m > CONFIG_EDAC_SKX_COMMON=y > > I don't see any build errors. > > There are lots of "skx_" symbols in System.map > > But I'm not at all sure what happened to the I10NM driver. > > I don't see it mentioned in the output from "make". > > It didn't get built as a module (no ".ko" file for it). > > It doesn't seem to be built in (no ".o" in drivers/edac/built-in.a) > > So I added a #error line to the start of i10nm_edac.c and > ran "make" again. Nothing. Oops, I guess my testing method was also insufficient, I only checked that the bug was gone, not that it actually worked. > So, I don't think this is doing what you think it should > do. Even if it did, it would seem very confusing to a user > that asked for "one module, one built-in" in Kconfig, but > found both built-in. > > Boris: I'm voting for Qiuxu's most recent solution (moving > all the EDAC_DEBUG bits out of skx_common.c). Yes, that's probably best then. My patch was likely close to another correct solution, but I've screwed it up twice now ;-) Arnd
On Thu, Mar 21, 2019 at 03:13:39PM -0700, Luck, Tony wrote: > > From: Qiuxu Zhuo <qiuxu.zhuo@intel.com> > > Kbuild failed on the kernel configurations below: ... I've massaged this into the below and running randconfigs now: --- From: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Date: Thu, 21 Mar 2019 15:13:39 -0700 Subject: [PATCH] EDAC, skx, i10nm: Make skx_common.c a pure library The following Kconfig constellations fail randconfig builds: CONFIG_ACPI_NFIT=y CONFIG_EDAC_DEBUG=y CONFIG_EDAC_SKX=m CONFIG_EDAC_I10NM=y or CONFIG_ACPI_NFIT=y CONFIG_EDAC_DEBUG=y CONFIG_EDAC_SKX=y CONFIG_EDAC_I10NM=m with: ... CC [M] drivers/edac/skx_common.o ... .../skx_common.o:.../skx_common.c:672: undefined reference to `__this_module' That is because if one of the two drivers - skx_edac or i10nm_edac - is built-in and the other one is a module, the shared file skx_common.c gets linked into a module object by kbuild. Therefore, when linking that same file into vmlinux, the '__this_module' symbol used in debugfs isn't defined, leading to the above error. Fix it by moving all debugfs code from skx_common.c to both skx_base.c and i10nm_base.c respectively. Thus, skx_common.c doesn't refer to '__this_module' symbol anymore. Clarify its purpose at the top of the file for future reference, while at it. [ bp: Make text more readable. ] Fixes: d4dc89d069aa ("EDAC, i10nm: Add a driver for Intel 10nm server processors") Reported-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: James Morse <james.morse@arm.com> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: linux-edac <linux-edac@vger.kernel.org> Link: https://lkml.kernel.org/r/20190321221339.GA32323@agluck-desk --- drivers/edac/i10nm_base.c | 52 +++++++++++++++++++++++++++++++++-- drivers/edac/skx_base.c | 50 +++++++++++++++++++++++++++++++++- drivers/edac/skx_common.c | 57 +++++++-------------------------------- drivers/edac/skx_common.h | 8 ------ 4 files changed, 109 insertions(+), 58 deletions(-) diff --git a/drivers/edac/i10nm_base.c b/drivers/edac/i10nm_base.c index c334fb7c63df..6f06aec4877c 100644 --- a/drivers/edac/i10nm_base.c +++ b/drivers/edac/i10nm_base.c @@ -181,6 +181,54 @@ static struct notifier_block i10nm_mce_dec = { .priority = MCE_PRIO_EDAC, }; +#ifdef CONFIG_EDAC_DEBUG +/* + * Debug feature. + * Exercise the address decode logic by writing an address to + * /sys/kernel/debug/edac/i10nm_test/addr. + */ +static struct dentry *i10nm_test; + +static int debugfs_u64_set(void *data, u64 val) +{ + struct mce m; + + pr_warn_once("Fake error to 0x%llx injected via debugfs\n", val); + + memset(&m, 0, sizeof(m)); + /* ADDRV + MemRd + Unknown channel */ + m.status = MCI_STATUS_ADDRV + 0x90; + /* One corrected error */ + m.status |= BIT_ULL(MCI_STATUS_CEC_SHIFT); + m.addr = val; + skx_mce_check_error(NULL, 0, &m); + + return 0; +} +DEFINE_SIMPLE_ATTRIBUTE(fops_u64_wo, NULL, debugfs_u64_set, "%llu\n"); + +static void setup_i10nm_debug(void) +{ + i10nm_test = edac_debugfs_create_dir("i10nm_test"); + if (!i10nm_test) + return; + + if (!edac_debugfs_create_file("addr", 0200, i10nm_test, + NULL, &fops_u64_wo)) { + debugfs_remove(i10nm_test); + i10nm_test = NULL; + } +} + +static void teardown_i10nm_debug(void) +{ + debugfs_remove_recursive(i10nm_test); +} +#else +static inline void setup_i10nm_debug(void) {} +static inline void teardown_i10nm_debug(void) {} +#endif /*CONFIG_EDAC_DEBUG*/ + static int __init i10nm_init(void) { u8 mc = 0, src_id = 0, node_id = 0; @@ -249,7 +297,7 @@ static int __init i10nm_init(void) opstate_init(); mce_register_decode_chain(&i10nm_mce_dec); - setup_skx_debug("i10nm_test"); + setup_i10nm_debug(); i10nm_printk(KERN_INFO, "%s\n", I10NM_REVISION); @@ -262,7 +310,7 @@ static int __init i10nm_init(void) static void __exit i10nm_exit(void) { edac_dbg(2, "\n"); - teardown_skx_debug(); + teardown_i10nm_debug(); mce_unregister_decode_chain(&i10nm_mce_dec); skx_adxl_put(); skx_remove(); diff --git a/drivers/edac/skx_base.c b/drivers/edac/skx_base.c index adae4c848ca1..a5c8fa3a249a 100644 --- a/drivers/edac/skx_base.c +++ b/drivers/edac/skx_base.c @@ -540,6 +540,54 @@ static struct notifier_block skx_mce_dec = { .priority = MCE_PRIO_EDAC, }; +#ifdef CONFIG_EDAC_DEBUG +/* + * Debug feature. + * Exercise the address decode logic by writing an address to + * /sys/kernel/debug/edac/skx_test/addr. + */ +static struct dentry *skx_test; + +static int debugfs_u64_set(void *data, u64 val) +{ + struct mce m; + + pr_warn_once("Fake error to 0x%llx injected via debugfs\n", val); + + memset(&m, 0, sizeof(m)); + /* ADDRV + MemRd + Unknown channel */ + m.status = MCI_STATUS_ADDRV + 0x90; + /* One corrected error */ + m.status |= BIT_ULL(MCI_STATUS_CEC_SHIFT); + m.addr = val; + skx_mce_check_error(NULL, 0, &m); + + return 0; +} +DEFINE_SIMPLE_ATTRIBUTE(fops_u64_wo, NULL, debugfs_u64_set, "%llu\n"); + +static void setup_skx_debug(void) +{ + skx_test = edac_debugfs_create_dir("skx_test"); + if (!skx_test) + return; + + if (!edac_debugfs_create_file("addr", 0200, skx_test, + NULL, &fops_u64_wo)) { + debugfs_remove(skx_test); + skx_test = NULL; + } +} + +static void teardown_skx_debug(void) +{ + debugfs_remove_recursive(skx_test); +} +#else +static inline void setup_skx_debug(void) {} +static inline void teardown_skx_debug(void) {} +#endif /*CONFIG_EDAC_DEBUG*/ + /* * skx_init: * make sure we are running on the correct cpu model @@ -619,7 +667,7 @@ static int __init skx_init(void) /* Ensure that the OPSTATE is set correctly for POLL or NMI */ opstate_init(); - setup_skx_debug("skx_test"); + setup_skx_debug(); mce_register_decode_chain(&skx_mce_dec); diff --git a/drivers/edac/skx_common.c b/drivers/edac/skx_common.c index 0e96e7b5b0a7..b0dddcfa9baa 100644 --- a/drivers/edac/skx_common.c +++ b/drivers/edac/skx_common.c @@ -1,7 +1,15 @@ // SPDX-License-Identifier: GPL-2.0 /* - * Common codes for both the skx_edac driver and Intel 10nm server EDAC driver. - * Originally split out from the skx_edac driver. + * + * Shared code by both skx_edac and i10nm_edac. Originally split out + * from the skx_edac driver. + * + * This file is linked into both skx_edac and i10nm_edac drivers. In + * order to avoid link errors, this file must be like a pure library + * without including symbols and defines which would otherwise conflict, + * when linked once into a module and into a built-in object, at the + * same time. For example, __this_module symbol references when that + * file is being linked into a built-in object. * * Copyright (c) 2018, Intel Corporation. */ @@ -644,48 +652,3 @@ void skx_remove(void) kfree(d); } } - -#ifdef CONFIG_EDAC_DEBUG -/* - * Debug feature. - * Exercise the address decode logic by writing an address to - * /sys/kernel/debug/edac/dirname/addr. - */ -static struct dentry *skx_test; - -static int debugfs_u64_set(void *data, u64 val) -{ - struct mce m; - - pr_warn_once("Fake error to 0x%llx injected via debugfs\n", val); - - memset(&m, 0, sizeof(m)); - /* ADDRV + MemRd + Unknown channel */ - m.status = MCI_STATUS_ADDRV + 0x90; - /* One corrected error */ - m.status |= BIT_ULL(MCI_STATUS_CEC_SHIFT); - m.addr = val; - skx_mce_check_error(NULL, 0, &m); - - return 0; -} -DEFINE_SIMPLE_ATTRIBUTE(fops_u64_wo, NULL, debugfs_u64_set, "%llu\n"); - -void setup_skx_debug(const char *dirname) -{ - skx_test = edac_debugfs_create_dir(dirname); - if (!skx_test) - return; - - if (!edac_debugfs_create_file("addr", 0200, skx_test, - NULL, &fops_u64_wo)) { - debugfs_remove(skx_test); - skx_test = NULL; - } -} - -void teardown_skx_debug(void) -{ - debugfs_remove_recursive(skx_test); -} -#endif /*CONFIG_EDAC_DEBUG*/ diff --git a/drivers/edac/skx_common.h b/drivers/edac/skx_common.h index d25374e34d4f..d18fa98669af 100644 --- a/drivers/edac/skx_common.h +++ b/drivers/edac/skx_common.h @@ -141,12 +141,4 @@ int skx_mce_check_error(struct notifier_block *nb, unsigned long val, void skx_remove(void); -#ifdef CONFIG_EDAC_DEBUG -void setup_skx_debug(const char *dirname); -void teardown_skx_debug(void); -#else -static inline void setup_skx_debug(const char *dirname) {} -static inline void teardown_skx_debug(void) {} -#endif /*CONFIG_EDAC_DEBUG*/ - #endif /* _SKX_COMM_EDAC_H */ -- 2.21.0 -- Regards/Gruss, Boris. Good mailing practices for 400: avoid top-posting and trim the reply.
diff --git a/drivers/edac/Makefile b/drivers/edac/Makefile index 89ad4a84a0f6..ab01d15be46c 100644 --- a/drivers/edac/Makefile +++ b/drivers/edac/Makefile @@ -57,11 +57,11 @@ obj-$(CONFIG_EDAC_MPC85XX) += mpc85xx_edac_mod.o layerscape_edac_mod-y := fsl_ddr_edac.o layerscape_edac.o obj-$(CONFIG_EDAC_LAYERSCAPE) += layerscape_edac_mod.o -skx_edac-y := skx_common.o skx_base.o -obj-$(CONFIG_EDAC_SKX) += skx_edac.o +skx_edac-y := skx_base.o +obj-$(CONFIG_EDAC_SKX) += skx_common.o skx_edac.o -i10nm_edac-y := skx_common.o i10nm_base.o -obj-$(CONFIG_EDAC_I10NM) += i10nm_edac.o +i10nm_edac-y := i10nm_base.o +obj-$(CONFIG_EDAC_I10NM) += skx_common.o i10nm_edac.o obj-$(CONFIG_EDAC_MV64X60) += mv64x60_edac.o obj-$(CONFIG_EDAC_CELL) += cell_edac.o diff --git a/drivers/edac/skx_common.c b/drivers/edac/skx_common.c index 0e96e7b5b0a7..bb183035df5c 100644 --- a/drivers/edac/skx_common.c +++ b/drivers/edac/skx_common.c @@ -83,12 +83,14 @@ int __init skx_adxl_get(void) return -ENODEV; } +EXPORT_SYMBOL_GPL(skx_adxl_get); void __exit skx_adxl_put(void) { kfree(adxl_values); kfree(adxl_msg); } +EXPORT_SYMBOL_GPL(skx_adxl_put); static bool skx_adxl_decode(struct decoded_addr *res) { @@ -127,6 +129,7 @@ void skx_set_decode(skx_decode_f decode) { skx_decode = decode; } +EXPORT_SYMBOL_GPL(skx_set_decode); int skx_get_src_id(struct skx_dev *d, u8 *id) { @@ -140,6 +143,7 @@ int skx_get_src_id(struct skx_dev *d, u8 *id) *id = GET_BITFIELD(reg, 12, 14); return 0; } +EXPORT_SYMBOL_GPL(skx_get_src_id); int skx_get_node_id(struct skx_dev *d, u8 *id) { @@ -153,6 +157,7 @@ int skx_get_node_id(struct skx_dev *d, u8 *id) *id = GET_BITFIELD(reg, 0, 2); return 0; } +EXPORT_SYMBOL_GPL(skx_get_node_id); static int get_width(u32 mtr) { @@ -219,6 +224,7 @@ int skx_get_all_bus_mappings(unsigned int did, int off, enum type type, *list = &dev_edac_list; return ndev; } +EXPORT_SYMBOL_GPL(skx_get_all_bus_mappings); int skx_get_hi_lo(unsigned int did, int off[], u64 *tolm, u64 *tohm) { @@ -258,6 +264,7 @@ int skx_get_hi_lo(unsigned int did, int off[], u64 *tolm, u64 *tohm) pci_dev_put(pdev); return -ENODEV; } +EXPORT_SYMBOL_GPL(skx_get_hi_lo); static int skx_get_dimm_attr(u32 reg, int lobit, int hibit, int add, int minval, int maxval, const char *name) @@ -311,6 +318,7 @@ int skx_get_dimm_info(u32 mtr, u32 amap, struct dimm_info *dimm, return 1; } +EXPORT_SYMBOL_GPL(skx_get_dimm_info); int skx_get_nvdimm_info(struct dimm_info *dimm, struct skx_imc *imc, int chan, int dimmno, const char *mod_str) @@ -359,6 +367,7 @@ int skx_get_nvdimm_info(struct dimm_info *dimm, struct skx_imc *imc, return (size == 0 || size == ~0ull) ? 0 : 1; } +EXPORT_SYMBOL_GPL(skx_get_nvdimm_info); int skx_register_mci(struct skx_imc *imc, struct pci_dev *pdev, const char *ctl_name, const char *mod_str, @@ -426,6 +435,7 @@ int skx_register_mci(struct skx_imc *imc, struct pci_dev *pdev, imc->mci = NULL; return rc; } +EXPORT_SYMBOL_GPL(skx_register_mci); static void skx_unregister_mci(struct skx_imc *imc) { @@ -609,6 +619,7 @@ int skx_mce_check_error(struct notifier_block *nb, unsigned long val, return NOTIFY_DONE; } +EXPORT_SYMBOL_GPL(skx_mce_check_error); void skx_remove(void) { @@ -644,6 +655,7 @@ void skx_remove(void) kfree(d); } } +EXPORT_SYMBOL_GPL(skx_remove); #ifdef CONFIG_EDAC_DEBUG /* @@ -683,9 +695,11 @@ void setup_skx_debug(const char *dirname) skx_test = NULL; } } +EXPORT_SYMBOL_GPL(setup_skx_debug); void teardown_skx_debug(void) { debugfs_remove_recursive(skx_test); } +EXPORT_SYMBOL_GPL(teardown_skx_debug); #endif /*CONFIG_EDAC_DEBUG*/
In a configuration where one of the two drivers is built-in and the other one is a module, kbuild tries to add the modular file into vmlinux, and fails with drivers/edac/skx_common.o:(.rodata+0xc0): undefined reference to `__this_module' We either have to make both drivers be configured the same way all the time, or make skx_common a separate module. This takes the second approach, which is more logical, but has the downside of requiring lots of new EXPORT_SYMBOL_GPL() statements. Fixes: d4dc89d069aa ("EDAC, i10nm: Add a driver for Intel 10nm server processors") Signed-off-by: Arnd Bergmann <arnd@arndb.de> --- drivers/edac/Makefile | 8 ++++---- drivers/edac/skx_common.c | 14 ++++++++++++++ 2 files changed, 18 insertions(+), 4 deletions(-) -- 2.20.0