Message ID | 20201208172959.1249-2-shiju.jose@huawei.com |
---|---|
State | New |
Series | EDAC/ghes: Add EDAC device for recording the CPU error count |
On Tue, Dec 08, 2020 at 05:29:58PM +0000, Shiju Jose wrote:
> The corrected error count on the CPU caches required
> reporting to the user-space for the predictive failure
> analysis. For this purpose, add an EDAC device and device
> blocks for the CPU caches found.
> The cache's corrected error count would be stored in the
> /sys/devices/system/edac/cpu/cpu*/cache*/ce_count.

This still doesn't begin to explain why the kernel needs this. I had
already asked whether errors in CPU caches are something which happen
often enough so that software should count them but nothing came.

So pls justify first why this wants to be added to the kernel.

> diff --git a/drivers/edac/Kconfig b/drivers/edac/Kconfig
> index 7a47680d6f07..c73eeab27ac9 100644
> --- a/drivers/edac/Kconfig
> +++ b/drivers/edac/Kconfig
> @@ -74,6 +74,16 @@ config EDAC_GHES
>
>  	  In doubt, say 'Y'.
>
> +config EDAC_GHES_CPU_ERROR
> +	bool "EDAC device for reporting firmware-first BIOS detected CPU error count"

Why a separate Kconfig item?

> +	depends on EDAC_GHES
> +	help
> +	  EDAC device for the firmware-first BIOS detected CPU error count reported

Well this is not what it is doing - you're talking about cache errors.
"CPU errors" can be a lot more than just cache errors.

> +static void ghes_edac_create_cpu_device(struct device *dev)
> +{
> +	int cpu;
> +	struct cpu_cacheinfo *this_cpu_ci;
> +
> +	/*
> +	 * Find the maximum number of caches present in the CPU heirarchy
> +	 * among the online CPUs.
> +	 */
> +	for_each_online_cpu(cpu) {
> +		this_cpu_ci = get_cpu_cacheinfo(cpu);
> +		if (!this_cpu_ci)
> +			continue;
> +		if (max_number_of_caches < this_cpu_ci->num_leaves)
> +			max_number_of_caches = this_cpu_ci->num_leaves;

So this is counting the number of cache levels on the system?

So you want to count the errors per cache levels?

> +	}
> +	if (!max_number_of_caches)
> +		return;
> +
> +	/*
> +	 * EDAC device interface only supports creating the CPU cache hierarchy for alls
> +	 * the CPUs together. Thus need to allocate cpu_edac_block_list for the
> +	 * max_number_of_caches among all the CPUs irrespective of the number of caches
> +	 * per CPU might vary.
> +	 */

So this is lumping all the caches together into a single list? What for?
To untangle to the proper ones when the error gets reported?

Have you heard of percpu variables?

> @@ -624,6 +787,10 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
>  	ghes_pvt = pvt;
>  	spin_unlock_irqrestore(&ghes_lock, flags);
>
> +#if defined(CONFIG_EDAC_GHES_CPU_ERROR)
> +	ghes_edac_create_cpu_device(dev);
> +#endif
> +

Init stuff belongs into ghes_scan_system().

...

Ok, I've seen enough. "required reporting to the user-space for the
predictive failure analysis." is not even trying to explain *why* you're
doing this, what *actual* problem it is addressing and why should the
kernel get it.

And without a proper problem definition of what you're trying to solve,
this is not going anywhere.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette
Hi Boris,

Thanks for the feedback.
Apologies for the delay.

>-----Original Message-----
>From: Borislav Petkov [mailto:bp@alien8.de]
>Sent: 31 December 2020 16:44
>To: Shiju Jose <shiju.jose@huawei.com>
>Cc: linux-edac@vger.kernel.org; linux-acpi@vger.kernel.org; linux-
>kernel@vger.kernel.org; james.morse@arm.com;
>mchehab+huawei@kernel.org; tony.luck@intel.com; rjw@rjwysocki.net;
>lenb@kernel.org; rrichter@marvell.com; Linuxarm <linuxarm@huawei.com>;
>xuwei (O) <xuwei5@huawei.com>; Jonathan Cameron
><jonathan.cameron@huawei.com>; John Garry <john.garry@huawei.com>;
>tanxiaofei <tanxiaofei@huawei.com>; Shameerali Kolothum Thodi
><shameerali.kolothum.thodi@huawei.com>; Salil Mehta
><salil.mehta@huawei.com>
>Subject: Re: [RFC PATCH 1/2] EDAC/ghes: Add EDAC device for the CPU
>caches
>
>On Tue, Dec 08, 2020 at 05:29:58PM +0000, Shiju Jose wrote:
>> The corrected error count on the CPU caches required reporting to the
>> user-space for the predictive failure analysis. For this purpose, add
>> an EDAC device and device blocks for the CPU caches found.
>> The cache's corrected error count would be stored in the
>> /sys/devices/system/edac/cpu/cpu*/cache*/ce_count.
>
>This still doesn't begin to explain why the kernel needs this. I had already
>asked whether errors in CPU caches are something which happen often
>enough so that software should count them but nothing came. So pls justify
>first why this wants to be added to the kernel.

L2 cache corrected errors are detected occasionally on few of
our ARM64 hardware boards. Though it is rare, the probability of
the CPU cache errors frequently occurring can't be avoided.
The earlier failure detection by monitoring the cache corrected
errors for the frequent occurrences and taking preventive
action could prevent more serious hardware faults.

On Intel architectures, cache corrected errors are reported and
the affected cores are offline in the architecture specific method.
http://www.mcelog.org/cache.html

However for the firmware-first error reporting, specifically on
ARM64 architectures, there is no provision present for reporting
the cache corrected error count to the user-space and taking
preventive action such as offline the affected cores.

>
>> diff --git a/drivers/edac/Kconfig b/drivers/edac/Kconfig index
>> 7a47680d6f07..c73eeab27ac9 100644
>> --- a/drivers/edac/Kconfig
>> +++ b/drivers/edac/Kconfig
>> @@ -74,6 +74,16 @@ config EDAC_GHES
>>
>>  	  In doubt, say 'Y'.
>>
>> +config EDAC_GHES_CPU_ERROR
>> +	bool "EDAC device for reporting firmware-first BIOS detected CPU
>error count"
>
>Why a separate Kconfig item?

CONFIG_EDAC_GHES_CPU_CACHE_ERROR is added to make this
feature optional only for the platforms which need this and supported.

>
>> +	depends on EDAC_GHES
>> +	help
>> +	  EDAC device for the firmware-first BIOS detected CPU error count
>> +reported
>
>Well this is not what it is doing - you're talking about cache errors.
>"CPU errors" can be a lot more than just cache errors.

Sure. I will change.

>
>> +static void ghes_edac_create_cpu_device(struct device *dev) {
>> +	int cpu;
>> +	struct cpu_cacheinfo *this_cpu_ci;
>> +
>> +	/*
>> +	 * Find the maximum number of caches present in the CPU heirarchy
>> +	 * among the online CPUs.
>> +	 */
>> +	for_each_online_cpu(cpu) {
>> +		this_cpu_ci = get_cpu_cacheinfo(cpu);
>> +		if (!this_cpu_ci)
>> +			continue;
>> +		if (max_number_of_caches < this_cpu_ci->num_leaves)
>> +			max_number_of_caches = this_cpu_ci->num_leaves;
>
>So this is counting the number of cache levels on the system? So you want to
>count the errors per cache levels?

Yes. This was the suggestion from James and to offline the affected cores
for the shared cache.

>
>> +	}
>> +	if (!max_number_of_caches)
>> +		return;
>> +
>> +	/*
>> +	 * EDAC device interface only supports creating the CPU cache
>hierarchy for alls
>> +	 * the CPUs together. Thus need to allocate cpu_edac_block_list for
>the
>> +	 * max_number_of_caches among all the CPUs irrespective of the
>number of caches
>> +	 * per CPU might vary.
>> +	 */
>
>So this is lumping all the caches together into a single list? What for?
>To untangle to the proper ones when the error gets reported?
>
>Have you heard of percpu variables?

Yes. Changed the list to the percpu variable.

>
>> @@ -624,6 +787,10 @@ int ghes_edac_register(struct ghes *ghes, struct
>device *dev)
>>  	ghes_pvt = pvt;
>>  	spin_unlock_irqrestore(&ghes_lock, flags);
>>
>> +#if defined(CONFIG_EDAC_GHES_CPU_ERROR)
>> +	ghes_edac_create_cpu_device(dev);
>> +#endif
>> +
>
>Init stuff belongs into ghes_scan_system().
>

Did you mean calling ghes_edac_create_cpu_device() in the ghes_scan_system()?

>...
>
>Ok, I've seen enough. "required reporting to the user-space for the predictive
>failure analysis." is not even trying to explain *why* you're doing this, what
>*actual* problem it is addressing and why should the kernel get it.
>
>And without a proper problem definition of what you're trying to solve, this
>is not going anywhere.
>
>--
>Regards/Gruss,
>    Boris.
>

Thanks,
Shiju
On Fri, Jan 15, 2021 at 11:06:30AM +0000, Shiju Jose wrote:
> L2 cache corrected errors are detected occasionally on few of
> our ARM64 hardware boards. Though it is rare, the probability of
> the CPU cache errors frequently occurring can't be avoided.
> The earlier failure detection by monitoring the cache corrected
> errors for the frequent occurrences and taking preventive
> action could prevent more serious hardware faults.
>
> On Intel architectures, cache corrected errors are reported and
> the affected cores are offline in the architecture specific method.
> http://www.mcelog.org/cache.html
>
> However for the firmware-first error reporting, specifically on
> ARM64 architectures, there is no provision present for reporting
> the cache corrected error count to the user-space and taking
> preventive action such as offline the affected cores.

How hard was it to write that in your first submission? What do you
think would be the best way to persuade a patch reviewer/maintainer to
take a look at your submission?

> >Why a separate Kconfig item?
> CONFIG_EDAC_GHES_CPU_CACHE_ERROR is added to make this
> feature optional only for the platforms which need this and supported.
>
> >
> >> +	depends on EDAC_GHES

depends on EDAC_GHES hardly expresses which platforms need it/support
it.

If anything, depends on ARM64.

> >Init stuff belongs into ghes_scan_system().
> >
> Did you mean calling ghes_edac_create_cpu_device() in the ghes_scan_system()?

I mean, all hardware discovery needs to happen in ghes_scan_system - you
don't need to call those from outside the driver, in
ghes_edac_register().

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette
Hi Boris,

>-----Original Message-----
>From: Borislav Petkov [mailto:bp@alien8.de]
>Sent: 18 January 2021 18:37
>To: Shiju Jose <shiju.jose@huawei.com>
>Cc: linux-edac@vger.kernel.org; linux-acpi@vger.kernel.org; linux-
>kernel@vger.kernel.org; james.morse@arm.com;
>mchehab+huawei@kernel.org; tony.luck@intel.com; rjw@rjwysocki.net;
>lenb@kernel.org; rrichter@marvell.com; Jonathan Cameron
><jonathan.cameron@huawei.com>; tanxiaofei <tanxiaofei@huawei.com>;
>linuxarm@openeuler.org
>Subject: Re: [RFC PATCH 1/2] EDAC/ghes: Add EDAC device for the CPU
>caches
>
>On Fri, Jan 15, 2021 at 11:06:30AM +0000, Shiju Jose wrote:
>> L2 cache corrected errors are detected occasionally on few of our
>> ARM64 hardware boards. Though it is rare, the probability of the CPU
>> cache errors frequently occurring can't be avoided.
>> The earlier failure detection by monitoring the cache corrected errors
>> for the frequent occurrences and taking preventive action could
>> prevent more serious hardware faults.
>>
>> On Intel architectures, cache corrected errors are reported and the
>> affected cores are offline in the architecture specific method.
>> http://www.mcelog.org/cache.html
>>
>> However for the firmware-first error reporting, specifically on
>> ARM64 architectures, there is no provision present for reporting the
>> cache corrected error count to the user-space and taking preventive
>> action such as offline the affected cores.
>
>How hard was it to write that in your first submission? What do you think
>would be the best way to persuade a patch reviewer/maintainer to take a
>look at your submission?
>
>> >Why a separate Kconfig item?
>> CONFIG_EDAC_GHES_CPU_CACHE_ERROR is added to make this feature
>> optional only for the platforms which need this and supported.
>>
>> >
>> >> +	depends on EDAC_GHES
>
>depends on EDAC_GHES hardly expresses which platforms need it/support it.
>
>If anything, depends on ARM64.

Sure. I will add dependency on ARM64.
This EDAC code for the cache errors is architecture independent for the
firmware-first error reporting and could be used for other architectures,
though now we need it for the ARM64.

>
>> >Init stuff belongs into ghes_scan_system().
>> >
>> Did you mean calling ghes_edac_create_cpu_device() in the
>ghes_scan_system()?
>
>I mean, all hardware discovery needs to happen in ghes_scan_system
>- you don't need to call those from outside the driver, in
>ghes_edac_register().

sure. Will modify.

>
>--
>Regards/Gruss,
>    Boris.

Thanks,
Shiju
On Tue, Jan 19, 2021 at 10:04:23AM +0000, Shiju Jose wrote:
> This EDAC code for the cache errors is architecture independent for the
> firmware-first error reporting and could be used for other architectures,

I'm not so sure about that because you're lumping all the cache
hierarchies together and there might be L3 slices on some x86
incarnations, for example, which do not belong to the core you're
reporting the error for...

It would need to be tested though.

Also, if this is a firmware-first mode, then I would expect the firmware
to also report which cache triggered the error and thus not need any OS
glue at all.

Therefore ARM only and I'd need an ACK from ARM folks whether they want
it this way.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette
diff --git a/Documentation/ABI/testing/sysfs-devices-edac b/Documentation/ABI/testing/sysfs-devices-edac
index 256a9e990c0b..56a18b0af419 100644
--- a/Documentation/ABI/testing/sysfs-devices-edac
+++ b/Documentation/ABI/testing/sysfs-devices-edac
@@ -155,3 +155,18 @@ Description:	This attribute file displays the total count of uncorrectable
 		errors that have occurred on this DIMM. If panic_on_ue is set, this
 		counter will not have a chance to increment, since EDAC will panic the
 		system
+
+What:		/sys/devices/system/edac/cpu/cpu*/cache*/ce_count
+Date:		December 2020
+Contact:	linux-edac@vger.kernel.org
+Description:	This attribute file displays the total count of correctable
+		errors that have occurred on this CPU cache. This count is very important
+		to examine. CEs provide early indications that a cache is beginning
+		to fail. This count field should be monitored for non-zero values
+		and report such information to the system administrator.
+
+What:		/sys/devices/system/edac/cpu/cpu*/cache*/ue_count
+Date:		December 2020
+Contact:	linux-edac@vger.kernel.org
+Description:	This attribute file displays the total count of uncorrectable
+		errors that have occurred on this CPU cache.
diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index fce7ade2aba9..e7b0edbda0f8 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -1452,4 +1452,10 @@ static int __init ghes_init(void)
 err:
 	return rc;
 }
-device_initcall(ghes_init);
+
+/*
+ * device_initcall_sync() is added instead of the device_initcall()
+ * because the CPU cacheinfo should be populated and needed for
+ * adding the CPU cache edac device blocks in the ghes_edac_register().
+ */
+device_initcall_sync(ghes_init);
diff --git a/drivers/edac/Kconfig b/drivers/edac/Kconfig
index 7a47680d6f07..c73eeab27ac9 100644
--- a/drivers/edac/Kconfig
+++ b/drivers/edac/Kconfig
@@ -74,6 +74,16 @@ config EDAC_GHES
 
 	  In doubt, say 'Y'.
 
+config EDAC_GHES_CPU_ERROR
+	bool "EDAC device for reporting firmware-first BIOS detected CPU error count"
+	depends on EDAC_GHES
+	help
+	  EDAC device for the firmware-first BIOS detected CPU error count reported
+	  via ACPI APEI/GHES. By enabling this option, EDAC device for the CPU
+	  hierarchy and edac device blocks for caches would be created.
+	  The cpu error count is shared with the userspace via the CPU EDAC
+	  device's sysfs interface.
+
 config EDAC_AMD64
 	tristate "AMD64 (Opteron, Athlon64)"
 	depends on AMD_NB && EDAC_DECODE_MCE
diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c
index a918ca93e4f7..e6a73a413b33 100644
--- a/drivers/edac/ghes_edac.c
+++ b/drivers/edac/ghes_edac.c
@@ -12,6 +12,9 @@
 #include <acpi/ghes.h>
 #include <linux/edac.h>
 #include <linux/dmi.h>
+#if defined(CONFIG_EDAC_GHES_CPU_ERROR)
+#include <linux/cacheinfo.h>
+#endif
 #include "edac_module.h"
 #include <ras/ras_event.h>
 
@@ -497,6 +500,166 @@ void ghes_edac_report_mem_error(int sev, struct cper_sec_mem_err *mem_err)
 	spin_unlock_irqrestore(&ghes_lock, flags);
 }
 
+#if defined(CONFIG_EDAC_GHES_CPU_ERROR)
+static struct ghes_edac_cpu_block {
+	int cpu;
+	u8 level;
+	u8 type;
+	int block_nr;
+	bool info_populated;
+} *cpu_edac_block_list;
+
+static struct edac_device_ctl_info *cpu_edac_dev;
+static unsigned int max_number_of_caches;
+
+static int ghes_edac_add_cpu_device(struct device *dev)
+{
+	int rc;
+
+	cpu_edac_dev = edac_device_alloc_ctl_info(0, "cpu", num_possible_cpus(),
+						  "cache", max_number_of_caches, 0, NULL,
+						  0, edac_device_alloc_index());
+	if (!cpu_edac_dev) {
+		pr_warn("ghes-edac cpu edac device registration failed\n");
+		return -ENOMEM;
+	}
+
+	cpu_edac_dev->dev = dev;
+	cpu_edac_dev->ctl_name = "cpu_edac_dev";
+	cpu_edac_dev->dev_name = "ghes";
+	cpu_edac_dev->mod_name = "ghes_edac.c";
+	rc = edac_device_add_device(cpu_edac_dev);
+	if (rc > 0) {
+		pr_warn("edac_device_add_device failed\n");
+		edac_device_free_ctl_info(cpu_edac_dev);
+		return -ENOMEM;
+	}
+
+	return 0;
+}
+
+static void ghes_edac_delete_cpu_device(void)
+{
+	max_number_of_caches = 0;
+	if (cpu_edac_dev) {
+		edac_device_del_device(cpu_edac_dev->dev);
+		edac_device_free_ctl_info(cpu_edac_dev);
+	}
+	vfree(cpu_edac_block_list);
+}
+
+static int ghes_edac_get_cache_info(int cpu)
+{
+	struct ghes_edac_cpu_block *block;
+	struct cpu_cacheinfo *this_cpu_ci;
+	struct cacheinfo *this_leaf;
+	int i, num_caches;
+
+	this_cpu_ci = get_cpu_cacheinfo(cpu);
+	if (!this_cpu_ci || !this_cpu_ci->info_list || !this_cpu_ci->num_leaves)
+		return -EINVAL;
+
+	this_leaf = this_cpu_ci->info_list;
+	/*
+	 * Cache info would not be available for a CPU which is offline. However EDAC device
+	 * creation requires the number of device blocks (for example max number of caches
+	 * among CPUs). Thus cache info in the cpu_edac_block_list would be populated when
+	 * the error is reported on that cpu. Thus we need to restrict the number of caches
+	 * if the CPU's num_leaves exceed the max_number_of_caches.
+	 */
+	num_caches = min(max_number_of_caches, this_cpu_ci->num_leaves);
+
+	/*
+	 * The edac CPU cache device blocks entries in the sysfs should match with the
+	 * CPU cache structure in the sysfs so that the affected cpus for a shared cache
+	 * can be easily extracted in the userspace.
+	 */
+	for (i = 0; i < num_caches; i++) {
+		block = cpu_edac_block_list + (cpu * max_number_of_caches) + i;
+		block->cpu = cpu;
+		block->level = this_leaf->level;
+		block->type = this_leaf->type;
+		block->block_nr = i;
+		block->info_populated = true;
+		this_leaf++;
+	}
+
+	return 0;
+}
+
+static void ghes_edac_create_cpu_device(struct device *dev)
+{
+	int cpu;
+	struct cpu_cacheinfo *this_cpu_ci;
+
+	/*
+	 * Find the maximum number of caches present in the CPU heirarchy
+	 * among the online CPUs.
+	 */
+	for_each_online_cpu(cpu) {
+		this_cpu_ci = get_cpu_cacheinfo(cpu);
+		if (!this_cpu_ci)
+			continue;
+		if (max_number_of_caches < this_cpu_ci->num_leaves)
+			max_number_of_caches = this_cpu_ci->num_leaves;
+	}
+	if (!max_number_of_caches)
+		return;
+
+	/*
+	 * EDAC device interface only supports creating the CPU cache hierarchy for alls
+	 * the CPUs together. Thus need to allocate cpu_edac_block_list for the
+	 * max_number_of_caches among all the CPUs irrespective of the number of caches
+	 * per CPU might vary.
+	 */
+	cpu_edac_block_list = vzalloc(num_possible_cpus() * max_number_of_caches *
+				      sizeof(*cpu_edac_block_list));
+	if (!cpu_edac_block_list)
+		return;
+
+	if (ghes_edac_add_cpu_device(dev))
+		goto error;
+
+	for_each_online_cpu(cpu)
+		ghes_edac_get_cache_info(cpu);
+
+	return;
+
+error:
+	ghes_edac_delete_cpu_device();
+}
+
+void ghes_edac_report_cpu_error(struct ghes_einfo_cpu *einfo)
+{
+	struct ghes_edac_cpu_block *block;
+	int i;
+
+	if (!einfo || !(einfo->ce_count) || !max_number_of_caches)
+		return;
+
+	/*
+	 * EDAC device require the number of device blocks (for example max number of caches
+	 * among CPUs) during the creation. For the CPUs that were offline in the cpu edac
+	 * init and become online later, the cache info would be populated when the error is
+	 * reported on that cpu.
+	 */
+	block = cpu_edac_block_list + (einfo->cpu * max_number_of_caches);
+	if (!block->info_populated) {
+		if (ghes_edac_get_cache_info(einfo->cpu))
+			return;
+	}
+
+	for (i = 0; i < max_number_of_caches; i++) {
+		block = cpu_edac_block_list + (einfo->cpu * max_number_of_caches) + i;
+		if ((block->level == einfo->cache_level) && (block->type == einfo->cache_type)) {
+			edac_device_handle_ce_count(cpu_edac_dev, einfo->ce_count,
+						    einfo->cpu, block->block_nr, "");
+			break;
+		}
+	}
+}
+#endif
+
 /*
  * Known systems that are safe to enable this module.
  */
@@ -624,6 +787,10 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
 	ghes_pvt = pvt;
 	spin_unlock_irqrestore(&ghes_lock, flags);
 
+#if defined(CONFIG_EDAC_GHES_CPU_ERROR)
+	ghes_edac_create_cpu_device(dev);
+#endif
+
 	/* only set on success */
 	refcount_set(&ghes_refcount, 1);
 
@@ -654,6 +821,10 @@ void ghes_edac_unregister(struct ghes *ghes)
 	if (!refcount_dec_and_test(&ghes_refcount))
 		goto unlock;
 
+#if defined(CONFIG_EDAC_GHES_CPU_ERROR)
+	ghes_edac_delete_cpu_device();
+#endif
+
 	/*
 	 * Wait for the irq handler being finished.
 	 */
diff --git a/include/acpi/ghes.h b/include/acpi/ghes.h
index 34fb3431a8f3..8702eba26afc 100644
--- a/include/acpi/ghes.h
+++ b/include/acpi/ghes.h
@@ -73,6 +73,24 @@ void ghes_unregister_vendor_record_notifier(struct notifier_block *nb);
 
 int ghes_estatus_pool_init(int num_ghes);
 
+/**
+ * struct ghes_einfo_cpu - structure to pass the CPU error info to the edac
+ * @cpu: CPU index.
+ * @error_type: error type, cache/TLB/bus/ etc.
+ * @cache_level: cache level.
+ * @cache_type: ACPI cache type.
+ * @ue_count: CPU uncorrectable error count.
+ * @ce_count: CPU correctable error count.
+ */
+struct ghes_einfo_cpu {
+	int cpu;
+	u8 error_type;
+	u8 cache_level;
+	u8 cache_type;
+	u16 ue_count;
+	u16 ce_count;
+};
+
 /* From drivers/edac/ghes_edac.c */
 
 #ifdef CONFIG_EDAC_GHES
@@ -98,6 +116,15 @@ static inline void ghes_edac_unregister(struct ghes *ghes)
 }
 #endif
 
+#ifdef CONFIG_EDAC_GHES_CPU_ERROR
+void ghes_edac_report_cpu_error(struct ghes_einfo_cpu *einfo_cpu);
+
+#else
+static inline void ghes_edac_report_cpu_error(struct ghes_einfo_cpu *einfo_cpu)
+{
+}
+#endif
+
 static inline int acpi_hest_get_version(struct acpi_hest_generic_data *gdata)
 {
 	return gdata->revision >> 8;
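The flat-array indexing that ghes_edac_report_cpu_error() relies on in the diff above can be modelled in user space roughly as follows. This is a minimal sketch under stated assumptions, not kernel code: struct cpu_block, struct cpu_block_table, struct cache_desc, enum cache_kind and the table_*() helpers are hypothetical names invented for illustration, and the ce_count field stands in for the counter that the EDAC core keeps in the real driver. Unlike the patch, the sketch also bounds-checks the CPU index and skips blocks that were never populated.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical cache kinds; NOT the kernel's enum cache_type values. */
enum cache_kind { CACHE_INST = 1, CACHE_DATA, CACHE_UNIFIED };

/* One (level, kind) pair as a CPU would report it. */
struct cache_desc {
	unsigned char level;
	unsigned char kind;
};

/* User-space stand-in for the patch's struct ghes_edac_cpu_block. */
struct cpu_block {
	unsigned int cpu;
	unsigned char level;
	unsigned char kind;
	int block_nr;
	bool info_populated;
	unsigned int ce_count;	/* assumption: kept by the EDAC core in the driver */
};

struct cpu_block_table {
	struct cpu_block *blocks;	/* nr_cpus rows of max_caches entries each */
	unsigned int nr_cpus;
	unsigned int max_caches;
};

/* Allocate a zeroed nr_cpus x max_caches table; returns 0 or -1. */
static int table_init(struct cpu_block_table *t, unsigned int nr_cpus,
		      unsigned int max_caches)
{
	t->blocks = calloc((size_t)nr_cpus * max_caches, sizeof(*t->blocks));
	if (!t->blocks)
		return -1;
	t->nr_cpus = nr_cpus;
	t->max_caches = max_caches;
	return 0;
}

/* First block of a CPU's row: base + cpu * max_caches, as in the patch. */
static struct cpu_block *table_row(struct cpu_block_table *t, unsigned int cpu)
{
	assert(cpu < t->nr_cpus);
	return t->blocks + (size_t)cpu * t->max_caches;
}

/* Fill one CPU's row from its cache descriptors, clamped to max_caches. */
static void table_populate(struct cpu_block_table *t, unsigned int cpu,
			   const struct cache_desc *leaves,
			   unsigned int num_leaves)
{
	struct cpu_block *row = table_row(t, cpu);
	unsigned int n = num_leaves < t->max_caches ? num_leaves : t->max_caches;
	unsigned int i;

	for (i = 0; i < n; i++) {
		row[i].cpu = cpu;
		row[i].level = leaves[i].level;
		row[i].kind = leaves[i].kind;
		row[i].block_nr = (int)i;
		row[i].info_populated = true;
	}
}

/* Add count to the block matching (level, kind); returns block_nr or -1. */
static int table_report_ce(struct cpu_block_table *t, unsigned int cpu,
			   unsigned char level, unsigned char kind,
			   unsigned int count)
{
	struct cpu_block *row;
	unsigned int i;

	if (cpu >= t->nr_cpus)
		return -1;

	row = table_row(t, cpu);
	for (i = 0; i < t->max_caches; i++) {
		if (row[i].info_populated && row[i].level == level &&
		    row[i].kind == kind) {
			row[i].ce_count += count;
			return row[i].block_nr;
		}
	}
	return -1;
}
```

A caller would run table_init() once, table_populate() for each CPU whose cache layout is known, and table_report_ce() for each reported error. A return of -1 for a CPU whose row is still empty corresponds to the case where the driver first has to fetch the cache info for a CPU that was offline at init time.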
The corrected error count on the CPU caches needs to be reported
to user space for predictive failure analysis. For this purpose,
add an EDAC device and device blocks for the CPU caches found.
The cache's corrected error count would be stored in
/sys/devices/system/edac/cpu/cpu*/cache*/ce_count.

Issues and possible solutions:

1. Cache info is not available for offline CPUs.
   The EDAC device interface only supports creating the EDAC device and
   device blocks for all the CPU caches together. It requires the number
   of caches as the number of device blocks at creation time. However,
   this info is not available for CPUs which are offline and may come
   online later.
   Tested solution: find the max number of caches among the online CPUs,
   create the EDAC device for the CPU caches, and populate the cache info
   for an offline CPU later, when an error is reported on that CPU for
   the first time.

2. Reporting the error count for shared caches.
   There are a few possible solutions:
   2.1 The kernel reports a new error count for a shared cache per CPU,
       through the EDAC device block for that CPU. A user-space
       application then sums the total error count for a shared cache
       from the EDAC device blocks of all the CPUs in the shared CPU
       list of that cache.
   2.2 The kernel reports a new error count for a shared cache through
       the EDAC device blocks of all the CPUs in the shared CPU list of
       that cache.
   Tested solution: the current implementation uses solution 2.1.

For the firmware-first error handling, add an interface in ghes_edac
for reporting the corrected error count of a CPU core to user space
through the CPU EDAC device.

Suggested-by: James Morse <james.morse@arm.com>
Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
---
 Documentation/ABI/testing/sysfs-devices-edac |  15 ++
 drivers/acpi/apei/ghes.c                     |   8 +-
 drivers/edac/Kconfig                         |  10 ++
 drivers/edac/ghes_edac.c                     | 171 +++++++++++++++++++
 include/acpi/ghes.h                          |  27 +++
 5 files changed, 230 insertions(+), 1 deletion(-)

-- 
2.17.1
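Solution 2.1 leaves the summation for a shared cache to user space. The step can be sketched as below, assuming the per-CPU ce_count values have already been read into an array and the CPU list string uses the range format the kernel exposes in /sys/devices/system/cpu/cpu*/cache/index*/shared_cpu_list (for example "0-3,8"). The function name sum_ce_over_cpu_list() and its error convention are hypothetical; reading the sysfs files themselves is left out.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Sum ce_per_cpu[cpu] over every CPU named in a list such as "0-3,8".
 * A trailing newline, as read from sysfs, is accepted.
 * Returns the total, or -1 on a malformed list or an out-of-range CPU.
 */
static long sum_ce_over_cpu_list(const char *list,
				 const unsigned int *ce_per_cpu,
				 size_t nr_cpus)
{
	const char *p = list;
	long total = 0;

	assert(list && ce_per_cpu);

	while (*p && *p != '\n') {
		unsigned long first, last, cpu;
		char *end;

		if (!isdigit((unsigned char)*p))
			return -1;
		first = strtoul(p, &end, 10);
		last = first;
		p = end;

		/* Optional "-last" part of a range. */
		if (*p == '-') {
			p++;
			if (!isdigit((unsigned char)*p))
				return -1;
			last = strtoul(p, &end, 10);
			p = end;
		}

		if (last < first || last >= nr_cpus)
			return -1;

		for (cpu = first; cpu <= last; cpu++)
			total += ce_per_cpu[cpu];

		/* A comma must be followed by another entry. */
		if (*p == ',') {
			p++;
			if (!isdigit((unsigned char)*p))
				return -1;
		} else if (*p && *p != '\n') {
			return -1;
		}
	}

	return total;
}
```

With per-CPU counts {1, 2, 0, 4} on CPUs 0 to 3 and 10 on CPU 8, the list "0-3,8" would give a total of 17 for that shared cache. Under solution 2.2 this summation would instead double-count, since every CPU in the list would already carry the full count.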