[0/4] MCE wrapper and support for new SMCA syndrome MSRs

Message ID 20240530211620.1829453-1-avadhut.naik@amd.com
Avadhut Naik May 30, 2024, 9:16 p.m. UTC
This patchset adds a new wrapper for struct mce to keep the structure from
bloating and to export vendor-specific error information. It also adds
support for two new "syndrome" MSRs used in newer AMD Scalable MCA (SMCA)
systems, along with a new "FRU Text in MCA" feature that uses these MSRs.

Patch 1 adds the new wrapper structure mce_hw_err for the struct mce
while also modifying the mce_record tracepoint to use the new wrapper.
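
As a rough illustration of the wrapper idea in Patch 1, the sketch below
embeds a cut-down struct mce inside a struct mce_hw_err and recovers the
wrapper from a pointer to the embedded record. Only the names struct mce and
mce_hw_err come from this cover letter; the vendor union, the synd1/synd2
fields, the to_mce_hw_err() helper, and the reduced struct mce are
assumptions made for illustration and may not match the actual patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for the kernel's struct mce (the real one has many more fields). */
struct mce {
	uint64_t status;
	uint64_t addr;
	uint64_t synd;
};

/*
 * Hypothetical wrapper layout: the common record stays unchanged and
 * vendor-specific data lives next to it instead of inside it.
 */
struct mce_hw_err {
	struct mce m;
	union {
		struct {
			uint64_t synd1;	/* assumed name for the first new syndrome value */
			uint64_t synd2;	/* assumed name for the second new syndrome value */
		} amd;
	} vendor;
};

/* In this sketch the record is the first member, so both pointers coincide. */
static_assert(offsetof(struct mce_hw_err, m) == 0,
	      "struct mce is expected to be the first member of the wrapper");

/* Hypothetical helper: get back to the wrapper from the embedded record. */
static inline struct mce_hw_err *to_mce_hw_err(struct mce *m)
{
	return (struct mce_hw_err *)((char *)m - offsetof(struct mce_hw_err, m));
}
```

With a layout like this, code that only needs the common fields can keep
taking a struct mce pointer, while code that needs the vendor data converts
back to the wrapper.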

Patch 2 adds support for the new "syndrome" registers. They are read/printed
wherever the existing MCA_SYND register is used.

Patch 3 updates the function that pulls MCA information from UEFI x86
Common Platform Error Records (CPERs) to handle systems that support the
new registers.

Patch 4 adds support to the AMD MCE decoder module to detect and use the
"FRU Text in MCA" feature which leverages the new registers.

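To make the "FRU Text in MCA" idea in Patch 4 more concrete, here is a
minimal sketch of how a decoder might turn two 64-bit syndrome values into a
printable string. It assumes that each register carries 8 raw text bytes (16
in total) and that the bytes are copied in memory order; the cover letter
only says the feature "leverages the new registers", so the layout, the
FRU_TEXT_LEN constant, and the fru_text_from_synd() helper are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed size: 8 text bytes from each of the two syndrome values. */
#define FRU_TEXT_LEN 16

/*
 * Copy the raw bytes of two 64-bit syndrome values into a NUL-terminated
 * buffer. The caller provides room for FRU_TEXT_LEN + 1 bytes.
 */
static void fru_text_from_synd(uint64_t synd1, uint64_t synd2,
			       char out[FRU_TEXT_LEN + 1])
{
	assert(out != NULL);

	memcpy(&out[0], &synd1, sizeof(synd1));
	memcpy(&out[sizeof(synd1)], &synd2, sizeof(synd2));
	out[FRU_TEXT_LEN] = '\0';
}
```

The explicit terminator matters because the source bytes are not guaranteed
to contain a NUL, so the buffer must not be printed as a string without it.
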
NOTE:

This set was initially submitted as part of the larger MCA Updates set.

v1: https://lore.kernel.org/linux-edac/20231118193248.1296798-1-yazen.ghannam@amd.com/
v2: https://lore.kernel.org/linux-edac/20240404151359.47970-1-yazen.ghannam@amd.com/

However, since the MCA Updates set has been split up into smaller sets,
this set, going forward, will be submitted independently.

That said, this set depends on, and applies cleanly on top of, the
following two sets.

1: https://lore.kernel.org/linux-edac/20240521125434.1555845-1-yazen.ghannam@amd.com/
2: https://lore.kernel.org/linux-edac/20240523155641.2805411-1-yazen.ghannam@amd.com/

Avadhut Naik (2):
  x86/mce: Add wrapper for struct mce to export vendor specific info
  x86/mce, EDAC/mce_amd: Add support for new MCA_SYND{1,2} registers

Yazen Ghannam (2):
  x86/mce/apei: Handle variable register array size
  EDAC/mce_amd: Add support for FRU Text in MCA

 arch/x86/include/asm/mce.h              |  20 ++-
 arch/x86/kernel/cpu/mce/apei.c          | 111 ++++++++++----
 arch/x86/kernel/cpu/mce/core.c          | 191 ++++++++++++++----------
 arch/x86/kernel/cpu/mce/dev-mcelog.c    |   2 +-
 arch/x86/kernel/cpu/mce/genpool.c       |  20 +--
 arch/x86/kernel/cpu/mce/inject.c        |   4 +-
 arch/x86/kernel/cpu/mce/internal.h      |   4 +-
 drivers/acpi/acpi_extlog.c              |   2 +-
 drivers/acpi/nfit/mce.c                 |   2 +-
 drivers/edac/i7core_edac.c              |   2 +-
 drivers/edac/igen6_edac.c               |   2 +-
 drivers/edac/mce_amd.c                  |  27 +++-
 drivers/edac/pnd2_edac.c                |   2 +-
 drivers/edac/sb_edac.c                  |   2 +-
 drivers/edac/skx_common.c               |   2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c |   2 +-
 drivers/ras/amd/fmpm.c                  |   2 +-
 drivers/ras/cec.c                       |   2 +-
 include/trace/events/mce.h              |  51 ++++---
 19 files changed, 286 insertions(+), 164 deletions(-)

Comments

Borislav Petkov June 21, 2024, 4:58 p.m. UTC | #1
On Thu, May 30, 2024 at 04:16:16PM -0500, Avadhut Naik wrote:
>  arch/x86/include/asm/mce.h              |  20 ++-
>  arch/x86/kernel/cpu/mce/apei.c          | 111 ++++++++++----
>  arch/x86/kernel/cpu/mce/core.c          | 191 ++++++++++++++----------
>  arch/x86/kernel/cpu/mce/dev-mcelog.c    |   2 +-
>  arch/x86/kernel/cpu/mce/genpool.c       |  20 +--
>  arch/x86/kernel/cpu/mce/inject.c        |   4 +-
>  arch/x86/kernel/cpu/mce/internal.h      |   4 +-
>  drivers/acpi/acpi_extlog.c              |   2 +-
>  drivers/acpi/nfit/mce.c                 |   2 +-
>  drivers/edac/i7core_edac.c              |   2 +-
>  drivers/edac/igen6_edac.c               |   2 +-
>  drivers/edac/mce_amd.c                  |  27 +++-
>  drivers/edac/pnd2_edac.c                |   2 +-
>  drivers/edac/sb_edac.c                  |   2 +-
>  drivers/edac/skx_common.c               |   2 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c |   2 +-
>  drivers/ras/amd/fmpm.c                  |   2 +-
>  drivers/ras/cec.c                       |   2 +-
>  include/trace/events/mce.h              |  51 ++++---
>  19 files changed, 286 insertions(+), 164 deletions(-)

This doesn't apply anymore - please redo this on top of the latest tip/master.

Thx.
Yazen Ghannam June 21, 2024, 8:01 p.m. UTC | #2
On Fri, Jun 21, 2024 at 06:58:23PM +0200, Borislav Petkov wrote:
> On Thu, May 30, 2024 at 04:16:16PM -0500, Avadhut Naik wrote:
> >  arch/x86/include/asm/mce.h              |  20 ++-
> >  arch/x86/kernel/cpu/mce/apei.c          | 111 ++++++++++----
> >  arch/x86/kernel/cpu/mce/core.c          | 191 ++++++++++++++----------
> >  arch/x86/kernel/cpu/mce/dev-mcelog.c    |   2 +-
> >  arch/x86/kernel/cpu/mce/genpool.c       |  20 +--
> >  arch/x86/kernel/cpu/mce/inject.c        |   4 +-
> >  arch/x86/kernel/cpu/mce/internal.h      |   4 +-
> >  drivers/acpi/acpi_extlog.c              |   2 +-
> >  drivers/acpi/nfit/mce.c                 |   2 +-
> >  drivers/edac/i7core_edac.c              |   2 +-
> >  drivers/edac/igen6_edac.c               |   2 +-
> >  drivers/edac/mce_amd.c                  |  27 +++-
> >  drivers/edac/pnd2_edac.c                |   2 +-
> >  drivers/edac/sb_edac.c                  |   2 +-
> >  drivers/edac/skx_common.c               |   2 +-
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c |   2 +-
> >  drivers/ras/amd/fmpm.c                  |   2 +-
> >  drivers/ras/cec.c                       |   2 +-
> >  include/trace/events/mce.h              |  51 ++++---
> >  19 files changed, 286 insertions(+), 164 deletions(-)
> 
> This doesn't apply anymore - please redo this on top of the latest tip/master.
>

Avadhut,

You can drop the dependencies on other sets. We can sort out any
conflicts as needed.

Thanks,
Yazen
Naik, Avadhut June 22, 2024, 2:24 a.m. UTC | #3
On 6/21/2024 15:01, Yazen Ghannam wrote:
> On Fri, Jun 21, 2024 at 06:58:23PM +0200, Borislav Petkov wrote:
>> On Thu, May 30, 2024 at 04:16:16PM -0500, Avadhut Naik wrote:
>>>  arch/x86/include/asm/mce.h              |  20 ++-
>>>  arch/x86/kernel/cpu/mce/apei.c          | 111 ++++++++++----
>>>  arch/x86/kernel/cpu/mce/core.c          | 191 ++++++++++++++----------
>>>  arch/x86/kernel/cpu/mce/dev-mcelog.c    |   2 +-
>>>  arch/x86/kernel/cpu/mce/genpool.c       |  20 +--
>>>  arch/x86/kernel/cpu/mce/inject.c        |   4 +-
>>>  arch/x86/kernel/cpu/mce/internal.h      |   4 +-
>>>  drivers/acpi/acpi_extlog.c              |   2 +-
>>>  drivers/acpi/nfit/mce.c                 |   2 +-
>>>  drivers/edac/i7core_edac.c              |   2 +-
>>>  drivers/edac/igen6_edac.c               |   2 +-
>>>  drivers/edac/mce_amd.c                  |  27 +++-
>>>  drivers/edac/pnd2_edac.c                |   2 +-
>>>  drivers/edac/sb_edac.c                  |   2 +-
>>>  drivers/edac/skx_common.c               |   2 +-
>>>  drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c |   2 +-
>>>  drivers/ras/amd/fmpm.c                  |   2 +-
>>>  drivers/ras/cec.c                       |   2 +-
>>>  include/trace/events/mce.h              |  51 ++++---
>>>  19 files changed, 286 insertions(+), 164 deletions(-)
>>
>> This doesn't apply anymore - please redo this on top of the latest tip/master.
>>
> 
> Avadhut,
> 
> You can drop the dependencies on other sets. We can sort out any
> conflicts as needed.
> 
Sounds good! Will redo on top of tip/master and resubmit.

> Thanks,
> Yazen