diff mbox series

[v2,2/4] x86/mce, EDAC/mce_amd: Add support for new MCA_SYND{1,2} registers

Message ID 20240625195624.2565741-3-avadhut.naik@amd.com
State New
Headers show
Series MCE wrapper and support for new SMCA syndrome MSRs | expand

Commit Message

Avadhut Naik June 25, 2024, 7:56 p.m. UTC
AMD's Scalable MCA systems viz. Genoa will include two new registers:
MCA_SYND1 and MCA_SYND2.

These registers will include supplemental error information in addition
to the existing MCA_SYND register. The data within the registers is
considered valid if MCA_STATUS[SyndV] is set.

Add fields for these registers as vendor-specific error information
in struct mce_hw_err. Save and print these registers wherever
MCA_STATUS[SyndV]/MCA_SYND is currently used.

Also, modify the mce_record tracepoint to export these new registers
through __dynamic_array. While the sizeof() operator has been used to
determine the size of this __dynamic_array, the same, if needed in the
future can be substituted by caching the size of vendor-specific error
information as part of struct mce_hw_err.

Note: Checkpatch warnings/errors are ignored to maintain coding style.

[Yazen: Drop Yazen's Co-developed-by tag and moved SoB tag.]

Signed-off-by: Avadhut Naik <avadhut.naik@amd.com>
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
---
 arch/x86/include/asm/mce.h     | 12 ++++++++++++
 arch/x86/kernel/cpu/mce/amd.c  |  5 ++++-
 arch/x86/kernel/cpu/mce/core.c | 24 +++++++++++++++++-------
 drivers/edac/mce_amd.c         | 10 +++++++---
 include/trace/events/mce.h     |  9 +++++++--
 5 files changed, 47 insertions(+), 13 deletions(-)

Comments

Borislav Petkov June 26, 2024, 11:10 a.m. UTC | #1
On Tue, Jun 25, 2024 at 02:56:22PM -0500, Avadhut Naik wrote:
> AMD's Scalable MCA systems viz. Genoa will include two new registers:

"viz."?

Not a lot of people outside of AMD know what Genoa is. Zen4 is probably a lot
more widespread.

> MCA_SYND1 and MCA_SYND2.
> 
> These registers will include supplemental error information in addition
> to the existing MCA_SYND register. The data within the registers is
> considered valid if MCA_STATUS[SyndV] is set.
Naik, Avadhut June 26, 2024, 5:24 p.m. UTC | #2
On 6/26/2024 06:10, Borislav Petkov wrote:
> On Tue, Jun 25, 2024 at 02:56:22PM -0500, Avadhut Naik wrote:
>> AMD's Scalable MCA systems viz. Genoa will include two new registers:
> 
> "viz."?
> 
Right. Will mention Zen4 instead of Genoa.

> Not a lot of people outside of AMD know what Genoa is. Zen4 is probably a lot
> more widespread.
> 
>> MCA_SYND1 and MCA_SYND2.
>>
>> These registers will include supplemental error information in addition
>> to the existing MCA_SYND register. The data within the registers is
>> considered valid if MCA_STATUS[SyndV] is set.
> 
> From here...
> 
>> Add fields for these registers as vendor-specific error information
>> in struct mce_hw_err. Save and print these registers wherever
>> MCA_STATUS[SyndV]/MCA_SYND is currently used.
>>
>> Also, modify the mce_record tracepoint to export these new registers
>> through __dynamic_array. While the sizeof() operator has been used to
>> determine the size of this __dynamic_array, the same, if needed in the
>> future can be substituted by caching the size of vendor-specific error
>> information as part of struct mce_hw_err.
> 
> ... to here this text explains what the patch does. I guess it is time for my
> boilerplate text again:
> 
> Do not talk about *what* the patch is doing in the commit message - that
> should be obvious from the diff itself. Rather, concentrate on the *why*
> it needs to be done.
> 
> Imagine one fine day you're doing git archeology, you find the place in
> the code about which you want to find out why it was changed the way it 
> is now.
> 
> You do git annotate <filename> ... find the line, see the commit id and
> you do:
> 
> git show <commit id>
> 
> You read the commit message and there's just gibberish and nothing's
> explaining *why* that change was done. And you start scratching your
> head, trying to figure out why. Because the damn commit message is worth
> sh*t.
> 
> This happens to us maintainers at least once a week. Well, I don't want
> that to happen in my tree anymore.
> 
> You catch my drift? :)
> 
> So, now, how are those new syndromes going to be used in the tracepoint and
> why do we want them there?
> 
Yes, I catch your drift. Will reword the commit message to explain that the
new syndrome registers are going to be exported through the tracepoint
in a dynamic array, as they are vendor-specific, so that usersapce error
decoding tools can retrieve the supplemental error information within them.

>> Note: Checkpatch warnings/errors are ignored to maintain coding style.
> 
> This goes...
> 
>>
>> [Yazen: Drop Yazen's Co-developed-by tag and moved SoB tag.]
> 
> Yes, you did but now your SOB chain is wrong:
> 
>> Signed-off-by: Avadhut Naik <avadhut.naik@amd.com>
>> Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
> 
> This tells me Avadhut is the author, Yazen handled it and he's sending it to
> me. But nope, he isn't. So it needs another Avadhut SOB underneath.
> 
> Audit all patches pls.
> 
Wasn't aware of this chronology. Thanks for this information!
So, IIUC, the sequence for this patch should be as follows?

Signed-off-by: Avadhut Naik <avadhut.naik@amd.com>
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Avadhut Naik <avadhut.naik@amd.com>

>> ---
> 
> ... right under those three "---" as such notes do not belong in the commit
> message. Remember that for the future.
> 
Okay. Will move the note here.

>>  arch/x86/include/asm/mce.h     | 12 ++++++++++++
>>  arch/x86/kernel/cpu/mce/amd.c  |  5 ++++-
>>  arch/x86/kernel/cpu/mce/core.c | 24 +++++++++++++++++-------
>>  drivers/edac/mce_amd.c         | 10 +++++++---
>>  include/trace/events/mce.h     |  9 +++++++--
>>  5 files changed, 47 insertions(+), 13 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
>> index e955edb22897..2b43ba37bbda 100644
>> --- a/arch/x86/include/asm/mce.h
>> +++ b/arch/x86/include/asm/mce.h
>> @@ -122,6 +122,9 @@
>>  #define MSR_AMD64_SMCA_MC0_DESTAT	0xc0002008
>>  #define MSR_AMD64_SMCA_MC0_DEADDR	0xc0002009
>>  #define MSR_AMD64_SMCA_MC0_MISC1	0xc000200a
>> +/* Registers MISC2 to MISC4 are at offsets B to D. */
>> +#define MSR_AMD64_SMCA_MC0_SYND1	0xc000200e
>> +#define MSR_AMD64_SMCA_MC0_SYND2	0xc000200f
>>  #define MSR_AMD64_SMCA_MCx_CTL(x)	(MSR_AMD64_SMCA_MC0_CTL + 0x10*(x))
>>  #define MSR_AMD64_SMCA_MCx_STATUS(x)	(MSR_AMD64_SMCA_MC0_STATUS + 0x10*(x))
>>  #define MSR_AMD64_SMCA_MCx_ADDR(x)	(MSR_AMD64_SMCA_MC0_ADDR + 0x10*(x))
>> @@ -132,6 +135,8 @@
>>  #define MSR_AMD64_SMCA_MCx_DESTAT(x)	(MSR_AMD64_SMCA_MC0_DESTAT + 0x10*(x))
>>  #define MSR_AMD64_SMCA_MCx_DEADDR(x)	(MSR_AMD64_SMCA_MC0_DEADDR + 0x10*(x))
>>  #define MSR_AMD64_SMCA_MCx_MISCy(x, y)	((MSR_AMD64_SMCA_MC0_MISC1 + y) + (0x10*(x)))
>> +#define MSR_AMD64_SMCA_MCx_SYND1(x)	(MSR_AMD64_SMCA_MC0_SYND1 + 0x10*(x))
>> +#define MSR_AMD64_SMCA_MCx_SYND2(x)	(MSR_AMD64_SMCA_MC0_SYND2 + 0x10*(x))
>>  
>>  #define XEC(x, mask)			(((x) >> 16) & mask)
>>  
>> @@ -189,6 +194,13 @@ enum mce_notifier_prios {
>>  
>>  struct mce_hw_err {
>>  	struct mce m;
>> +
>> +	union vendor_info {
>> +		struct {
>> +			u64 synd1;
>> +			u64 synd2;
>> +		} amd;
> 
> I presume the intent here is for Intel or other vendors to add their
> vendor-specific stuff here too?
> 
> I'm also expecting that shared fields will be promoted up to the common struct
> namespace. Pls add a short comment explaining what the goal with that struct
> is.
> 
Yes, other vendors can export their vendor-specific data through thier own
structure within the union. Yes, shared fields can be promoted to the common
structure. Will add a comment to explain the endgoal.

>> +	} vi;
> 
> Call that "vendor" so that in the code you can have
> 
> 	err.vendor.amd.
> 
> or
> 
> 	err.vendor.intel.
> 
> and so on so that it is perfectly clear what this is.
> 
Will do.
>>  };
>>  
>>  struct notifier_block;
>> diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
>> index cb7dc0b1aa50..fc69d244ca7f 100644
>> --- a/arch/x86/kernel/cpu/mce/amd.c
>> +++ b/arch/x86/kernel/cpu/mce/amd.c
>> @@ -799,8 +799,11 @@ static void __log_error(unsigned int bank, u64 status, u64 addr, u64 misc)
>>  	if (mce_flags.smca) {
>>  		rdmsrl(MSR_AMD64_SMCA_MCx_IPID(bank), m->ipid);
>>  
>> -		if (m->status & MCI_STATUS_SYNDV)
>> +		if (m->status & MCI_STATUS_SYNDV) {
>>  			rdmsrl(MSR_AMD64_SMCA_MCx_SYND(bank), m->synd);
>> +			rdmsrl(MSR_AMD64_SMCA_MCx_SYND1(bank), err.vi.amd.synd1);
>> +			rdmsrl(MSR_AMD64_SMCA_MCx_SYND2(bank), err.vi.amd.synd2);
>> +		}
>>  	}
>>  
>>  	mce_log(&err);
>> diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
>> index 6225143b9b14..3bb0f8b39f97 100644
>> --- a/arch/x86/kernel/cpu/mce/core.c
>> +++ b/arch/x86/kernel/cpu/mce/core.c
>> @@ -189,6 +189,10 @@ static void __print_mce(struct mce_hw_err *err)
>>  	if (mce_flags.smca) {
>>  		if (m->synd)
>>  			pr_cont("SYND %llx ", m->synd);
>> +		if (err->vi.amd.synd1)
>> +			pr_cont("SYND1 %llx ", err->vi.amd.synd1);
>> +		if (err->vi.amd.synd2)
>> +			pr_cont("SYND2 %llx ", err->vi.amd.synd2);
>>  		if (m->ipid)
>>  			pr_cont("IPID %llx ", m->ipid);
>>  	}
>> @@ -639,8 +643,10 @@ static struct notifier_block mce_default_nb = {
>>  /*
>>   * Read ADDR and MISC registers.
>>   */
>> -static noinstr void mce_read_aux(struct mce *m, int i)
>> +static noinstr void mce_read_aux(struct mce_hw_err *err, int i)
> 
> This whole conversion to struct mce_hw_err here belongs logically into patch
> 1.
> 
Had considered this. But struct mce_hw_err *err wouldn't really be used in
mce_read_aux() in patch 1. Only struct mce m, which is already available, will
be used.
Hence, deferred the change to this patch where usage of struct mce_hw_err *err
is actually introduced in mce_read_aux().

Do you prefer having this change in patch 1 instead?

>>  {
>> +	struct mce *m = &err->m;
>> +
>>  	if (m->status & MCI_STATUS_MISCV)
>>  		m->misc = mce_rdmsrl(mca_msr_reg(i, MCA_MISC));
>>  
>> @@ -662,8 +668,11 @@ static noinstr void mce_read_aux(struct mce *m, int i)
>>  	if (mce_flags.smca) {
>>  		m->ipid = mce_rdmsrl(MSR_AMD64_SMCA_MCx_IPID(i));
>>  
>> -		if (m->status & MCI_STATUS_SYNDV)
>> +		if (m->status & MCI_STATUS_SYNDV) {
>>  			m->synd = mce_rdmsrl(MSR_AMD64_SMCA_MCx_SYND(i));
>> +			err->vi.amd.synd1 = mce_rdmsrl(MSR_AMD64_SMCA_MCx_SYND1(i));
>> +			err->vi.amd.synd2 = mce_rdmsrl(MSR_AMD64_SMCA_MCx_SYND2(i));
>> +		}
>>  	}
>>  }
>>  
>> @@ -766,7 +775,7 @@ void machine_check_poll(enum mcp_flags flags, mce_banks_t *b)
>>  		if (flags & MCP_DONTLOG)
>>  			goto clear_it;
>>  
>> -		mce_read_aux(m, i);
>> +		mce_read_aux(&err, i);
>>  		m->severity = mce_severity(m, NULL, NULL, false);
>>  		/*
>>  		 * Don't get the IP here because it's unlikely to
>> @@ -903,9 +912,10 @@ static __always_inline void quirk_zen_ifu(int bank, struct mce *m, struct pt_reg
>>   * Do a quick check if any of the events requires a panic.
>>   * This decides if we keep the events around or clear them.
>>   */
>> -static __always_inline int mce_no_way_out(struct mce *m, char **msg, unsigned long *validp,
>> +static __always_inline int mce_no_way_out(struct mce_hw_err *err, char **msg, unsigned long *validp,
>>  					  struct pt_regs *regs)
>>  {
>> +	struct mce *m = &err->m;
>>  	char *tmp = *msg;
>>  	int i;
>>  
>> @@ -923,7 +933,7 @@ static __always_inline int mce_no_way_out(struct mce *m, char **msg, unsigned lo
>>  
>>  		m->bank = i;
>>  		if (mce_severity(m, regs, &tmp, true) >= MCE_PANIC_SEVERITY) {
>> -			mce_read_aux(m, i);
>> +			mce_read_aux(err, i);
>>  			*msg = tmp;
>>  			return 1;
>>  		}
>> @@ -1321,7 +1331,7 @@ __mc_scan_banks(struct mce_hw_err *err, struct pt_regs *regs, struct mce *final,
>>  		if (severity == MCE_NO_SEVERITY)
>>  			continue;
>>  
>> -		mce_read_aux(m, i);
>> +		mce_read_aux(err, i);
>>  
>>  		/* assuming valid severity level != 0 */
>>  		m->severity = severity;
>> @@ -1522,7 +1532,7 @@ noinstr void do_machine_check(struct pt_regs *regs)
>>  	final = this_cpu_ptr(&hw_errs_seen);
>>  	final->m = *m;
>>  
>> -	no_way_out = mce_no_way_out(m, &msg, valid_banks, regs);
>> +	no_way_out = mce_no_way_out(&err, &msg, valid_banks, regs);
>>  
>>  	barrier();
>>  
>> diff --git a/drivers/edac/mce_amd.c b/drivers/edac/mce_amd.c
>> index c5fae99de781..69e12cb2f0de 100644
>> --- a/drivers/edac/mce_amd.c
>> +++ b/drivers/edac/mce_amd.c
>> @@ -792,7 +792,8 @@ static const char *decode_error_status(struct mce *m)
>>  static int
>>  amd_decode_mce(struct notifier_block *nb, unsigned long val, void *data)
>>  {
>> -	struct mce *m = &((struct mce_hw_err *)data)->m;
>> +	struct mce_hw_err *err = (struct mce_hw_err *)data;
>> +	struct mce *m = &err->m;
>>  	unsigned int fam = x86_family(m->cpuid);
>>  	int ecc;
>>  
>> @@ -850,8 +851,11 @@ amd_decode_mce(struct notifier_block *nb, unsigned long val, void *data)
>>  	if (boot_cpu_has(X86_FEATURE_SMCA)) {
>>  		pr_emerg(HW_ERR "IPID: 0x%016llx", m->ipid);
>>  
>> -		if (m->status & MCI_STATUS_SYNDV)
>> -			pr_cont(", Syndrome: 0x%016llx", m->synd);
>> +		if (m->status & MCI_STATUS_SYNDV) {
>> +			pr_cont(", Syndrome: 0x%016llx\n", m->synd);
>> +			pr_emerg(HW_ERR "Syndrome1: 0x%016llx, Syndrome2: 0x%016llx",
>> +				 err->vi.amd.synd1, err->vi.amd.synd2);
>> +		}
>>  
>>  		pr_cont("\n");
>>  
>> diff --git a/include/trace/events/mce.h b/include/trace/events/mce.h
>> index 65aba1afcd07..9e7211eddbca 100644
>> --- a/include/trace/events/mce.h
>> +++ b/include/trace/events/mce.h
>> @@ -43,6 +43,8 @@ TRACE_EVENT(mce_record,
>>  		__field(	u8,		bank		)
>>  		__field(	u8,		cpuvendor	)
>>  		__field(	u32,		microcode	)
>> +		__field(	u8,		len	)
>> +		__dynamic_array(u8, v_data, sizeof(err->vi))
>>  	),
>>  
>>  	TP_fast_assign(
>> @@ -65,9 +67,11 @@ TRACE_EVENT(mce_record,
>>  		__entry->bank		= err->m.bank;
>>  		__entry->cpuvendor	= err->m.cpuvendor;
>>  		__entry->microcode	= err->m.microcode;
>> +		__entry->len		= sizeof(err->vi);
>> +		memcpy(__get_dynamic_array(v_data), &err->vi, sizeof(err->vi));
> 
> So that vendor data layout - is that ABI too? Or are we free to shuffle the
> fields around in the future or even remove some?
> 
> This all needs to be specified somewhere explicitly so that nothing relies on
> that layout.
> 
> And I'm not sure that that's enough because when userspace tools start using
> them, then they're practically an ABI so you can't change them even if you
> wanted to.
> 
> So is libtraceevent or all the other libraries going to parse this as a blob
> and it is always going to remain such?
> 
> But then the tools which interpret it need to know its layout and if it
> changes, perhaps check kernel version which then becomes RealUgly(tm).
> 
> So you might just as well dump the separate fields one by one, without
> a dynamic array.
> 
> Or do a dynamic array but specify that their layout in struct
> mce_hw_er.vendor.amd are cast in stone so that we're all clear on what goes
> where.
> 
> Questions over questions...
> 
Should we document this where struct mce_hw_err is defined, in
arch/x86/include/asm/mce.h? Or do you have any other recommendations?

IIUC, the libtraceevent library relies on tracepoint's format in tracefs. Below
is the format with this patchset incorporated.

[root avadnaik]# cat /sys/kernel/debug/tracing/events/mce/mce_record/format 
name: mce_record
ID: 113
format:
        field:unsigned short common_type;       offset:0;       size:2; signed:0;
        field:unsigned char common_flags;       offset:2;       size:1; signed:0;
        field:unsigned char common_preempt_count;       offset:3;       size:1; signed:0;
        field:int common_pid;   offset:4;       size:4; signed:1;

        field:u64 mcgcap;       offset:8;       size:8; signed:0;
        field:u64 mcgstatus;    offset:16;      size:8; signed:0;
        field:u64 status;       offset:24;      size:8; signed:0;
        field:u64 addr; offset:32;      size:8; signed:0;
        field:u64 misc; offset:40;      size:8; signed:0;
        field:u64 synd; offset:48;      size:8; signed:0;
        field:u64 ipid; offset:56;      size:8; signed:0;
        field:u64 ip;   offset:64;      size:8; signed:0;
        field:u64 tsc;  offset:72;      size:8; signed:0;
        field:u64 ppin; offset:80;      size:8; signed:0;
        field:u64 walltime;     offset:88;      size:8; signed:0;
        field:u32 cpu;  offset:96;      size:4; signed:0;
        field:u32 cpuid;        offset:100;     size:4; signed:0;
        field:u32 apicid;       offset:104;     size:4; signed:0;
        field:u32 socketid;     offset:108;     size:4; signed:0;
        field:u8 cs;    offset:112;     size:1; signed:0;
        field:u8 bank;  offset:113;     size:1; signed:0;
        field:u8 cpuvendor;     offset:114;     size:1; signed:0;
        field:u32 microcode;    offset:116;     size:4; signed:0;
        field:u8 len;   offset:120;     size:1; signed:0;
        field:__data_loc u8[] v_data;   offset:124;     size:4; signed:0;

print fmt: "CPU: %d, MCGc/s: %llx/%llx, MC%d: %016llx, IPID: %016llx, ADDR: %016llx, MISC: %016llx, SYND: %016llx, RIP: %02x:<%016llx>, TSC: %llx, PPIN: %llx, vendor: %u, CPUID: %x, time: %llu, socket: %u, APIC: %x, microcode: %x, vendor data: %s", REC->cpu, REC->mcgcap, REC->mcgstatus, REC->bank, REC->status, REC->ipid, REC->addr, REC->misc, REC->synd, REC->cs, REC->ip, REC->tsc, REC->ppin, REC->cpuvendor, REC->cpuid, REC->walltime, REC->socketid, REC->apicid, REC->microcode, __print_array(__get_dynamic_array(v_data), REC->len / 8, 8)

So, yes, the tools which interpret the vendor data need to aware of its layout
if things like FRUTEXT are to be decoded from the data.

Just FYI, patch adding support for this in rasdaemon, has already been merged in.
https://github.com/mchehab/rasdaemon/pull/122/commits/926c2b39c6386d0a1bf4232977f9fd7e37850361

> Thx.
>
Borislav Petkov June 26, 2024, 6:18 p.m. UTC | #3
On Wed, Jun 26, 2024 at 12:24:20PM -0500, Naik, Avadhut wrote:
> 
> 
> On 6/26/2024 06:10, Borislav Petkov wrote:
> > On Tue, Jun 25, 2024 at 02:56:22PM -0500, Avadhut Naik wrote:
> >> AMD's Scalable MCA systems viz. Genoa will include two new registers:
> > 
> > "viz."?
> > 
> Right. Will mention Zen4 instead of Genoa.

I still don't know what "viz." means...

> Yes, I catch your drift. Will reword the commit message to explain that the
> new syndrome registers are going to be exported through the tracepoint
> in a dynamic array, as they are vendor-specific, so that usersapce error
> decoding tools can retrieve the supplemental error information within them.

Again, why?

Why is it important to have them in the tracepoint?

> >> Note: Checkpatch warnings/errors are ignored to maintain coding style.
> > 
> > This goes...
> > 
> >>
> >> [Yazen: Drop Yazen's Co-developed-by tag and moved SoB tag.]
> > 
> > Yes, you did but now your SOB chain is wrong:
> > 
> >> Signed-off-by: Avadhut Naik <avadhut.naik@amd.com>
> >> Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
> > 
> > This tells me Avadhut is the author, Yazen handled it and he's sending it to
> > me. But nope, he isn't. So it needs another Avadhut SOB underneath.
> > 
> > Audit all patches pls.
> > 
> Wasn't aware of this chronology. Thanks for this information!

Well, there's documentation for that which you should've read already, before
sending patches:

https://kernel.org/doc/html/latest/process/development-process.html

and

https://kernel.org/doc/html/latest/process/submitting-patches.html

especially.

> So, IIUC, the sequence for this patch should be as follows?
> 
> Signed-off-by: Avadhut Naik <avadhut.naik@amd.com>
> Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
> Signed-off-by: Avadhut Naik <avadhut.naik@amd.com>

Yes, now I leave it to you to explain why. Hint: it is in those docs above.

> 
> >> ---
> > 
> > ... right under those three "---" as such notes do not belong in the commit
> > message. Remember that for the future.
> > 
> Okay. Will move the note here.

Or remove it completely. checkpatch is crap - I know. No need to have it in
every patch.

> Had considered this. But struct mce_hw_err *err wouldn't really be used in
> mce_read_aux() in patch 1. Only struct mce m, which is already available, will
> be used.

So?

> Hence, deferred the change to this patch where usage of struct mce_hw_err *err
> is actually introduced in mce_read_aux().
> 
> Do you prefer having this change in patch 1 instead?

I prefer a patch to contain one logical and complete change only. Because this
makes review easier. You should try reviewing patches sometimes too and you'll
know.

> > So that vendor data layout - is that ABI too? Or are we free to shuffle the
> > fields around in the future or even remove some?
> > 
> > This all needs to be specified somewhere explicitly so that nothing relies on
> > that layout.
> > 
> > And I'm not sure that that's enough because when userspace tools start using
> > them, then they're practically an ABI so you can't change them even if you
> > wanted to.
> > 
> > So is libtraceevent or all the other libraries going to parse this as a blob
> > and it is always going to remain such?
> > 
> > But then the tools which interpret it need to know its layout and if it
> > changes, perhaps check kernel version which then becomes RealUgly(tm).
> > 
> > So you might just as well dump the separate fields one by one, without
> > a dynamic array.
> > 
> > Or do a dynamic array but specify that their layout in struct
> > mce_hw_er.vendor.amd are cast in stone so that we're all clear on what goes
> > where.
> > 
> > Questions over questions...
> > 
> Should we document this where struct mce_hw_err is defined, in
> arch/x86/include/asm/mce.h? Or do you have any other recommendations?

I don't know. If I knew I wouldn't have questions which you can read again and
try to answer.
Naik, Avadhut July 9, 2024, 6:27 a.m. UTC | #4
On 6/26/2024 13:18, Borislav Petkov wrote:
> On Wed, Jun 26, 2024 at 12:24:20PM -0500, Naik, Avadhut wrote:
>>
>>
>> On 6/26/2024 06:10, Borislav Petkov wrote:
>>> On Tue, Jun 25, 2024 at 02:56:22PM -0500, Avadhut Naik wrote:
>>>> AMD's Scalable MCA systems viz. Genoa will include two new registers:
>>>
>>> "viz."?
>>>
>> Right. Will mention Zen4 instead of Genoa.
> 
> I still don't know what "viz." means...
> 
IIUC, its an abbreviation of a Latin word and is used as a synonym for "namely"
or "that is to say".
Might not be the best choice in this case. Will change it.

>> Yes, I catch your drift. Will reword the commit message to explain that the
>> new syndrome registers are going to be exported through the tracepoint
>> in a dynamic array, as they are vendor-specific, so that usersapce error
>> decoding tools can retrieve the supplemental error information within them.
> 
> Again, why?
> 
> Why is it important to have them in the tracepoint?
> 
Userspace error decoding tools like the rasdaemon gather related hardware error
information through the tracepoints. As such, its important to have these two
registers in the tracepoint so that the tools like rasdaemon can parse them
and output the supplemental error information like FRU Text contained in them.
Yes, the registers are also being outputted thorough the dmesg but printk messages
are not an ABI.
The proper way to export these registers is through the tracepoint.

>>>> Note: Checkpatch warnings/errors are ignored to maintain coding style.
>>>
>>> This goes...
>>>
>>>>
>>>> [Yazen: Drop Yazen's Co-developed-by tag and moved SoB tag.]
>>>
>>> Yes, you did but now your SOB chain is wrong:
>>>
>>>> Signed-off-by: Avadhut Naik <avadhut.naik@amd.com>
>>>> Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
>>>
>>> This tells me Avadhut is the author, Yazen handled it and he's sending it to
>>> me. But nope, he isn't. So it needs another Avadhut SOB underneath.
>>>
>>> Audit all patches pls.
>>>
>> Wasn't aware of this chronology. Thanks for this information!
> 
> Well, there's documentation for that which you should've read already, before
> sending patches:
> 
> https://kernel.org/doc/html/latest/process/development-process.html
> 
> and
> 
> https://kernel.org/doc/html/latest/process/submitting-patches.html
> 
> especially.
> 
>> So, IIUC, the sequence for this patch should be as follows?
>>
>> Signed-off-by: Avadhut Naik <avadhut.naik@amd.com>
>> Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
>> Signed-off-by: Avadhut Naik <avadhut.naik@amd.com>
> 
> Yes, now I leave it to you to explain why. Hint: it is in those docs above.
> 
Got it. The first SoB entry is of the primary author. The successive SoB's are
from the people handling and transporting the patch.
IOW, the route taken by a patch, as its propagated, to maintainers and eventually
to Linus, should be evident from the SoB chain.

>>
>>>> ---
>>>
>>> ... right under those three "---" as such notes do not belong in the commit
>>> message. Remember that for the future.
>>>
>> Okay. Will move the note here.
> 
> Or remove it completely. checkpatch is crap - I know. No need to have it in
> every patch.
> 
Okay. Will remove the note altogether.
It's also present in the commit descriptions of other patches in this set.
Will remove from there as well.

>> Had considered this. But struct mce_hw_err *err wouldn't really be used in
>> mce_read_aux() in patch 1. Only struct mce m, which is already available, will
>> be used.
> 
> So?
> 
>> Hence, deferred the change to this patch where usage of struct mce_hw_err *err
>> is actually introduced in mce_read_aux().
>>
>> Do you prefer having this change in patch 1 instead?
> 
> I prefer a patch to contain one logical and complete change only. Because this
> makes review easier. You should try reviewing patches sometimes too and you'll
> know.
> 
Understood. Will move this to patch1.

>>> So that vendor data layout - is that ABI too? Or are we free to shuffle the
>>> fields around in the future or even remove some?
>>>
>>> This all needs to be specified somewhere explicitly so that nothing relies on
>>> that layout.
>>>
>>> And I'm not sure that that's enough because when userspace tools start using
>>> them, then they're practically an ABI so you can't change them even if you
>>> wanted to.
>>>
>>> So is libtraceevent or all the other libraries going to parse this as a blob
>>> and it is always going to remain such?
>>>
>>> But then the tools which interpret it need to know its layout and if it
>>> changes, perhaps check kernel version which then becomes RealUgly(tm).
>>>
>>> So you might just as well dump the separate fields one by one, without
>>> a dynamic array.
>>>
>>> Or do a dynamic array but specify that their layout in struct
>>> mce_hw_er.vendor.amd are cast in stone so that we're all clear on what goes
>>> where.
>>>
>>> Questions over questions...
>>>
>> Should we document this where struct mce_hw_err is defined, in
>> arch/x86/include/asm/mce.h? Or do you have any other recommendations?
> 
> I don't know. If I knew I wouldn't have questions which you can read again and
> try to answer.
> 
IIUC, at least for now, the libtraceevent library parses the entire vendor data
array as a blob. Rather, a pointer to the array in the raw tracepoint record along
with its length is returned by the library's tep_get_field_raw() API.

This very API has been used for implementing support for these registers and FRU
Text in the rasdaemon.

https://github.com/mchehab/rasdaemon/pull/122

Thus, the position of the array within the tracepoint and its length can be changed
in the future.

Its layout however, is a completely different matter. At least for AMD, it shouldn't
be changed. New fields, if any, should be added at the end.

The underlying reason for this is the FRU text feature.

With this set, the first two elements of the vendor data dynamic array are SYND 1/2
registers while the third element is MCA_CONFIG (added through patch 4 of the set).
Now, in rasdaemon, SYND1/2 register contents (i.e. first two fields) are interpreted
as FRU Text only if BIT(9) of MCA_CONFIG (third field) is set.

Thus, we depend on array's layout for accurate FRU Text decoding in the rasdaemon.

Hope this answers some of your questions!
Borislav Petkov July 10, 2024, 9:38 a.m. UTC | #5
On Tue, Jul 09, 2024 at 01:27:25AM -0500, Naik, Avadhut wrote:
> IIUC, its an abbreviation of a Latin word and is used as a synonym for "namely"
> or "that is to say".
> Might not be the best choice in this case. Will change it.

I learn new stuff every day:

https://en.wikipedia.org/wiki/Viz.

> Userspace error decoding tools like the rasdaemon gather related hardware error
> information through the tracepoints. As such, its important to have these two
> registers in the tracepoint so that the tools like rasdaemon can parse them
> and output the supplemental error information like FRU Text contained in them.

Put *that* in the commit message - do not explain what the patch does.

> Got it. The first SoB entry is of the primary author. The successive SoB's are
> from the people handling and transporting the patch.

Exactly!

> IOW, the route taken by a patch, as its propagated, to maintainers and eventually
> to Linus, should be evident from the SoB chain.

You got it.

> With this set, the first two elements of the vendor data dynamic array are SYND 1/2
> registers while the third element is MCA_CONFIG (added through patch 4 of the set).
> Now, in rasdaemon, SYND1/2 register contents (i.e. first two fields) are interpreted
> as FRU Text only if BIT(9) of MCA_CONFIG (third field) is set.
> 
> Thus, we depend on array's layout for accurate FRU Text decoding in the rasdaemon.

So it sounds to me like we want to document and thus freeze the
vendor-specific blob layout because tools are going to be using and parsing
it. And this will spare us the kernel version checks.

And new additions to that AMD-specific blob will come at the end and will
have to be documented too.

That sounds like an ok compromise to me...

Thx.
Naik, Avadhut July 10, 2024, 10:59 p.m. UTC | #6
On 7/10/2024 04:38, Borislav Petkov wrote:
> On Tue, Jul 09, 2024 at 01:27:25AM -0500, Naik, Avadhut wrote:
> 
>> Userspace error decoding tools like the rasdaemon gather related hardware error
>> information through the tracepoints. As such, its important to have these two
>> registers in the tracepoint so that the tools like rasdaemon can parse them
>> and output the supplemental error information like FRU Text contained in them.
> 
> Put *that* in the commit message - do not explain what the patch does.
> 
Will do.
 
>> With this set, the first two elements of the vendor data dynamic array are SYND 1/2
>> registers while the third element is MCA_CONFIG (added through patch 4 of the set).
>> Now, in rasdaemon, SYND1/2 register contents (i.e. first two fields) are interpreted
>> as FRU Text only if BIT(9) of MCA_CONFIG (third field) is set.
>>
>> Thus, we depend on array's layout for accurate FRU Text decoding in the rasdaemon.
> 
> So it sounds to me like we want to document and thus freeze the
> vendor-specific blob layout because tools are going to be using and parsing
> it. And this will spare us the kernel version checks.
> 
> And new additions to that AMD-specific blob will come at the end and will
> have to be documented too.
> 
> That sounds like an ok compromise to me...
> 
> Thx.
>
Sounds good!
Is it okay to document this where the new wrapper and vendor-specific data
structures are being defined, in arch/x86/include/asm/mce.h?
Similar approach has been taken for struct mce.
Or do you have any other recommendations?
diff mbox series

Patch

diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
index e955edb22897..2b43ba37bbda 100644
--- a/arch/x86/include/asm/mce.h
+++ b/arch/x86/include/asm/mce.h
@@ -122,6 +122,9 @@ 
 #define MSR_AMD64_SMCA_MC0_DESTAT	0xc0002008
 #define MSR_AMD64_SMCA_MC0_DEADDR	0xc0002009
 #define MSR_AMD64_SMCA_MC0_MISC1	0xc000200a
+/* Registers MISC2 to MISC4 are at offsets B to D. */
+#define MSR_AMD64_SMCA_MC0_SYND1	0xc000200e
+#define MSR_AMD64_SMCA_MC0_SYND2	0xc000200f
 #define MSR_AMD64_SMCA_MCx_CTL(x)	(MSR_AMD64_SMCA_MC0_CTL + 0x10*(x))
 #define MSR_AMD64_SMCA_MCx_STATUS(x)	(MSR_AMD64_SMCA_MC0_STATUS + 0x10*(x))
 #define MSR_AMD64_SMCA_MCx_ADDR(x)	(MSR_AMD64_SMCA_MC0_ADDR + 0x10*(x))
@@ -132,6 +135,8 @@ 
 #define MSR_AMD64_SMCA_MCx_DESTAT(x)	(MSR_AMD64_SMCA_MC0_DESTAT + 0x10*(x))
 #define MSR_AMD64_SMCA_MCx_DEADDR(x)	(MSR_AMD64_SMCA_MC0_DEADDR + 0x10*(x))
 #define MSR_AMD64_SMCA_MCx_MISCy(x, y)	((MSR_AMD64_SMCA_MC0_MISC1 + y) + (0x10*(x)))
+#define MSR_AMD64_SMCA_MCx_SYND1(x)	(MSR_AMD64_SMCA_MC0_SYND1 + 0x10*(x))
+#define MSR_AMD64_SMCA_MCx_SYND2(x)	(MSR_AMD64_SMCA_MC0_SYND2 + 0x10*(x))
 
 #define XEC(x, mask)			(((x) >> 16) & mask)
 
@@ -189,6 +194,13 @@  enum mce_notifier_prios {
 
 struct mce_hw_err {
 	struct mce m;
+
+	union vendor_info {
+		struct {
+			u64 synd1;
+			u64 synd2;
+		} amd;
+	} vi;
 };
 
 struct notifier_block;
diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
index cb7dc0b1aa50..fc69d244ca7f 100644
--- a/arch/x86/kernel/cpu/mce/amd.c
+++ b/arch/x86/kernel/cpu/mce/amd.c
@@ -799,8 +799,11 @@  static void __log_error(unsigned int bank, u64 status, u64 addr, u64 misc)
 	if (mce_flags.smca) {
 		rdmsrl(MSR_AMD64_SMCA_MCx_IPID(bank), m->ipid);
 
-		if (m->status & MCI_STATUS_SYNDV)
+		if (m->status & MCI_STATUS_SYNDV) {
 			rdmsrl(MSR_AMD64_SMCA_MCx_SYND(bank), m->synd);
+			rdmsrl(MSR_AMD64_SMCA_MCx_SYND1(bank), err.vi.amd.synd1);
+			rdmsrl(MSR_AMD64_SMCA_MCx_SYND2(bank), err.vi.amd.synd2);
+		}
 	}
 
 	mce_log(&err);
diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 6225143b9b14..3bb0f8b39f97 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -189,6 +189,10 @@  static void __print_mce(struct mce_hw_err *err)
 	if (mce_flags.smca) {
 		if (m->synd)
 			pr_cont("SYND %llx ", m->synd);
+		if (err->vi.amd.synd1)
+			pr_cont("SYND1 %llx ", err->vi.amd.synd1);
+		if (err->vi.amd.synd2)
+			pr_cont("SYND2 %llx ", err->vi.amd.synd2);
 		if (m->ipid)
 			pr_cont("IPID %llx ", m->ipid);
 	}
@@ -639,8 +643,10 @@  static struct notifier_block mce_default_nb = {
 /*
  * Read ADDR and MISC registers.
  */
-static noinstr void mce_read_aux(struct mce *m, int i)
+static noinstr void mce_read_aux(struct mce_hw_err *err, int i)
 {
+	struct mce *m = &err->m;
+
 	if (m->status & MCI_STATUS_MISCV)
 		m->misc = mce_rdmsrl(mca_msr_reg(i, MCA_MISC));
 
@@ -662,8 +668,11 @@  static noinstr void mce_read_aux(struct mce *m, int i)
 	if (mce_flags.smca) {
 		m->ipid = mce_rdmsrl(MSR_AMD64_SMCA_MCx_IPID(i));
 
-		if (m->status & MCI_STATUS_SYNDV)
+		if (m->status & MCI_STATUS_SYNDV) {
 			m->synd = mce_rdmsrl(MSR_AMD64_SMCA_MCx_SYND(i));
+			err->vi.amd.synd1 = mce_rdmsrl(MSR_AMD64_SMCA_MCx_SYND1(i));
+			err->vi.amd.synd2 = mce_rdmsrl(MSR_AMD64_SMCA_MCx_SYND2(i));
+		}
 	}
 }
 
@@ -766,7 +775,7 @@  void machine_check_poll(enum mcp_flags flags, mce_banks_t *b)
 		if (flags & MCP_DONTLOG)
 			goto clear_it;
 
-		mce_read_aux(m, i);
+		mce_read_aux(&err, i);
 		m->severity = mce_severity(m, NULL, NULL, false);
 		/*
 		 * Don't get the IP here because it's unlikely to
@@ -903,9 +912,10 @@  static __always_inline void quirk_zen_ifu(int bank, struct mce *m, struct pt_reg
  * Do a quick check if any of the events requires a panic.
  * This decides if we keep the events around or clear them.
  */
-static __always_inline int mce_no_way_out(struct mce *m, char **msg, unsigned long *validp,
+static __always_inline int mce_no_way_out(struct mce_hw_err *err, char **msg, unsigned long *validp,
 					  struct pt_regs *regs)
 {
+	struct mce *m = &err->m;
 	char *tmp = *msg;
 	int i;
 
@@ -923,7 +933,7 @@  static __always_inline int mce_no_way_out(struct mce *m, char **msg, unsigned lo
 
 		m->bank = i;
 		if (mce_severity(m, regs, &tmp, true) >= MCE_PANIC_SEVERITY) {
-			mce_read_aux(m, i);
+			mce_read_aux(err, i);
 			*msg = tmp;
 			return 1;
 		}
@@ -1321,7 +1331,7 @@  __mc_scan_banks(struct mce_hw_err *err, struct pt_regs *regs, struct mce *final,
 		if (severity == MCE_NO_SEVERITY)
 			continue;
 
-		mce_read_aux(m, i);
+		mce_read_aux(err, i);
 
 		/* assuming valid severity level != 0 */
 		m->severity = severity;
@@ -1522,7 +1532,7 @@  noinstr void do_machine_check(struct pt_regs *regs)
 	final = this_cpu_ptr(&hw_errs_seen);
 	final->m = *m;
 
-	no_way_out = mce_no_way_out(m, &msg, valid_banks, regs);
+	no_way_out = mce_no_way_out(&err, &msg, valid_banks, regs);
 
 	barrier();
 
diff --git a/drivers/edac/mce_amd.c b/drivers/edac/mce_amd.c
index c5fae99de781..69e12cb2f0de 100644
--- a/drivers/edac/mce_amd.c
+++ b/drivers/edac/mce_amd.c
@@ -792,7 +792,8 @@  static const char *decode_error_status(struct mce *m)
 static int
 amd_decode_mce(struct notifier_block *nb, unsigned long val, void *data)
 {
-	struct mce *m = &((struct mce_hw_err *)data)->m;
+	struct mce_hw_err *err = (struct mce_hw_err *)data;
+	struct mce *m = &err->m;
 	unsigned int fam = x86_family(m->cpuid);
 	int ecc;
 
@@ -850,8 +851,11 @@  amd_decode_mce(struct notifier_block *nb, unsigned long val, void *data)
 	if (boot_cpu_has(X86_FEATURE_SMCA)) {
 		pr_emerg(HW_ERR "IPID: 0x%016llx", m->ipid);
 
-		if (m->status & MCI_STATUS_SYNDV)
-			pr_cont(", Syndrome: 0x%016llx", m->synd);
+		if (m->status & MCI_STATUS_SYNDV) {
+			pr_cont(", Syndrome: 0x%016llx\n", m->synd);
+			pr_emerg(HW_ERR "Syndrome1: 0x%016llx, Syndrome2: 0x%016llx",
+				 err->vi.amd.synd1, err->vi.amd.synd2);
+		}
 
 		pr_cont("\n");
 
diff --git a/include/trace/events/mce.h b/include/trace/events/mce.h
index 65aba1afcd07..9e7211eddbca 100644
--- a/include/trace/events/mce.h
+++ b/include/trace/events/mce.h
@@ -43,6 +43,8 @@  TRACE_EVENT(mce_record,
 		__field(	u8,		bank		)
 		__field(	u8,		cpuvendor	)
 		__field(	u32,		microcode	)
+		__field(	u8,		len	)
+		__dynamic_array(u8, v_data, sizeof(err->vi))
 	),
 
 	TP_fast_assign(
@@ -65,9 +67,11 @@  TRACE_EVENT(mce_record,
 		__entry->bank		= err->m.bank;
 		__entry->cpuvendor	= err->m.cpuvendor;
 		__entry->microcode	= err->m.microcode;
+		__entry->len		= sizeof(err->vi);
+		memcpy(__get_dynamic_array(v_data), &err->vi, sizeof(err->vi));
 	),
 
-	TP_printk("CPU: %d, MCGc/s: %llx/%llx, MC%d: %016Lx, IPID: %016Lx, ADDR: %016Lx, MISC: %016Lx, SYND: %016Lx, RIP: %02x:<%016Lx>, TSC: %llx, PPIN: %llx, vendor: %u, CPUID: %x, time: %llu, socket: %u, APIC: %x, microcode: %x",
+	TP_printk("CPU: %d, MCGc/s: %llx/%llx, MC%d: %016llx, IPID: %016llx, ADDR: %016llx, MISC: %016llx, SYND: %016llx, RIP: %02x:<%016llx>, TSC: %llx, PPIN: %llx, vendor: %u, CPUID: %x, time: %llu, socket: %u, APIC: %x, microcode: %x, vendor data: %s",
 		__entry->cpu,
 		__entry->mcgcap, __entry->mcgstatus,
 		__entry->bank, __entry->status,
@@ -83,7 +87,8 @@  TRACE_EVENT(mce_record,
 		__entry->walltime,
 		__entry->socketid,
 		__entry->apicid,
-		__entry->microcode)
+		__entry->microcode,
+		__print_array(__get_dynamic_array(v_data), __entry->len / 8, 8))
 );
 
 #endif /* _TRACE_MCE_H */