Message ID | 20250506213814.2365788-10-zaidal@os.amperecomputing.com |
---|---|
State | New |
Headers | show |
Series | Enable EINJv2 Support | expand |
On Tue, May 06, 2025 at 02:38:13PM -0700, Zaid Alali wrote: > + # echo 0x12345000 > param1 # Set memory address for injection > + # echo 0xfffffffffffff000 > param2 # Range - anywhere in this page > + # comp_arr="0x1 0x2 # Fill in the component array > + >0x1 0x4 > + >0x2 0x4" > + # echo "$comp_arr" > einjv2_component_array Seems complex (and may confuse people as the ">" in the lines setting up the comp_arr are secondary prompts from bash, not part if the input). If they miss the "" around $comp_arr in the last line they will get all the values on one line which will be rejected with -EINVAL during injection. This works better (and is shorter too!): # echo -e '0x1 0x2\n0x1 0x4\n0x2 0x4\n\0' > einjv2_component_array I think explicitly terminating the input with '\0' is needed (and that the kernel should NOT zero out the einjv2_component_array blob on each injection. That's unlike the other einj paramaters which are "sticky". The user can repeat the same injection without resetting all the parameters each time, just "echo 1 > error_inject" to do the same thing again. -Tony
On Tue, May 06, 2025 at 02:38:13PM -0700, Zaid Alali wrote: > +- einjv2_component_array > + > + The contents of this file are used to set the "Component Array" field > + of the EINJv2 Extension Structure. The expected format is hex values > + for component id and syndrome separated by space, and multiple > + components are separated by new line. How is this a good design? Do you guys not see that the other injection files are one value per file? > + # comp_arr="0x1 0x2 # Fill in the component array > + >0x1 0x4 > + >0x2 0x4" > + # echo "$comp_arr" > einjv2_component_array Oh boy. Srsly?!
On Fri, May 30, 2025 at 12:27:11PM +0200, Borislav Petkov wrote: > On Tue, May 06, 2025 at 02:38:13PM -0700, Zaid Alali wrote: > > +- einjv2_component_array > > + > > + The contents of this file are used to set the "Component Array" field > > + of the EINJv2 Extension Structure. The expected format is hex values > > + for component id and syndrome separated by space, and multiple > > + components are separated by new line. > > How is this a good design? > > Do you guys not see that the other injection files are one value per file? > > > + # comp_arr="0x1 0x2 # Fill in the component array > > + >0x1 0x4 > > + >0x2 0x4" > > + # echo "$comp_arr" > einjv2_component_array > > Oh boy. Srsly?! I've been staring at the debugfs blob used for einjv2_component_array to try and come up with some sane way to use it ... but I think it is a lost cause and I agree we need "one file, one value" like the rest of the EINJ user interface. I poked at the code a bit and mangled it into the patch below. I've tested that the new files read/write as expected. But I don't have an EINJV2 enabled system to run a full test. New files in the einj directory (assuming the system reports that it supports up to four simultaneous injections): -rw-------. 1 root root 0 May 30 13:26 component_id0 -rw-------. 1 root root 0 May 30 13:26 component_id1 -rw-------. 1 root root 0 May 30 13:26 component_id2 -rw-------. 1 root root 0 May 30 13:26 component_id3 -rw-------. 1 root root 0 May 30 13:26 component_syndrome0 -rw-------. 1 root root 0 May 30 13:26 component_syndrome1 -rw-------. 1 root root 0 May 30 13:26 component_syndrome2 -rw-------. 1 root root 0 May 30 13:26 component_syndrome3 -r--------. 1 root root 0 May 30 13:26 max_nr_components -rw-------. 1 root root 0 May 30 13:26 nr_components Use case to inject to one device would be: # echo 1 > nr_components # echo 4 > component_id0 # echo A5A5A5A5 > component_syndrome0 ... set other files and finish with usual # echo 1 > error_inject There isn't a fancy "debugfs_create_x128_le()" helper to manage these 128-bit little endian numbers. So I've coded with the basic building blocks (though using copy_from_user() and copy_to_user() feels like back in the stone age). If there some helpers that I missed I'd be happy to see that part simplified. Patch is on top of the existing v7 set. Obviously it needs to be folded back into the earlier patches to make a clean history that doesn't add functions and then replace them with different code. -Tony --- diff --git a/drivers/acpi/apei/einj-core.c b/drivers/acpi/apei/einj-core.c index ab3d20b51ff1..9f29fc97f6a6 100644 --- a/drivers/acpi/apei/einj-core.c +++ b/drivers/acpi/apei/einj-core.c @@ -65,7 +65,7 @@ struct syndrome_array { u8 pcie_synd[16]; u8 vendor_synd[16]; } comp_synd; -}; +} *syndrome_data; struct einjv2_extension_struct { u32 length; @@ -117,9 +117,7 @@ static struct debugfs_blob_wrapper vendor_blob; static struct debugfs_blob_wrapper vendor_errors; static char vendor_dev[64]; -static struct debugfs_blob_wrapper einjv2_component_arr; -static void *user_input; -static int nr_components; +static u32 max_nr_components, nr_components; static u32 available_error_type; static u32 available_error_type_v2; @@ -327,7 +325,7 @@ static void __iomem *einj_get_parameter_address(void) if (available_error_type & ACPI65_EINJV2_SUPP) { len = v5param.einjv2_struct.length; offset = offsetof(struct einjv2_extension_struct, component_arr); - nr_components = (len - offset) / + max_nr_components = (len - offset) / sizeof(v5param.einjv2_struct.component_arr[0]); /* * The first call to acpi_os_map_iomem above does not include the @@ -338,7 +336,7 @@ static void __iomem *einj_get_parameter_address(void) acpi_os_unmap_iomem(p, v5param_size); offset = offsetof(struct set_error_type_with_address, einjv2_struct); v5param_size = offset + struct_size(&v5param.einjv2_struct, - component_arr, nr_components); + component_arr, max_nr_components); p = acpi_os_map_iomem(pa_v5, v5param_size); } return p; @@ -518,104 +516,6 @@ static int __einj_error_trigger(u64 trigger_paddr, u32 type, return rc; } -static int parse_hex_to_u8(char *str, u8 *arr) -{ - char *ptr, val[32]; - int pad, str_len; - - - if (str[0] == '0' && (str[1] == 'x' || str[1] == 'X')) - str += 2; - - str_len = strlen(str); - if (str_len > 32) - return -EINVAL; - - memcpy(val, str, str_len); - - ptr = val; - while (*ptr != '\0') { - if (!isxdigit(*ptr)) - return -EINVAL; - ptr++; - } - - pad = 32 - str_len; - - memmove(val + pad, val, str_len); - memset(val, '0', pad); - - for (int i = 0; i < COMPONENT_LEN; ++i) { - char byte_str[3] = {val[i * 2], val[i * 2 + 1], '\0'}; - /* write bytes in little endian format to follow ACPI specs */ - arr[COMPONENT_LEN - i - 1] = (u8)strtoul(byte_str, NULL, 16); - } - - return 0; -} - -static int read_component_array(struct set_error_type_with_address *v5param) -{ - int count = 0, str_len; - u8 comp_arr[COMPONENT_LEN], synd_arr[COMPONENT_LEN]; - struct syndrome_array *component_arr; - char *tok, *comp_str, *synd_str, *user; - - component_arr = v5param->einjv2_struct.component_arr; - str_len = strlen(user_input); - user = user_input; - user[str_len - 1] = '\0'; - while ((tok = strsep(&user, "\n")) != NULL) { - if (count >= nr_components) - return -EINVAL; - - comp_str = strsep(&tok, " \t"); - synd_str = strsep(&tok, " \t"); - - if (!comp_str || !synd_str) - return -EINVAL; - - if (parse_hex_to_u8(comp_str, comp_arr)) - return -EINVAL; - if (parse_hex_to_u8(synd_str, synd_arr)) - return -EINVAL; - - switch (v5param->type) { - case EINJV2_PROCESSOR_ERROR: - for (int i = 0; i < COMPONENT_LEN; ++i) { - component_arr[count].comp_id.acpi_id[i] = comp_arr[i]; - component_arr[count].comp_synd.proc_synd[i] = synd_arr[i]; - } - break; - case EINJV2_MEMORY_ERROR: - for (int i = 0; i < COMPONENT_LEN; ++i) { - component_arr[count].comp_id.device_id[i] = comp_arr[i]; - component_arr[count].comp_synd.mem_synd[i] = synd_arr[i]; - } - break; - case EINJV2_PCIE_ERROR: - for (int i = 0; i < COMPONENT_LEN; ++i) { - component_arr[count].comp_id.pcie_sbdf[i] = comp_arr[i]; - component_arr[count].comp_synd.pcie_synd[i] = synd_arr[i]; - } - break; - case EINJV2_VENDOR_ERROR: - for (int i = 0; i < COMPONENT_LEN; ++i) { - component_arr[count].comp_id.vendor_id[i] = comp_arr[i]; - component_arr[count].comp_synd.vendor_synd[i] = synd_arr[i]; - } - break; - } - count++; - - } - v5param->einjv2_struct.component_arr_count = count; - - /* clear buffer after user input for next injection */ - memset(user_input, 0, COMP_ARR_SIZE); - return 0; -} - static int __einj_error_inject(u32 type, u32 flags, u64 param1, u64 param2, u64 param3, u64 param4) { @@ -655,13 +555,17 @@ static int __einj_error_inject(u32 type, u32 flags, u64 param1, u64 param2, v5param->memory_address_range = param2; if (is_V2) { - rc = read_component_array(v5param); - if (rc) { + if (nr_components < 1 || nr_components > max_nr_components) { kfree(v5param); - goto err_out; + return -EINVAL; + } + v5param->einjv2_struct.component_arr_count = nr_components; + for (int i = 0; i < nr_components; i++) { + v5param->einjv2_struct.component_arr[i].comp_id = + syndrome_data[i].comp_id; + v5param->einjv2_struct.component_arr[i].comp_synd = + syndrome_data[i].comp_synd; } - /* clear buffer after user input for next injection */ - memset(user_input, 0, COMP_ARR_SIZE); } else { v5param->apicid = param3; v5param->pcie_sbdf = param4; @@ -742,9 +646,6 @@ static int __einj_error_inject(u32 type, u32 flags, u64 param1, u64 param2, rc = apei_exec_run_optional(&ctx, ACPI_EINJ_END_OPERATION); return rc; -err_out: - memset(user_input, 0, COMP_ARR_SIZE); - return -EINVAL; } /* Inject the specified hardware error */ @@ -984,6 +885,97 @@ static int einj_check_table(struct acpi_table_einj *einj_tab) return 0; } +static ssize_t u128_read(struct file *f, char __user *buf, size_t count, loff_t *off) +{ + char output[2 * COMPONENT_LEN + 1]; + u8 *data = f->f_inode->i_private; + ssize_t n; + int i; + + if (*off >= sizeof(output)) + return 0; + + for (i = 0; i < COMPONENT_LEN; i++) + sprintf(output + 2 * i, "%.02x", data[i]); + output[2 * COMPONENT_LEN] = '\n'; + + n = min(count, sizeof(output) - *off); + if (copy_to_user(buf, output + *off, n)) + return -EFAULT; + *off += n; + + return n; +} + +static ssize_t u128_write(struct file *f, const char __user *buf, size_t count, loff_t *off) +{ + char input[2 + 2 * COMPONENT_LEN + 2]; + u8 *save = f->f_inode->i_private; + u8 tmp[COMPONENT_LEN]; + char byte[3] = {}; + char *s, *e; + long val; + int i; + + if (count > sizeof(input)) + return -EINVAL; + if (copy_from_user(input, buf, count)) + return -EFAULT; + + if (input[0] == '0' && (input[1] == 'x' || input[1] == 'X')) + s = input + 2; + else + s = input; + e = input + count; + if (e[-1] == '\n') + e--; + + for (i = 0; i < COMPONENT_LEN; i++) { + byte[1] = *--e; + byte[0] = e > s ? *--e : '0'; + if (kstrtol(byte, 16, &val)) + return -EINVAL; + tmp[i] = val; + if (e <= s) + break; + } + while (++i < COMPONENT_LEN) + tmp[i] = 0; + + memcpy(save, tmp, COMPONENT_LEN); + + return count; +} + +static const struct file_operations u128_fops = { + .read = u128_read, + .write = u128_write, +}; + +static bool setup_einjv2_component_files(void) +{ + char name[32]; + + max_nr_components = 4; //FAKE!!! + debugfs_create_u32("max_nr_components", 0400, einj_debug_dir, &max_nr_components); + debugfs_create_u32("nr_components", 0600, einj_debug_dir, &nr_components); + + syndrome_data = kcalloc(max_nr_components, sizeof(syndrome_data[0]), GFP_KERNEL); + if (!syndrome_data) + return false; + + for (int i = 0; i < max_nr_components; i++) { + sprintf(name, "component_id%d", i); + debugfs_create_file(name, 0600, einj_debug_dir, + &syndrome_data[i].comp_id, &u128_fops); + sprintf(name, "component_syndrome%d", i); + debugfs_create_file(name, 0600, einj_debug_dir, + &syndrome_data[i].comp_synd, &u128_fops); + } + + return true; +} + static int __init einj_probe(struct faux_device *fdev) { int rc; @@ -1056,21 +1048,10 @@ static int __init einj_probe(struct faux_device *fdev) debugfs_create_x32("notrigger", S_IRUSR | S_IWUSR, einj_debug_dir, ¬rigger); if (available_error_type & ACPI65_EINJV2_SUPP) { - user_input = kzalloc(COMP_ARR_SIZE, GFP_KERNEL); - if (!user_input) { - EINJv2_enabled = false; - pr_info("EINJv2 is disabled: not enough memory\n"); - goto skip_EINJv2; - } - EINJv2_enabled = true; - einjv2_component_arr.data = user_input; - einjv2_component_arr.size = COMP_ARR_SIZE; - debugfs_create_blob("einjv2_component_array", S_IRUSR | S_IWUSR, - einj_debug_dir, &einjv2_component_arr); + EINJv2_enabled = setup_einjv2_component_files(); } } -skip_EINJv2: if (vendor_dev[0]) { vendor_blob.data = vendor_dev; vendor_blob.size = strlen(vendor_dev); @@ -1112,7 +1093,7 @@ static void __exit einj_remove(struct faux_device *fdev) if (vendor_errors.size) acpi_os_unmap_memory(vendor_errors.data, vendor_errors.size); } - kfree(user_input); + kfree(syndrome_data); einj_exec_ctx_init(&ctx); apei_exec_post_unmap_gars(&ctx); apei_resources_release(&einj_resources);
On Fri, May 30, 2025 at 01:52:41PM -0700, Luck, Tony wrote: > On Fri, May 30, 2025 at 12:27:11PM +0200, Borislav Petkov wrote: > > On Tue, May 06, 2025 at 02:38:13PM -0700, Zaid Alali wrote: > There isn't a fancy "debugfs_create_x128_le()" helper to manage these > 128-bit little endian numbers. So I've coded with the basic building > blocks (though using copy_from_user() and copy_to_user() feels like > back in the stone age). If there some helpers that I missed I'd be > happy to see that part simplified. simple_read_from_buffer() and simple_write_to_buffer() may be the helpers that I didn't spot earlier. -Tony
On Fri, May 30, 2025 at 01:52:39PM -0700, Luck, Tony wrote: > Use case to inject to one device would be: > > # echo 1 > nr_components What does that do? The interface probably needs a little refinement, but... > # echo 4 > component_id0 > # echo A5A5A5A5 > component_syndrome0 > ... set other files and finish with usual > # echo 1 > error_inject > > There isn't a fancy "debugfs_create_x128_le()" helper to manage these > 128-bit little endian numbers. So I've coded with the basic building > blocks (though using copy_from_user() and copy_to_user() feels like > back in the stone age). If there some helpers that I missed I'd be > happy to see that part simplified. > > Patch is on top of the existing v7 set. Obviously it needs to be folded > back into the earlier patches to make a clean history that doesn't add > functions and then replace them with different code. ... yes, definitely much better. Thanks!
On Sat, May 31, 2025 at 11:20:50AM +0200, Borislav Petkov wrote: > On Fri, May 30, 2025 at 01:52:39PM -0700, Luck, Tony wrote: > > Use case to inject to one device would be: > > > > # echo 1 > nr_components > > What does that do? EINJ V2 allows the user to perform multiple injections together. The component_idN/component_syndromeN pairs of files direct the "where" and the "what" of each injection. But the kernel needs to know how many of these pairs to use for an injection (to fill in a field in the structure passed to the BIOS). With EINJ V2 the user might want to inject to 2 locations with one injection, and then just to 1 location on the next. Zaid Alali's version took the approach of zeroing the input after each injection so the user had to start from scratch for each injection. I wasn't fond of that because the existing Linux EINJ interface saves all the paramters allowing the user to repeat the same injection by just runniing "echo 1 > error_inject: over and over (e.g. to force a soft offline by injecting multiple corrected errors to the same address). User interface options: 1) User can zero out the component_idN/component_syndromeN pairs that they don't need and have the kernel count how many injections are requested by looping to find the zero terminator. 2) Kernel could zero all pairs after an injection to make the user explicitly set the list of targets each time. 3) User provides the count vis the nr_components file (perhaps needs a better name?) 4) Something else? > > The interface probably needs a little refinement, but... > > > # echo 4 > component_id0 > > # echo A5A5A5A5 > component_syndrome0 > > ... set other files and finish with usual > > # echo 1 > error_inject > > > > There isn't a fancy "debugfs_create_x128_le()" helper to manage these > > 128-bit little endian numbers. So I've coded with the basic building > > blocks (though using copy_from_user() and copy_to_user() feels like > > back in the stone age). If there some helpers that I missed I'd be > > happy to see that part simplified. > > > > Patch is on top of the existing v7 set. Obviously it needs to be folded > > back into the earlier patches to make a clean history that doesn't add > > functions and then replace them with different code. > > ... yes, definitely much better. > > Thanks! You are welcome. > > -- > Regards/Gruss, > Boris. -Tony
Some questions inline... On Sat, May 31, 2025 at 03:24:14PM -0700, Luck, Tony wrote: > EINJ V2 allows the user to perform multiple injections together. > > The component_idN/component_syndromeN pairs of files direct the > "where" and the "what" of each injection. > > But the kernel needs to know how many of these pairs to use > for an injection (to fill in a field in the structure passed > to the BIOS). The kernel could realloc on each write. Or we could allocate the struct to max elems and trim it before passing it down to BIOS. > With EINJ V2 the user might want to inject to 2 locations with > one injection, and then just to 1 location on the next. Right. > Zaid Alali's version took the approach of zeroing the input > after each injection so the user had to start from scratch > for each injection. > > I wasn't fond of that because the existing Linux EINJ interface > saves all the paramters allowing the user to repeat the same > injection by just runniing "echo 1 > error_inject: over and over > (e.g. to force a soft offline by injecting multiple corrected > errors to the same address). I agree with you here. Linux sysfs, etc interfaces do keep their values usually. > User interface options: > > 1) User can zero out the component_idN/component_syndromeN pairs > that they don't need and have the kernel count how many injections > are requested by looping to find the zero terminator. > > 2) Kernel could zero all pairs after an injection to make the user > explicitly set the list of targets each time. > > 3) User provides the count vis the nr_components file (perhaps > needs a better name?) Yap, agree that the name is not optimal. > 4) Something else? See above. User can inject into each component pairs file and the kernel can put that in the tracking struct. So you have: # echo 4 > component_id0 # echo A5A5A5A5 > component_syndrome0 ... set other files and finish with usual # echo 1 > error_inject <--- here, it goes through each component pair and builds the structure to pass down the BIOS. And you track valid component pairs by setting the IDs to -1 or something else invalid. All those component IDs which have remained invalid after the error_inject write happens, get ignored - you gather only those which are valid and inject. And this way you can keep the old values too and gather them again and inject again, over and over again. Right? Thx.
On Sun, Jun 01, 2025 at 12:25:54PM +0200, Borislav Petkov wrote: > Some questions inline... > > On Sat, May 31, 2025 at 03:24:14PM -0700, Luck, Tony wrote: > > EINJ V2 allows the user to perform multiple injections together. > > > > The component_idN/component_syndromeN pairs of files direct the > > "where" and the "what" of each injection. > > > > But the kernel needs to know how many of these pairs to use > > for an injection (to fill in a field in the structure passed > > to the BIOS). > > The kernel could realloc on each write. Or we could allocate the struct to max > elems and trim it before passing it down to BIOS. The actual structure passed to BIOS is the same each time. Just the set_error_type_with_address::einjv2_struct::component_arr_count changed to indicate how many errors to inject. In theory the driver could allocate and copy a correctly sized structure, but Zaid's code here is simpler, an this is hardly a critical path. > > User interface options: > > > > 1) User can zero out the component_idN/component_syndromeN pairs > > that they don't need and have the kernel count how many injections > > are requested by looping to find the zero terminator. > > > > 2) Kernel could zero all pairs after an injection to make the user > > explicitly set the list of targets each time. > > > > 3) User provides the count vis the nr_components file (perhaps > > needs a better name?) > > Yap, agree that the name is not optimal. It can be dropped if we make the user zap previously supplied component_idN/component_syndromeN pairs that are no longer wanted. > > User can inject into each component pairs file and the kernel can put that in > the tracking struct. So you have: > > # echo 4 > component_id0 > # echo A5A5A5A5 > component_syndrome0 > ... set other files and finish with usual > # echo 1 > error_inject > > <--- here, it goes through each component pair and builds the structure to > pass down the BIOS. > > And you track valid component pairs by setting the IDs to -1 or something else > invalid. This is just an improvement on my "option 1" (improved because all-ones for the component ID is going to be invalid for sure, while all zeroes could be a valid component). > > All those component IDs which have remained invalid after the error_inject > write happens, get ignored - you gather only those which are valid and inject. Or just stop collecting on the first invalid one. > And this way you can keep the old values too and gather them again and inject > again, over and over again. > > Right? Yup. -Tony
On Mon, Jun 02, 2025 at 10:02:15AM -0700, Luck, Tony wrote: > On Sun, Jun 01, 2025 at 12:25:54PM +0200, Borislav Petkov wrote: > > Some questions inline... > > > > On Sat, May 31, 2025 at 03:24:14PM -0700, Luck, Tony wrote: > > > EINJ V2 allows the user to perform multiple injections together. > > > > > > The component_idN/component_syndromeN pairs of files direct the > > > "where" and the "what" of each injection. > > > > > > But the kernel needs to know how many of these pairs to use > > > for an injection (to fill in a field in the structure passed > > > to the BIOS). > > > > The kernel could realloc on each write. Or we could allocate the struct to max > > elems and trim it before passing it down to BIOS. > > The actual structure passed to BIOS is the same each time. Just the > set_error_type_with_address::einjv2_struct::component_arr_count > changed to indicate how many errors to inject. In theory the > driver could allocate and copy a correctly sized structure, but > Zaid's code here is simpler, an this is hardly a critical path. > > > > User interface options: > > > > > > 1) User can zero out the component_idN/component_syndromeN pairs > > > that they don't need and have the kernel count how many injections > > > are requested by looping to find the zero terminator. > > > > > > 2) Kernel could zero all pairs after an injection to make the user > > > explicitly set the list of targets each time. > > > > > > 3) User provides the count vis the nr_components file (perhaps > > > needs a better name?) > > > > Yap, agree that the name is not optimal. > > It can be dropped if we make the user zap previously supplied > component_idN/component_syndromeN pairs that are no longer > wanted. > > > > User can inject into each component pairs file and the kernel can put that in > > the tracking struct. So you have: > > > > # echo 4 > component_id0 > > # echo A5A5A5A5 > component_syndrome0 > > ... set other files and finish with usual > > # echo 1 > error_inject > > > > <--- here, it goes through each component pair and builds the structure to > > pass down the BIOS. > > > > And you track valid component pairs by setting the IDs to -1 or something else > > invalid. > > This is just an improvement on my "option 1" (improved because all-ones > for the component ID is going to be invalid for sure, while all zeroes > could be a valid component). > > > > All those component IDs which have remained invalid after the error_inject > > write happens, get ignored - you gather only those which are valid and inject. > > Or just stop collecting on the first invalid one. > > > And this way you can keep the old values too and gather them again and inject > > again, over and over again. > > > > Right? > > Yup. > > -Tony Thank you Tony and Borislav, this is great feedback. I will update the patches and send out a new revision that uses Tony's UI for the ID and syndrome. -Zaid
On Mon, Jun 02, 2025 at 10:02:15AM -0700, Luck, Tony wrote: > The actual structure passed to BIOS is the same each time. Just the > set_error_type_with_address::einjv2_struct::component_arr_count > changed to indicate how many errors to inject. In theory the > driver could allocate and copy a correctly sized structure, but > Zaid's code here is simpler, an this is hardly a critical path. Right, allocate it once on driver init and keep massaging it on every injection. Simple. > This is just an improvement on my "option 1" (improved because all-ones > for the component ID is going to be invalid for sure, while all zeroes > could be a valid component). Right, you need to know at injection time which of the components are valid and which are not. > Or just stop collecting on the first invalid one. That would mean that you punish the user at the first typo. :-P Considering how complex those interfaces become perhaps not such a good idea... Thx.
diff --git a/Documentation/firmware-guide/acpi/apei/einj.rst b/Documentation/firmware-guide/acpi/apei/einj.rst index c52b9da08fa9..edf3a2165e75 100644 --- a/Documentation/firmware-guide/acpi/apei/einj.rst +++ b/Documentation/firmware-guide/acpi/apei/einj.rst @@ -59,6 +59,9 @@ The following files belong to it: 0x00000200 Platform Correctable 0x00000400 Platform Uncorrectable non-fatal 0x00000800 Platform Uncorrectable fatal + V2_0x00000001 EINJV2 Processor Error + V2_0x00000002 EINJV2 Memory Error + V2_0x00000004 EINJV2 PCI Express Error ================ =================================== The format of the file contents are as above, except present are only @@ -88,6 +91,8 @@ The following files belong to it: Memory address and mask valid (param1 and param2). Bit 2 PCIe (seg,bus,dev,fn) valid (see param4 below). + Bit 3 + EINJv2 extension structure is valid If set to zero, legacy behavior is mimicked where the type of injection specifies just one bit set, and param1 is multiplexed. @@ -122,6 +127,13 @@ The following files belong to it: this actually works depends on what operations the BIOS actually includes in the trigger phase. +- einjv2_component_array + + The contents of this file are used to set the "Component Array" field + of the EINJv2 Extension Structure. The expected format is hex values + for component id and syndrome separated by space, and multiple + components are separated by new line. + CXL error types are supported from ACPI 6.5 onwards (given a CXL port is present). The EINJ user interface for CXL error types is at <debugfs mount point>/cxl. The following files belong to it: @@ -194,6 +206,26 @@ An error injection example:: # echo 0x8 > error_type # Choose correctable memory error # echo 1 > error_inject # Inject now +An EINJv2 error injection example:: + + # cd /sys/kernel/debug/apei/einj + # cat available_error_type # See which errors can be injected + 0x00000002 Processor Uncorrectable non-fatal + 0x00000008 Memory Correctable + 0x00000010 Memory Uncorrectable non-fatal + V2_0x00000001 EINJV2 Processor Error + V2_0x00000002 EINJV2 Memory Error + + # echo 0x12345000 > param1 # Set memory address for injection + # echo 0xfffffffffffff000 > param2 # Range - anywhere in this page + # comp_arr="0x1 0x2 # Fill in the component array + >0x1 0x4 + >0x2 0x4" + # echo "$comp_arr" > einjv2_component_array + # echo V2_0x2 > error_type # Choose EINJv2 memory error + # echo 0xa > flags # set flags to indicate EINJv2 + # echo 1 > error_inject # Inject now + You should see something like this in dmesg:: [22715.830801] EDAC sbridge MC3: HANDLING MCE MEMORY ERROR