Message ID | 20240214200709.777166-1-Benjamin.Cheatham@amd.com |
---|---|
Headers | show |
Series | cxl, EINJ: Update EINJ for CXL error types | expand |
Luck, Tony wrote: > > I would save that work for a clear description of why einj.ko should not > > be resident. > > Personally, it would save me having to type "modprobe einj" to run tests (and > answer e-mails from validation folks telling they missed this step). It would only autoload with cxl_core.ko though. > > But others might feels this is unwanted. It looks like some distros build kernels > with CONFIG_ACPI_APEI_EINJ=m. > > On the other hand, EINJ should be under control of a BIOS option that > defaults to "off". So production systems won't enable it. > > But perhaps there will be a pr_warn() or pr_err() during boot. One of these will likely trip: > > pr_warn("EINJ table not found.\n"); > pr_err("Failed to get EINJ table: %s\n", acpi_format_exception(status)); > pr_warn(FW_BUG "Invalid EINJ table.\n"); > pr_err("Error collecting EINJ resources.\n"); Oh, good point. However, should a debug module really be throwing pr_err() or pr_warn()? I.e. should those all move to pr_info() or pr_debug() since the error case is detected by the lack of debugfs files being published. At least that would be my preference over creating cxl_einj.ko.
Luck, Tony wrote: > > But perhaps there will be a pr_warn() or pr_err() during boot. One of these will likely trip: > > > pr_warn("EINJ table not found.\n"); > > pr_err("Failed to get EINJ table: %s\n", acpi_format_exception(status)); > > pr_warn(FW_BUG "Invalid EINJ table.\n"); > > pr_err("Error collecting EINJ resources.\n"); > > Just tried on my system. The winner (for me) is: > > [ 27.989081] EINJ: EINJ table not found. > > If you decide that it is OK to auto-load, I think that needs severity downgraded to pr_info(). > > Users ask questions when they see warnings. Sounds good, I missed this before sending my last reply.
On Wed, 14 Feb 2024, Ben Cheatham wrote: >Update EINJ kernel document to include how to inject CXL protocol error >types, build the kernel to include CXL error types, and give an example >injection. > >Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> >Signed-off-by: Ben Cheatham <Benjamin.Cheatham@amd.com> Would vote for folding into 2/3, but otherwise looks good with a minor suggestion. Reviewed-by: Davidlohr Bueso <dave@stgolabs.net> >--- > .../firmware-guide/acpi/apei/einj.rst | 19 +++++++++++++++++++ > 1 file changed, 19 insertions(+) > >diff --git a/Documentation/firmware-guide/acpi/apei/einj.rst b/Documentation/firmware-guide/acpi/apei/einj.rst >index d6b61d22f525..f179adf7b61c 100644 >--- a/Documentation/firmware-guide/acpi/apei/einj.rst >+++ b/Documentation/firmware-guide/acpi/apei/einj.rst >@@ -181,6 +181,25 @@ You should see something like this in dmesg:: > [22715.834759] EDAC sbridge MC3: PROCESSOR 0:306e7 TIME 1422553404 SOCKET 0 APIC 0 > [22716.616173] EDAC MC3: 1 CE memory read error on CPU_SrcID#0_Channel#0_DIMM#0 (channel:0 slot:0 page:0x12345 offset:0x0 grain:32 syndrome:0x0 - area:DRAM err_code:0001:0090 socket:0 channel_mask:1 rank:0) > >+CXL error types are supported from ACPI 6.5 onwards. These error types ^ and target a CXL Port >+are not available in the legacy interface at /sys/kernel/debug/apei/einj, >+and are instead at /sys/kernel/debug/cxl/. There is a file under debug/cxl >+called "einj_type" that is analogous to available_error_type under debug/cxl. >+There is also a "einj_inject" file in each $dport_dev directory under debug/cxl >+that will inject a given error into the dport represented by $dport_dev. >+For example, to inject a CXL.mem protocol correctable error into >+$dport_dev=pci0000:0c:: >+ >+ # cd /sys/kernel/debug/cxl/ >+ # cat einj_type # See which error can be injected >+ 0x00008000 CXL.mem Protocol Correctable >+ 0x00010000 CXL.mem Protocol Uncorrectable non-fatal >+ 0x00020000 CXL.mem Protocol Uncorrectable fatal >+ # cd 0000:e0:01.1 # Navigate to dport to inject into >+ # echo 0x8000 > einj_inject # Inject error >+ >+To use CXL error types, ``CONFIG_CXL_EINJ`` will need to be enabled. >+ > Special notes for injection into SGX enclaves: > > There may be a separate BIOS setup option to enable SGX injection. >-- >2.34.1 >
Thanks for taking a look David! On 2/20/24 1:02 PM, Davidlohr Bueso wrote: > On Wed, 14 Feb 2024, Ben Cheatham wrote: > >> Update EINJ kernel document to include how to inject CXL protocol error >> types, build the kernel to include CXL error types, and give an example >> injection. >> >> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> >> Signed-off-by: Ben Cheatham <Benjamin.Cheatham@amd.com> > > Would vote for folding into 2/3, but otherwise looks good with a minor > suggestion. > I would, but I think 2/3 is already pretty large and this is more digestible to me. I've also reworked a large portion of that patch for v13 so it's probably better to keep it smaller. > Reviewed-by: Davidlohr Bueso <dave@stgolabs.net> > >> --- >> .../firmware-guide/acpi/apei/einj.rst | 19 +++++++++++++++++++ >> 1 file changed, 19 insertions(+) >> >> diff --git a/Documentation/firmware-guide/acpi/apei/einj.rst b/Documentation/firmware-guide/acpi/apei/einj.rst >> index d6b61d22f525..f179adf7b61c 100644 >> --- a/Documentation/firmware-guide/acpi/apei/einj.rst >> +++ b/Documentation/firmware-guide/acpi/apei/einj.rst >> @@ -181,6 +181,25 @@ You should see something like this in dmesg:: >> [22715.834759] EDAC sbridge MC3: PROCESSOR 0:306e7 TIME 1422553404 SOCKET 0 APIC 0 >> [22716.616173] EDAC MC3: 1 CE memory read error on CPU_SrcID#0_Channel#0_DIMM#0 (channel:0 slot:0 page:0x12345 offset:0x0 grain:32 syndrome:0x0 - area:DRAM err_code:0001:0090 socket:0 channel_mask:1 rank:0) >> >> +CXL error types are supported from ACPI 6.5 onwards. These error types > ^ and target a CXL Port > Will add. Thanks, Ben >> +are not available in the legacy interface at /sys/kernel/debug/apei/einj, >> +and are instead at /sys/kernel/debug/cxl/. There is a file under debug/cxl >> +called "einj_type" that is analogous to available_error_type under debug/cxl. >> +There is also a "einj_inject" file in each $dport_dev directory under debug/cxl >> +that will inject a given error into the dport represented by $dport_dev. >> +For example, to inject a CXL.mem protocol correctable error into >> +$dport_dev=pci0000:0c:: >> + >> + # cd /sys/kernel/debug/cxl/ >> + # cat einj_type # See which error can be injected >> + 0x00008000 CXL.mem Protocol Correctable >> + 0x00010000 CXL.mem Protocol Uncorrectable non-fatal >> + 0x00020000 CXL.mem Protocol Uncorrectable fatal >> + # cd 0000:e0:01.1 # Navigate to dport to inject into >> + # echo 0x8000 > einj_inject # Inject error >> + >> +To use CXL error types, ``CONFIG_CXL_EINJ`` will need to be enabled. >> + >> Special notes for injection into SGX enclaves: >> >> There may be a separate BIOS setup option to enable SGX injection. >> -- >> 2.34.1 >>