[v2] acpi: Fix HED module initialization order when it is built-in

Message ID 20250115123149.3324733-1-tanxiaofei@huawei.com
State New

Commit Message

Xiaofei Tan Jan. 15, 2025, 12:31 p.m. UTC
When HED is built in, its init order is determined by the Makefile
order. That order is not the expected one, because HED is initialized
after evged. RAS records therefore cannot be handled in the window in
which evged has been initialized but HED has not. If the number of
such RAS records exceeds the number of APEI HEST error sources, all
HEST resources could end up occupied, which could then affect
subsequent RAS error reporting.

If HED is built as a module, the problem remains. To solve this
problem completely, change ACPI_HED from tristate to bool.

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com>
---
 drivers/acpi/Kconfig  | 2 +-
 drivers/acpi/Makefile | 8 +++++++-
 2 files changed, 8 insertions(+), 2 deletions(-)

Comments

Rafael J. Wysocki Jan. 15, 2025, 3:51 p.m. UTC | #1
On Wed, Jan 15, 2025 at 1:38 PM Xiaofei Tan <tanxiaofei@huawei.com> wrote:
>
> When HED is built in, its init order is determined by the Makefile
> order. That order is not the expected one, because HED is initialized
> after evged. RAS records therefore cannot be handled in the window in
> which evged has been initialized but HED has not. If the number of
> such RAS records exceeds the number of APEI HEST error sources, all
> HEST resources could end up occupied, which could then affect
> subsequent RAS error reporting.
>
> If HED is built as a module, the problem remains. To solve this
> problem completely, change ACPI_HED from tristate to bool.
>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com>
> ---
>  drivers/acpi/Kconfig  | 2 +-
>  drivers/acpi/Makefile | 8 +++++++-
>  2 files changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/acpi/Kconfig b/drivers/acpi/Kconfig
> index d81b55f5068c..7f10aa38269d 100644
> --- a/drivers/acpi/Kconfig
> +++ b/drivers/acpi/Kconfig
> @@ -452,7 +452,7 @@ config ACPI_SBS
>           the modules will be called sbs and sbshc.
>
>  config ACPI_HED
> -       tristate "Hardware Error Device"
> +       bool "Hardware Error Device"
>         help
>           This driver supports the Hardware Error Device (PNP0C33),
>           which is used to report some hardware errors notified via
> diff --git a/drivers/acpi/Makefile b/drivers/acpi/Makefile
> index 40208a0f5dfb..b50d1baeb71f 100644
> --- a/drivers/acpi/Makefile
> +++ b/drivers/acpi/Makefile
> @@ -15,6 +15,13 @@ endif
>
>  obj-$(CONFIG_ACPI)             += tables.o
>
> +#
> +# hed.o must be linked before evged.o; otherwise RAS errors cannot be
> +# handled during the startup window in which evged has been
> +# initialized but hed has not.
> +#
> +obj-$(CONFIG_ACPI_HED)         += hed.o
> +

I'm not sure why you are insisting on this Makefile ordering change.

It would be much more robust to run the hed driver init at a different
initcall level than evged.

If there is a problem with this approach, it needs to be mentioned in
the changelog or in the comment above.
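
To make the suggestion above concrete, here is a small Python model (not kernel code) of how built-in initcalls are ordered: levels run in a fixed sequence, and within one level the order follows link order. The `boot_order` helper, the assumption that both drivers currently sit at the device level, and the choice of `subsys` as the earlier level for hed are illustrative assumptions, not taken from the patch.

```python
from collections import defaultdict

# Initcall levels in the order they run for built-in code (simplified list).
LEVELS = ["early", "core", "postcore", "arch", "subsys", "fs", "device", "late"]


def boot_order(link_order):
    """Return driver names in the order their init functions would run.

    link_order: list of (name, level) pairs, given in Makefile/link order.
    """
    by_level = defaultdict(list)
    for name, level in link_order:
        # Inside one level, link order is preserved.
        by_level[level].append(name)
    # Levels themselves always run in the fixed LEVELS sequence.
    return [name for level in LEVELS for name in by_level[level]]


# Assumption: both drivers register at the device level, evged linked first.
same_level = boot_order([("evged", "device"), ("hed", "device")])

# Hypothetical fix: hed moved to an earlier level; link order no longer matters.
split_level = boot_order([("evged", "device"), ("hed", "subsys")])

print(same_level)
print(split_level)
```

In this model, reordering the Makefile only helps while both drivers share a level, whereas a separate level keeps hed ahead of evged regardless of where hed.o appears in the link order.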

>  #
>  # ACPI Core Subsystem (Interpreter)
>  #
> @@ -95,7 +102,6 @@ obj-$(CONFIG_ACPI_HOTPLUG_IOAPIC) += ioapic.o
>  obj-$(CONFIG_ACPI_BATTERY)     += battery.o
>  obj-$(CONFIG_ACPI_SBS)         += sbshc.o
>  obj-$(CONFIG_ACPI_SBS)         += sbs.o
> -obj-$(CONFIG_ACPI_HED)         += hed.o
>  obj-$(CONFIG_ACPI_EC_DEBUGFS)  += ec_sys.o
>  obj-$(CONFIG_ACPI_BGRT)                += bgrt.o
>  obj-$(CONFIG_ACPI_CPPC_LIB)    += cppc_acpi.o
> --
Xiaofei Tan Jan. 16, 2025, 2:36 p.m. UTC | #2
On 2025/1/15 23:51, Rafael J. Wysocki wrote:
> On Wed, Jan 15, 2025 at 1:38 PM Xiaofei Tan <tanxiaofei@huawei.com> wrote:
>> When HED is built in, its init order is determined by the Makefile
>> order. That order is not the expected one, because HED is initialized
>> after evged. RAS records therefore cannot be handled in the window in
>> which evged has been initialized but HED has not. If the number of
>> such RAS records exceeds the number of APEI HEST error sources, all
>> HEST resources could end up occupied, which could then affect
>> subsequent RAS error reporting.
>>
>> If HED is built as a module, the problem remains. To solve this
>> problem completely, change ACPI_HED from tristate to bool.
>>
>> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>> Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com>
>> ---
>>   drivers/acpi/Kconfig  | 2 +-
>>   drivers/acpi/Makefile | 8 +++++++-
>>   2 files changed, 8 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/acpi/Kconfig b/drivers/acpi/Kconfig
>> index d81b55f5068c..7f10aa38269d 100644
>> --- a/drivers/acpi/Kconfig
>> +++ b/drivers/acpi/Kconfig
>> @@ -452,7 +452,7 @@ config ACPI_SBS
>>            the modules will be called sbs and sbshc.
>>
>>   config ACPI_HED
>> -       tristate "Hardware Error Device"
>> +       bool "Hardware Error Device"
>>          help
>>            This driver supports the Hardware Error Device (PNP0C33),
>>            which is used to report some hardware errors notified via
>> diff --git a/drivers/acpi/Makefile b/drivers/acpi/Makefile
>> index 40208a0f5dfb..b50d1baeb71f 100644
>> --- a/drivers/acpi/Makefile
>> +++ b/drivers/acpi/Makefile
>> @@ -15,6 +15,13 @@ endif
>>
>>   obj-$(CONFIG_ACPI)             += tables.o
>>
>> +#
>> +# hed.o must be linked before evged.o; otherwise RAS errors cannot be
>> +# handled during the startup window in which evged has been
>> +# initialized but hed has not.
>> +#
>> +obj-$(CONFIG_ACPI_HED)         += hed.o
>> +
> I'm not sure why you are insisting on this Makefile ordering change.
>
> It would be much more robust to run the hed driver init at a different
> initcall level than evged.
>
> If there is a problem with this approach, it needs to be mentioned in
> the changelog or in the comment above.

Hi Rafael,

Changing the initcall level would work too. I will send a v3 patch later, thanks.


>>   #
>>   # ACPI Core Subsystem (Interpreter)
>>   #
>> @@ -95,7 +102,6 @@ obj-$(CONFIG_ACPI_HOTPLUG_IOAPIC) += ioapic.o
>>   obj-$(CONFIG_ACPI_BATTERY)     += battery.o
>>   obj-$(CONFIG_ACPI_SBS)         += sbshc.o
>>   obj-$(CONFIG_ACPI_SBS)         += sbs.o
>> -obj-$(CONFIG_ACPI_HED)         += hed.o
>>   obj-$(CONFIG_ACPI_EC_DEBUGFS)  += ec_sys.o
>>   obj-$(CONFIG_ACPI_BGRT)                += bgrt.o
>>   obj-$(CONFIG_ACPI_CPPC_LIB)    += cppc_acpi.o
>> --

Patch

diff --git a/drivers/acpi/Kconfig b/drivers/acpi/Kconfig
index d81b55f5068c..7f10aa38269d 100644
--- a/drivers/acpi/Kconfig
+++ b/drivers/acpi/Kconfig
@@ -452,7 +452,7 @@  config ACPI_SBS
 	  the modules will be called sbs and sbshc.
 
 config ACPI_HED
-	tristate "Hardware Error Device"
+	bool "Hardware Error Device"
 	help
 	  This driver supports the Hardware Error Device (PNP0C33),
 	  which is used to report some hardware errors notified via
diff --git a/drivers/acpi/Makefile b/drivers/acpi/Makefile
index 40208a0f5dfb..b50d1baeb71f 100644
--- a/drivers/acpi/Makefile
+++ b/drivers/acpi/Makefile
@@ -15,6 +15,13 @@  endif
 
 obj-$(CONFIG_ACPI)		+= tables.o
 
+#
+# hed.o must be linked before evged.o; otherwise RAS errors cannot be
+# handled during the startup window in which evged has been
+# initialized but hed has not.
+#
+obj-$(CONFIG_ACPI_HED)		+= hed.o
+
 #
 # ACPI Core Subsystem (Interpreter)
 #
@@ -95,7 +102,6 @@  obj-$(CONFIG_ACPI_HOTPLUG_IOAPIC) += ioapic.o
 obj-$(CONFIG_ACPI_BATTERY)	+= battery.o
 obj-$(CONFIG_ACPI_SBS)		+= sbshc.o
 obj-$(CONFIG_ACPI_SBS)		+= sbs.o
-obj-$(CONFIG_ACPI_HED)		+= hed.o
 obj-$(CONFIG_ACPI_EC_DEBUGFS)	+= ec_sys.o
 obj-$(CONFIG_ACPI_BGRT)		+= bgrt.o
 obj-$(CONFIG_ACPI_CPPC_LIB)	+= cppc_acpi.o