diff mbox

[RFC,0/2] efi/memattr: Fix memory corruption and warning issues

Message ID 20250108215957.3437660-1-usamaarif642@gmail.com
State New
Headers show

Commit Message

Usama Arif Jan. 8, 2025, 9:53 p.m. UTC
Since the patch with the warning in [1] was merged, a very significant
number of kexec boots are producing the warning in our (Meta) fleet.

I believe there are 2 problems, the warning itself might not be
triggered on the right condition, and memory attributes
table is getting corrupted.
An example of the warning when its not triggered correctly and
is fixed by patch 1:
efi: memattr: [Firmware Bug]: Corrupted EFI Memory Attributes Table detected! (version == 2, desc_size == 48, num_entries == 48)
An example of the warning when memory attributes table
is getting corrupted and might possibly be fixed by patch 2:
efi: memattr: [Firmware Bug]: Corrupted EFI Memory Attributes Table detected! (version == 1, desc_size == 2072184435, num_entries == 3248688968)
Its clear that the desc size and num_entries are wrong.

The logic behind patch 1 is explained in its commit message.

The memory corruption is looking very similar to the problem that was
fixed by 77d48d39e99170 ("efistub/tpm: Use ACPI reclaim memory for
event log to avoid corruption"), but this time with memattr table,
where it might not be preserved during kexec. I have
not been able to reproduce this in the test machine I have
over the past couple of days (hence marked as RFC) , but its
happening often in our prod.
When this area is not reserved, it comes up as usable in
/sys/firmware/memmap. This means that kexec, which uses that memmap
to find usable memory regions, can select the region where
efi_mem_attr_table is and overwrite it and relocate_kernel.

Having a fix in firmware can be difficult to get through.
The next ideal place would be in libstub. However, it looks like
InstallMemoryAttributesTable [2] is not available as a boot service
call option [3], [4], I tried to use install_configuration_table as
a substitute, but its not valid and corrupts the MemoryAttributesTable.
The prints I got from the below code in coverletter were:
EFI stub: ERROR: KKK tbl 5f19e018 tbl_>version=1, tbl->num_entries 48 tbl->desc_size 48
EFI stub: ERROR: KKK2 tbl 67184018 tbl_>version=2048, tbl->num_entries 0 tbl->desc_size 0
which shows the table got corrupted. This can bee seen in the kernel boot as well after
(with the version showing up as 2048).

As a last option for a fix, the patch marks that region as reserved in
e820_table_firmware if it is currently E820_TYPE_RAM so that kexec doesn't
use it for kernel segments.

[1] https://lore.kernel.org/all/20241031175822.2952471-2-ardb+git@google.com/
[2] https://github.com/tianocore/edk2/blob/master/MdeModulePkg/Core/Dxe/Misc/MemoryAttributesTable.c#L100
[3] https://github.com/tianocore/edk2/blob/42a141800c0c26a09d2344e84a89ce4097a263ae/MdeModulePkg/Core/Dxe/DxeMain/DxeMain.c#L41
[4] https://elixir.bootlin.com/linux/v6.12.6/source/drivers/firmware/efi/libstub/efistub.h#L327


 
Usama Arif (2):
  efi/memattr: Use desc_size instead of total size to check for
    corruption
  efi/memattr: add efi_mem_attr_table as a reserved region in
    820_table_firmware

 arch/x86/include/asm/e820/api.h |  2 ++
 arch/x86/kernel/e820.c          |  6 ++++++
 arch/x86/platform/efi/efi.c     |  9 +++++++++
 drivers/firmware/efi/memattr.c  | 17 +++++++----------
 include/linux/efi.h             |  7 +++++++
 5 files changed, 31 insertions(+), 10 deletions(-)

Comments

Ard Biesheuvel Jan. 9, 2025, 3:45 p.m. UTC | #1
On Wed, 8 Jan 2025 at 23:00, Usama Arif <usamaarif642@gmail.com> wrote:
>
> The commit in [1] introduced a check to see if EFI memory attributes
> table was corrupted. It assumed that efi.memmap.nr_map remains
> constant, but it changes during late boot.
> Hence, the check is valid during cold boot, but not in the subsequent
> kexec boot.
>
> This is best explained with an exampled. At cold boot, for a test
> machine:
> efi.memmap.nr_map=91,
> memory_attributes_table->num_entries=48,
> desc_size = 48
> Hence, the check introduced in [1] where 3x the size of the
> entire EFI memory map is a reasonable upper bound for the size of this
> table is valid.
>
> In late boot __efi_enter_virtual_mode calls 2 functions that updates
> efi.memmap.nr_map:
> - efi_map_regions which reduces the `count` of map entries
>   (for e.g. if should_map_region returns false) and which is reflected
>   in efi.memmap by __efi_memmap_init.
>   At this point efi.memmap.nr_map becomes 46 in the test machine.
> - efi_free_boot_services which also reduces the number of memory regions
>   available (for e.g. if md->type or md->attribute is not the right value).
>   At this point efi.memmap.nr_map becomes 9 in the test machine.
> Hence when you kexec into a new kernel and pass efi.memmap, the
> paramaters that are compared are:
> efi.memmap.nr_map=9,
> memory_attributes_table->num_entries=48,
> desc_size = 48
> where the check in [1] is no longer valid with such a low efi.memmap.nr_map
> as it was reduced due to efi_map_regions and efi_free_boot_services.
>
> A more appropriate check is to see if the description size reported by
> efi and memory attributes table is the same.
>
> [1] https://lore.kernel.org/all/20241031175822.2952471-2-ardb+git@google.com/
>
> Fixes: 8fbe4c49c0cc ("efi/memattr: Ignore table if the size is clearly bogus")
> Reported-by: Breno Leitao <leitao@debian.org>
> Signed-off-by: Usama Arif <usamaarif642@gmail.com>
> ---
>  drivers/firmware/efi/memattr.c | 16 ++++++----------
>  1 file changed, 6 insertions(+), 10 deletions(-)
>

The more I think about this, the more I feel that kexec on x86 should
simply discard this table, and run with the firmware code RWX (which
is not the end of the world).

The main reason is that the EFI memory map and the EFI memory
attributes table are supposed to be a matched pair, where each RTcode
entry in the former is broken down into multiple code and data
segments in the latter. The amount of mangling that the x86 arch code
does of the EFI memory map makes it intractable to ensure that they
remain in sync, and so it is better not to bother.


> diff --git a/drivers/firmware/efi/memattr.c b/drivers/firmware/efi/memattr.c
> index c38b1a335590..d3bc161361fb 100644
> --- a/drivers/firmware/efi/memattr.c
> +++ b/drivers/firmware/efi/memattr.c
> @@ -40,21 +40,17 @@ int __init efi_memattr_init(void)
>                 goto unmap;
>         }
>
> -
>         /*
> -        * Sanity check: the Memory Attributes Table contains up to 3 entries
> -        * for each entry of type EfiRuntimeServicesCode in the EFI memory map.
> -        * So if the size of the table exceeds 3x the size of the entire EFI
> -        * memory map, there is clearly something wrong, and the table should
> -        * just be ignored altogether.
> +        * Sanity check: the Memory Attributes Table desc_size and
> +        * efi.memmap.desc_size should match.
>          */
> -       size = tbl->num_entries * tbl->desc_size;
> -       if (size > 3 * efi.memmap.nr_map * efi.memmap.desc_size) {
> -               pr_warn(FW_BUG "Corrupted EFI Memory Attributes Table detected! (version == %u, desc_size == %u, num_entries == %u)\n",
> -                       tbl->version, tbl->desc_size, tbl->num_entries);
> +       if (efi.memmap.desc_size != tbl->desc_size) {
> +               pr_warn(FW_BUG "Corrupted EFI Memory Attributes Table detected! (version == %u, table desc_size == %u, efi.memmap.desc_size == %lu, table num_entries == %u)\n",
> +                       tbl->version, tbl->desc_size, efi.memmap.desc_size, tbl->num_entries);
>                 goto unmap;
>         }
>
> +       size = tbl->num_entries * tbl->desc_size;
>         tbl_size = sizeof(*tbl) + size;
>         memblock_reserve(efi_mem_attr_table, tbl_size);
>         set_bit(EFI_MEM_ATTR, &efi.flags);
> --
> 2.43.5
>
Ard Biesheuvel Jan. 10, 2025, 7:21 a.m. UTC | #2
On Thu, 9 Jan 2025 at 17:36, Usama Arif <usamaarif642@gmail.com> wrote:
>
>
>
> On 09/01/2025 15:45, Ard Biesheuvel wrote:
> > On Wed, 8 Jan 2025 at 23:00, Usama Arif <usamaarif642@gmail.com> wrote:
> >>
> >> The commit in [1] introduced a check to see if EFI memory attributes
> >> table was corrupted. It assumed that efi.memmap.nr_map remains
> >> constant, but it changes during late boot.
> >> Hence, the check is valid during cold boot, but not in the subsequent
> >> kexec boot.
> >>
> >> This is best explained with an exampled. At cold boot, for a test
> >> machine:
> >> efi.memmap.nr_map=91,
> >> memory_attributes_table->num_entries=48,
> >> desc_size = 48
> >> Hence, the check introduced in [1] where 3x the size of the
> >> entire EFI memory map is a reasonable upper bound for the size of this
> >> table is valid.
> >>
> >> In late boot __efi_enter_virtual_mode calls 2 functions that updates
> >> efi.memmap.nr_map:
> >> - efi_map_regions which reduces the `count` of map entries
> >>   (for e.g. if should_map_region returns false) and which is reflected
> >>   in efi.memmap by __efi_memmap_init.
> >>   At this point efi.memmap.nr_map becomes 46 in the test machine.
> >> - efi_free_boot_services which also reduces the number of memory regions
> >>   available (for e.g. if md->type or md->attribute is not the right value).
> >>   At this point efi.memmap.nr_map becomes 9 in the test machine.
> >> Hence when you kexec into a new kernel and pass efi.memmap, the
> >> paramaters that are compared are:
> >> efi.memmap.nr_map=9,
> >> memory_attributes_table->num_entries=48,
> >> desc_size = 48
> >> where the check in [1] is no longer valid with such a low efi.memmap.nr_map
> >> as it was reduced due to efi_map_regions and efi_free_boot_services.
> >>
> >> A more appropriate check is to see if the description size reported by
> >> efi and memory attributes table is the same.
> >>
> >> [1] https://lore.kernel.org/all/20241031175822.2952471-2-ardb+git@google.com/
> >>
> >> Fixes: 8fbe4c49c0cc ("efi/memattr: Ignore table if the size is clearly bogus")
> >> Reported-by: Breno Leitao <leitao@debian.org>
> >> Signed-off-by: Usama Arif <usamaarif642@gmail.com>
> >> ---
> >>  drivers/firmware/efi/memattr.c | 16 ++++++----------
> >>  1 file changed, 6 insertions(+), 10 deletions(-)
> >>
> >
> > The more I think about this, the more I feel that kexec on x86 should
> > simply discard this table, and run with the firmware code RWX (which
> > is not the end of the world).
>
>
> By discard this table, do you mean kexec not use e820_table_firmware?

No, I mean kexec ignores the memory attributes table.

> Also a very basic question, what do you mean by run with the firmware RWX?
>

The memory attributes table is an overlay for the EFI memory map that
describes which runtime code regions may be mapped with restricted
permissions. Without this table, everything will be mapped writable as
well as executable, but only in the EFI page tables, which are only
active when an EFI call is in progress.

> Sorry for the very basic questions above!
>

No worries.
Usama Arif Jan. 10, 2025, 10:53 a.m. UTC | #3
On 10/01/2025 07:21, Ard Biesheuvel wrote:
> On Thu, 9 Jan 2025 at 17:36, Usama Arif <usamaarif642@gmail.com> wrote:
>>
>>
>>
>> On 09/01/2025 15:45, Ard Biesheuvel wrote:
>>> On Wed, 8 Jan 2025 at 23:00, Usama Arif <usamaarif642@gmail.com> wrote:
>>>>
>>>> The commit in [1] introduced a check to see if EFI memory attributes
>>>> table was corrupted. It assumed that efi.memmap.nr_map remains
>>>> constant, but it changes during late boot.
>>>> Hence, the check is valid during cold boot, but not in the subsequent
>>>> kexec boot.
>>>>
>>>> This is best explained with an exampled. At cold boot, for a test
>>>> machine:
>>>> efi.memmap.nr_map=91,
>>>> memory_attributes_table->num_entries=48,
>>>> desc_size = 48
>>>> Hence, the check introduced in [1] where 3x the size of the
>>>> entire EFI memory map is a reasonable upper bound for the size of this
>>>> table is valid.
>>>>
>>>> In late boot __efi_enter_virtual_mode calls 2 functions that updates
>>>> efi.memmap.nr_map:
>>>> - efi_map_regions which reduces the `count` of map entries
>>>>   (for e.g. if should_map_region returns false) and which is reflected
>>>>   in efi.memmap by __efi_memmap_init.
>>>>   At this point efi.memmap.nr_map becomes 46 in the test machine.
>>>> - efi_free_boot_services which also reduces the number of memory regions
>>>>   available (for e.g. if md->type or md->attribute is not the right value).
>>>>   At this point efi.memmap.nr_map becomes 9 in the test machine.
>>>> Hence when you kexec into a new kernel and pass efi.memmap, the
>>>> paramaters that are compared are:
>>>> efi.memmap.nr_map=9,
>>>> memory_attributes_table->num_entries=48,
>>>> desc_size = 48
>>>> where the check in [1] is no longer valid with such a low efi.memmap.nr_map
>>>> as it was reduced due to efi_map_regions and efi_free_boot_services.
>>>>
>>>> A more appropriate check is to see if the description size reported by
>>>> efi and memory attributes table is the same.
>>>>
>>>> [1] https://lore.kernel.org/all/20241031175822.2952471-2-ardb+git@google.com/
>>>>
>>>> Fixes: 8fbe4c49c0cc ("efi/memattr: Ignore table if the size is clearly bogus")
>>>> Reported-by: Breno Leitao <leitao@debian.org>
>>>> Signed-off-by: Usama Arif <usamaarif642@gmail.com>
>>>> ---
>>>>  drivers/firmware/efi/memattr.c | 16 ++++++----------
>>>>  1 file changed, 6 insertions(+), 10 deletions(-)
>>>>
>>>
>>> The more I think about this, the more I feel that kexec on x86 should
>>> simply discard this table, and run with the firmware code RWX (which
>>> is not the end of the world).
>>
>>
>> By discard this table, do you mean kexec not use e820_table_firmware?
> 
> No, I mean kexec ignores the memory attributes table.
> 
>> Also a very basic question, what do you mean by run with the firmware RWX?
>>
> 
> The memory attributes table is an overlay for the EFI memory map that
> describes which runtime code regions may be mapped with restricted
> permissions. Without this table, everything will be mapped writable as
> well as executable, but only in the EFI page tables, which are only
> active when an EFI call is in progress.
> 

Thanks for explaining!

So basically get rid of memattr.c :)

Do you mean get rid of it only for kexec, or not do it for any
boot (including cold boot)?
I do like this idea! I couldn't find this in the git history,
but do you know if this was added in the linux kernel just
because EFI spec added support for it, or if there was a
specific security problem?

Thanks!
Ard Biesheuvel Jan. 10, 2025, 5:25 p.m. UTC | #4
On Fri, 10 Jan 2025 at 11:53, Usama Arif <usamaarif642@gmail.com> wrote:
>
>
>
> On 10/01/2025 07:21, Ard Biesheuvel wrote:
> > On Thu, 9 Jan 2025 at 17:36, Usama Arif <usamaarif642@gmail.com> wrote:
> >>
> >>
> >>
> >> On 09/01/2025 15:45, Ard Biesheuvel wrote:
> >>> On Wed, 8 Jan 2025 at 23:00, Usama Arif <usamaarif642@gmail.com> wrote:
> >>>>
> >>>> The commit in [1] introduced a check to see if EFI memory attributes
> >>>> table was corrupted. It assumed that efi.memmap.nr_map remains
> >>>> constant, but it changes during late boot.
> >>>> Hence, the check is valid during cold boot, but not in the subsequent
> >>>> kexec boot.
> >>>>
> >>>> This is best explained with an exampled. At cold boot, for a test
> >>>> machine:
> >>>> efi.memmap.nr_map=91,
> >>>> memory_attributes_table->num_entries=48,
> >>>> desc_size = 48
> >>>> Hence, the check introduced in [1] where 3x the size of the
> >>>> entire EFI memory map is a reasonable upper bound for the size of this
> >>>> table is valid.
> >>>>
> >>>> In late boot __efi_enter_virtual_mode calls 2 functions that updates
> >>>> efi.memmap.nr_map:
> >>>> - efi_map_regions which reduces the `count` of map entries
> >>>>   (for e.g. if should_map_region returns false) and which is reflected
> >>>>   in efi.memmap by __efi_memmap_init.
> >>>>   At this point efi.memmap.nr_map becomes 46 in the test machine.
> >>>> - efi_free_boot_services which also reduces the number of memory regions
> >>>>   available (for e.g. if md->type or md->attribute is not the right value).
> >>>>   At this point efi.memmap.nr_map becomes 9 in the test machine.
> >>>> Hence when you kexec into a new kernel and pass efi.memmap, the
> >>>> paramaters that are compared are:
> >>>> efi.memmap.nr_map=9,
> >>>> memory_attributes_table->num_entries=48,
> >>>> desc_size = 48
> >>>> where the check in [1] is no longer valid with such a low efi.memmap.nr_map
> >>>> as it was reduced due to efi_map_regions and efi_free_boot_services.
> >>>>
> >>>> A more appropriate check is to see if the description size reported by
> >>>> efi and memory attributes table is the same.
> >>>>
> >>>> [1] https://lore.kernel.org/all/20241031175822.2952471-2-ardb+git@google.com/
> >>>>
> >>>> Fixes: 8fbe4c49c0cc ("efi/memattr: Ignore table if the size is clearly bogus")
> >>>> Reported-by: Breno Leitao <leitao@debian.org>
> >>>> Signed-off-by: Usama Arif <usamaarif642@gmail.com>
> >>>> ---
> >>>>  drivers/firmware/efi/memattr.c | 16 ++++++----------
> >>>>  1 file changed, 6 insertions(+), 10 deletions(-)
> >>>>
> >>>
> >>> The more I think about this, the more I feel that kexec on x86 should
> >>> simply discard this table, and run with the firmware code RWX (which
> >>> is not the end of the world).
> >>
> >>
> >> By discard this table, do you mean kexec not use e820_table_firmware?
> >
> > No, I mean kexec ignores the memory attributes table.
> >
> >> Also a very basic question, what do you mean by run with the firmware RWX?
> >>
> >
> > The memory attributes table is an overlay for the EFI memory map that
> > describes which runtime code regions may be mapped with restricted
> > permissions. Without this table, everything will be mapped writable as
> > well as executable, but only in the EFI page tables, which are only
> > active when an EFI call is in progress.
> >
>
> Thanks for explaining!
>
> So basically get rid of memattr.c :)
>
> Do you mean get rid of it only for kexec, or not do it for any
> boot (including cold boot)?

Only for kexec, and only on x86

> I do like this idea! I couldn't find this in the git history,
> but do you know if this was added in the linux kernel just
> because EFI spec added support for it, or if there was a
> specific security problem?
>

Mapping memory RWX is generally a bad idea, so we should avoid it if
possible. But EFI runtime memory regions are only mapped during a EFI
runtime call, and these don't happen often at all, so the benefit is
only marginal. (In the early days of EFI, it was more common for the
OS to map these regions permanently, but we stopped doing that a long
time ago)
diff mbox

Patch

diff --git a/drivers/firmware/efi/libstub/efistub.h b/drivers/firmware/efi/libstub/efistub.h
index d33ccbc4a2c6..a1a956f2d963 100644
--- a/drivers/firmware/efi/libstub/efistub.h
+++ b/drivers/firmware/efi/libstub/efistub.h
@@ -1143,6 +1143,7 @@  efi_enable_reset_attack_mitigation(void) { }
 #endif
 
 void efi_retrieve_eventlog(void);
+void efi_mem_attr_init(void);
 
 struct screen_info *alloc_screen_info(void);
 struct screen_info *__alloc_screen_info(void);
diff --git a/drivers/firmware/efi/libstub/mem.c b/drivers/firmware/efi/libstub/mem.c
index 4f1fa302234d..c5b60aea342e 100644
--- a/drivers/firmware/efi/libstub/mem.c
+++ b/drivers/firmware/efi/libstub/mem.c
@@ -128,3 +128,35 @@  void efi_free(unsigned long size, unsigned long addr)
        nr_pages = round_up(size, EFI_ALLOC_ALIGN) / EFI_PAGE_SIZE;
        efi_bs_call(free_pages, addr, nr_pages);
 }
+
+void efi_mem_attr_init(void)
+{
+       efi_guid_t linux_mem_attr_guid = EFI_MEMORY_ATTRIBUTES_TABLE_GUID;
+       efi_memory_attributes_table_t *tbl = NULL;
+       efi_status_t status;
+       unsigned long size;
+
+       tbl = get_efi_config_table(linux_mem_attr_guid);
+       efi_err("KKK tbl %lx tbl_>version=%d, tbl->num_entries %d tbl->desc_size %d\n", tbl, tbl->version, tbl->num_entries, tbl->desc_size);
+
+       size = tbl->num_entries * tbl->desc_size;
+       status = efi_bs_call(allocate_pool, EFI_ACPI_RECLAIM_MEMORY,
+                            sizeof(*tbl) + size, (void **)&tbl);
+
+       if (status != EFI_SUCCESS) {
+               efi_err("Unable to allocate memory for event log\n");
+               return;
+       }
+
+       status = efi_bs_call(install_configuration_table,
+                            &linux_mem_attr_guid, tbl);
+
+       if (status != EFI_SUCCESS)
+               efi_err("Unable to install configuration table to update memory type\n");
+       efi_bs_call(free_pool, tbl);
+
+       /* verify if its the same table */
+       tbl = get_efi_config_table(linux_mem_attr_guid);
+       efi_err("KKK2 tbl %lx tbl_>version=%d, tbl->num_entries %d tbl->desc_size %d\n", tbl, tbl->version, tbl->num_entries, tbl->desc_size);
+
+}
diff --git a/drivers/firmware/efi/libstub/x86-stub.c b/drivers/firmware/efi/libstub/x86-stub.c
index f8e465da344d..c0c3d278451d 100644
--- a/drivers/firmware/efi/libstub/x86-stub.c
+++ b/drivers/firmware/efi/libstub/x86-stub.c
@@ -1036,6 +1036,8 @@  void __noreturn efi_stub_entry(efi_handle_t handle,
 
        efi_retrieve_eventlog();
 
+       efi_mem_attr_init();
+
        setup_graphics(boot_params);
 
        setup_efi_pci(boot_params);