Message ID | 20250108215957.3437660-3-usamaarif642@gmail.com |
---|---|
State | New |
Headers | show |
Series | [RFC,1/2] efi/memattr: Use desc_size instead of total size to check for corruption | expand |
On Wed, 8 Jan 2025 at 23:00, Usama Arif <usamaarif642@gmail.com> wrote: > > When this area is not reserved, it comes up as usable in > /sys/firmware/memmap. This means that kexec, which uses that memmap > to find usable memory regions, can select the region where > efi_mem_attr_table is and overwrite it and relocate_kernel. > > Since the patch in [1] was merged, all boots after kexec > are producing the warning that it introduced. > > Having a fix in firmware can be difficult to get through. I don't follow. I don't think there is anything wrong with the firmware here. Could you elaborate? > The next ideal place would be in libstub. However, it looks like > InstallMemoryAttributesTable [2] is not available as a boot service > call option [3], [4], and install_configuration_table does not > seem to work as a valid substitute. > To do what, exactly? > As a last option for a fix, this patch marks that region as reserved in > e820_table_firmware if it is currently E820_TYPE_RAM so that kexec doesn't > use it for kernel segments. > > [1] https://lore.kernel.org/all/20241031175822.2952471-2-ardb+git@google.com/ > [2] https://github.com/tianocore/edk2/blob/master/MdeModulePkg/Core/Dxe/Misc/MemoryAttributesTable.c#L100 > [3] https://github.com/tianocore/edk2/blob/42a141800c0c26a09d2344e84a89ce4097a263ae/MdeModulePkg/Core/Dxe/DxeMain/DxeMain.c#L41 > [4] https://elixir.bootlin.com/linux/v6.12.6/source/drivers/firmware/efi/libstub/efistub.h#L327 > > Reported-by: Breno Leitao <leitao@debian.org> > Signed-off-by: Usama Arif <usamaarif642@gmail.com> > --- > arch/x86/include/asm/e820/api.h | 2 ++ > arch/x86/kernel/e820.c | 6 ++++++ > arch/x86/platform/efi/efi.c | 9 +++++++++ > drivers/firmware/efi/memattr.c | 1 + > include/linux/efi.h | 7 +++++++ > 5 files changed, 25 insertions(+) > > diff --git a/arch/x86/include/asm/e820/api.h b/arch/x86/include/asm/e820/api.h > index 2e74a7f0e935..4e9aa24f03bd 100644 > --- a/arch/x86/include/asm/e820/api.h > +++ b/arch/x86/include/asm/e820/api.h > @@ -16,6 +16,8 @@ extern bool e820__mapped_all(u64 start, u64 end, enum e820_type type); > > extern void e820__range_add (u64 start, u64 size, enum e820_type type); > extern u64 e820__range_update(u64 start, u64 size, enum e820_type old_type, enum e820_type new_type); > +extern u64 e820__range_update_firmware(u64 start, u64 size, enum e820_type old_type, > + enum e820_type new_type); > extern u64 e820__range_remove(u64 start, u64 size, enum e820_type old_type, bool check_type); > extern u64 e820__range_update_table(struct e820_table *t, u64 start, u64 size, enum e820_type old_type, enum e820_type new_type); > > diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c > index 82b96ed9890a..01d7d3c0d299 100644 > --- a/arch/x86/kernel/e820.c > +++ b/arch/x86/kernel/e820.c > @@ -538,6 +538,12 @@ u64 __init e820__range_update_table(struct e820_table *t, u64 start, u64 size, > return __e820__range_update(t, start, size, old_type, new_type); > } > > +u64 __init e820__range_update_firmware(u64 start, u64 size, enum e820_type old_type, > + enum e820_type new_type) > +{ > + return __e820__range_update(e820_table_firmware, start, size, old_type, new_type); > +} > + > /* Remove a range of memory from the E820 table: */ > u64 __init e820__range_remove(u64 start, u64 size, enum e820_type old_type, bool check_type) > { > diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c > index a7ff189421c3..13684c5d7c05 100644 > --- a/arch/x86/platform/efi/efi.c > +++ b/arch/x86/platform/efi/efi.c > @@ -168,6 +168,15 @@ static void __init do_add_efi_memmap(void) > e820__update_table(e820_table); > } > > +/* Reserve firmware area if it was marked as RAM */ > +void arch_update_firmware_area(u64 addr, u64 size) > +{ > + if (e820__get_entry_type(addr, addr + size) == E820_TYPE_RAM) { > + e820__range_update_firmware(addr, size, E820_TYPE_RAM, E820_TYPE_RESERVED); > + e820__update_table(e820_table_firmware); > + } > +} > + > /* > * Given add_efi_memmap defaults to 0 and there is no alternative > * e820 mechanism for soft-reserved memory, import the full EFI memory > diff --git a/drivers/firmware/efi/memattr.c b/drivers/firmware/efi/memattr.c > index d3bc161361fb..d131781e2d7b 100644 > --- a/drivers/firmware/efi/memattr.c > +++ b/drivers/firmware/efi/memattr.c > @@ -53,6 +53,7 @@ int __init efi_memattr_init(void) > size = tbl->num_entries * tbl->desc_size; > tbl_size = sizeof(*tbl) + size; > memblock_reserve(efi_mem_attr_table, tbl_size); > + arch_update_firmware_area(efi_mem_attr_table, tbl_size); > set_bit(EFI_MEM_ATTR, &efi.flags); > > unmap: > diff --git a/include/linux/efi.h b/include/linux/efi.h > index e5815867aba9..8eb9698bd6a4 100644 > --- a/include/linux/efi.h > +++ b/include/linux/efi.h > @@ -1358,4 +1358,11 @@ extern struct blocking_notifier_head efivar_ops_nh; > void efivars_generic_ops_register(void); > void efivars_generic_ops_unregister(void); > > +#ifdef CONFIG_X86_64 > +void __init arch_update_firmware_area(u64 addr, u64 size); > +#else > +static inline void __init arch_update_firmware_area(u64 addr, u64 size) > +{ > +} > +#endif > #endif /* _LINUX_EFI_H */ > -- > 2.43.5 >
On 09/01/2025 16:15, Ard Biesheuvel wrote: > On Wed, 8 Jan 2025 at 23:00, Usama Arif <usamaarif642@gmail.com> wrote: >> >> When this area is not reserved, it comes up as usable in >> /sys/firmware/memmap. This means that kexec, which uses that memmap >> to find usable memory regions, can select the region where >> efi_mem_attr_table is and overwrite it and relocate_kernel. >> >> Since the patch in [1] was merged, all boots after kexec >> are producing the warning that it introduced. >> >> Having a fix in firmware can be difficult to get through. > > I don't follow. I don't think there is anything wrong with the > firmware here. Could you elaborate? > So the problem is, kexec sees this memory as System RAM, and decides it can be used to place an image here. I guess the question is (and I actually don't know the answer here), whose responsibility is it to mark this region as reserved so that its not touched by anyone else. I would have thought it should be firmware? Maybe its not the firmwares' job to mark it as reserved, but just pass it to kernel and the kernel is supposed to make sure it gets reserved in a proper way, even across kexecs. I think in the end whoevers' responsibility it is, the easiest path forward seems to be in kernel? (and not firmware or libstub) > >> The next ideal place would be in libstub. However, it looks like >> InstallMemoryAttributesTable [2] is not available as a boot service >> call option [3], [4], and install_configuration_table does not >> seem to work as a valid substitute. >> > > To do what, exactly? > To change the memory type from System RAM to either reserved or something more appropriate, i.e. any type that is not touched by kexec or any other userspace. Basically the example code I attached at the end of the cover letter in https://lore.kernel.org/all/20250108215957.3437660-1-usamaarif642@gmail.com/ It could be EFI_ACPI_RECLAIM_MEMORY or EFI_RESERVED_TYPE, both of which aren't touched by kexec. >> As a last option for a fix, this patch marks that region as reserved in >> e820_table_firmware if it is currently E820_TYPE_RAM so that kexec doesn't >> use it for kernel segments. >> >> [1] https://lore.kernel.org/all/20241031175822.2952471-2-ardb+git@google.com/ >> [2] https://github.com/tianocore/edk2/blob/master/MdeModulePkg/Core/Dxe/Misc/MemoryAttributesTable.c#L100 >> [3] https://github.com/tianocore/edk2/blob/42a141800c0c26a09d2344e84a89ce4097a263ae/MdeModulePkg/Core/Dxe/DxeMain/DxeMain.c#L41 >> [4] https://elixir.bootlin.com/linux/v6.12.6/source/drivers/firmware/efi/libstub/efistub.h#L327 >> >> Reported-by: Breno Leitao <leitao@debian.org> >> Signed-off-by: Usama Arif <usamaarif642@gmail.com> >> --- >> arch/x86/include/asm/e820/api.h | 2 ++ >> arch/x86/kernel/e820.c | 6 ++++++ >> arch/x86/platform/efi/efi.c | 9 +++++++++ >> drivers/firmware/efi/memattr.c | 1 + >> include/linux/efi.h | 7 +++++++ >> 5 files changed, 25 insertions(+) >> >> diff --git a/arch/x86/include/asm/e820/api.h b/arch/x86/include/asm/e820/api.h >> index 2e74a7f0e935..4e9aa24f03bd 100644 >> --- a/arch/x86/include/asm/e820/api.h >> +++ b/arch/x86/include/asm/e820/api.h >> @@ -16,6 +16,8 @@ extern bool e820__mapped_all(u64 start, u64 end, enum e820_type type); >> >> extern void e820__range_add (u64 start, u64 size, enum e820_type type); >> extern u64 e820__range_update(u64 start, u64 size, enum e820_type old_type, enum e820_type new_type); >> +extern u64 e820__range_update_firmware(u64 start, u64 size, enum e820_type old_type, >> + enum e820_type new_type); >> extern u64 e820__range_remove(u64 start, u64 size, enum e820_type old_type, bool check_type); >> extern u64 e820__range_update_table(struct e820_table *t, u64 start, u64 size, enum e820_type old_type, enum e820_type new_type); >> >> diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c >> index 82b96ed9890a..01d7d3c0d299 100644 >> --- a/arch/x86/kernel/e820.c >> +++ b/arch/x86/kernel/e820.c >> @@ -538,6 +538,12 @@ u64 __init e820__range_update_table(struct e820_table *t, u64 start, u64 size, >> return __e820__range_update(t, start, size, old_type, new_type); >> } >> >> +u64 __init e820__range_update_firmware(u64 start, u64 size, enum e820_type old_type, >> + enum e820_type new_type) >> +{ >> + return __e820__range_update(e820_table_firmware, start, size, old_type, new_type); >> +} >> + >> /* Remove a range of memory from the E820 table: */ >> u64 __init e820__range_remove(u64 start, u64 size, enum e820_type old_type, bool check_type) >> { >> diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c >> index a7ff189421c3..13684c5d7c05 100644 >> --- a/arch/x86/platform/efi/efi.c >> +++ b/arch/x86/platform/efi/efi.c >> @@ -168,6 +168,15 @@ static void __init do_add_efi_memmap(void) >> e820__update_table(e820_table); >> } >> >> +/* Reserve firmware area if it was marked as RAM */ >> +void arch_update_firmware_area(u64 addr, u64 size) >> +{ >> + if (e820__get_entry_type(addr, addr + size) == E820_TYPE_RAM) { >> + e820__range_update_firmware(addr, size, E820_TYPE_RAM, E820_TYPE_RESERVED); >> + e820__update_table(e820_table_firmware); >> + } >> +} >> + >> /* >> * Given add_efi_memmap defaults to 0 and there is no alternative >> * e820 mechanism for soft-reserved memory, import the full EFI memory >> diff --git a/drivers/firmware/efi/memattr.c b/drivers/firmware/efi/memattr.c >> index d3bc161361fb..d131781e2d7b 100644 >> --- a/drivers/firmware/efi/memattr.c >> +++ b/drivers/firmware/efi/memattr.c >> @@ -53,6 +53,7 @@ int __init efi_memattr_init(void) >> size = tbl->num_entries * tbl->desc_size; >> tbl_size = sizeof(*tbl) + size; >> memblock_reserve(efi_mem_attr_table, tbl_size); >> + arch_update_firmware_area(efi_mem_attr_table, tbl_size); >> set_bit(EFI_MEM_ATTR, &efi.flags); >> >> unmap: >> diff --git a/include/linux/efi.h b/include/linux/efi.h >> index e5815867aba9..8eb9698bd6a4 100644 >> --- a/include/linux/efi.h >> +++ b/include/linux/efi.h >> @@ -1358,4 +1358,11 @@ extern struct blocking_notifier_head efivar_ops_nh; >> void efivars_generic_ops_register(void); >> void efivars_generic_ops_unregister(void); >> >> +#ifdef CONFIG_X86_64 >> +void __init arch_update_firmware_area(u64 addr, u64 size); >> +#else >> +static inline void __init arch_update_firmware_area(u64 addr, u64 size) >> +{ >> +} >> +#endif >> #endif /* _LINUX_EFI_H */ >> -- >> 2.43.5 >>
On Thu, Jan 09, 2025 at 04:32:10PM +0000, Usama Arif wrote: > > > On 09/01/2025 16:15, Ard Biesheuvel wrote: > > On Wed, 8 Jan 2025 at 23:00, Usama Arif <usamaarif642@gmail.com> wrote: > >> > >> When this area is not reserved, it comes up as usable in > >> /sys/firmware/memmap. This means that kexec, which uses that memmap > >> to find usable memory regions, can select the region where > >> efi_mem_attr_table is and overwrite it and relocate_kernel. > >> > >> Since the patch in [1] was merged, all boots after kexec > >> are producing the warning that it introduced. > >> > >> Having a fix in firmware can be difficult to get through. > > > > I don't follow. I don't think there is anything wrong with the > > firmware here. Could you elaborate? > > > > So the problem is, kexec sees this memory as System RAM, and decides > it can be used to place an image here. > > I guess the question is (and I actually don't know the answer here), > whose responsibility is it to mark this region as reserved so that > its not touched by anyone else. I would have thought it should be > firmware? > > Maybe its not the firmwares' job to mark it as reserved, but just pass > it to kernel and the kernel is supposed to make sure it gets reserved > in a proper way, even across kexecs. Reservation is a kernel concept, the firmware/EFI etc give the kernel guidance - but the actual maps here are largely informational and the kernel can do whatever it wants (within reason). So what you're really saying is either a) the hardware is mis-reporting the memory map configurations (e.g. some EFI_* bit is missing), or b) the kernel isn't doing the right thing. I believe Ard is pointing out that the additional map abstraction is just trying to retain a "current" versus "original" configuration - and this is more of a headache than it is worth as it appears to be causing confusion as to which one should be respected across kexec. So the separate maps should just be one, and we should just smash the "original" configuration rather than keeping a copy. Please correct me if i'm misunderstanding anything. > > I think in the end whoevers' responsibility it is, the easiest path forward > seems to be in kernel? (and not firmware or libstub) > > > > >> The next ideal place would be in libstub. However, it looks like > >> InstallMemoryAttributesTable [2] is not available as a boot service > >> call option [3], [4], and install_configuration_table does not > >> seem to work as a valid substitute. > >> > > > > To do what, exactly? > > > > To change the memory type from System RAM to either reserved or > something more appropriate, i.e. any type that is not touched by > kexec or any other userspace. > > Basically the example code I attached at the end of the cover letter in > https://lore.kernel.org/all/20250108215957.3437660-1-usamaarif642@gmail.com/ > It could be EFI_ACPI_RECLAIM_MEMORY or EFI_RESERVED_TYPE, both of which aren't > touched by kexec. > > >> As a last option for a fix, this patch marks that region as reserved in > >> e820_table_firmware if it is currently E820_TYPE_RAM so that kexec doesn't > >> use it for kernel segments. > >> > >> [1] https://lore.kernel.org/all/20241031175822.2952471-2-ardb+git@google.com/ > >> [2] https://github.com/tianocore/edk2/blob/master/MdeModulePkg/Core/Dxe/Misc/MemoryAttributesTable.c#L100 > >> [3] https://github.com/tianocore/edk2/blob/42a141800c0c26a09d2344e84a89ce4097a263ae/MdeModulePkg/Core/Dxe/DxeMain/DxeMain.c#L41 > >> [4] https://elixir.bootlin.com/linux/v6.12.6/source/drivers/firmware/efi/libstub/efistub.h#L327 > >> > >> Reported-by: Breno Leitao <leitao@debian.org> > >> Signed-off-by: Usama Arif <usamaarif642@gmail.com> > >> --- > >> arch/x86/include/asm/e820/api.h | 2 ++ > >> arch/x86/kernel/e820.c | 6 ++++++ > >> arch/x86/platform/efi/efi.c | 9 +++++++++ > >> drivers/firmware/efi/memattr.c | 1 + > >> include/linux/efi.h | 7 +++++++ > >> 5 files changed, 25 insertions(+) > >> > >> diff --git a/arch/x86/include/asm/e820/api.h b/arch/x86/include/asm/e820/api.h > >> index 2e74a7f0e935..4e9aa24f03bd 100644 > >> --- a/arch/x86/include/asm/e820/api.h > >> +++ b/arch/x86/include/asm/e820/api.h > >> @@ -16,6 +16,8 @@ extern bool e820__mapped_all(u64 start, u64 end, enum e820_type type); > >> > >> extern void e820__range_add (u64 start, u64 size, enum e820_type type); > >> extern u64 e820__range_update(u64 start, u64 size, enum e820_type old_type, enum e820_type new_type); > >> +extern u64 e820__range_update_firmware(u64 start, u64 size, enum e820_type old_type, > >> + enum e820_type new_type); > >> extern u64 e820__range_remove(u64 start, u64 size, enum e820_type old_type, bool check_type); > >> extern u64 e820__range_update_table(struct e820_table *t, u64 start, u64 size, enum e820_type old_type, enum e820_type new_type); > >> > >> diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c > >> index 82b96ed9890a..01d7d3c0d299 100644 > >> --- a/arch/x86/kernel/e820.c > >> +++ b/arch/x86/kernel/e820.c > >> @@ -538,6 +538,12 @@ u64 __init e820__range_update_table(struct e820_table *t, u64 start, u64 size, > >> return __e820__range_update(t, start, size, old_type, new_type); > >> } > >> > >> +u64 __init e820__range_update_firmware(u64 start, u64 size, enum e820_type old_type, > >> + enum e820_type new_type) > >> +{ > >> + return __e820__range_update(e820_table_firmware, start, size, old_type, new_type); > >> +} > >> + > >> /* Remove a range of memory from the E820 table: */ > >> u64 __init e820__range_remove(u64 start, u64 size, enum e820_type old_type, bool check_type) > >> { > >> diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c > >> index a7ff189421c3..13684c5d7c05 100644 > >> --- a/arch/x86/platform/efi/efi.c > >> +++ b/arch/x86/platform/efi/efi.c > >> @@ -168,6 +168,15 @@ static void __init do_add_efi_memmap(void) > >> e820__update_table(e820_table); > >> } > >> > >> +/* Reserve firmware area if it was marked as RAM */ > >> +void arch_update_firmware_area(u64 addr, u64 size) > >> +{ > >> + if (e820__get_entry_type(addr, addr + size) == E820_TYPE_RAM) { > >> + e820__range_update_firmware(addr, size, E820_TYPE_RAM, E820_TYPE_RESERVED); > >> + e820__update_table(e820_table_firmware); > >> + } > >> +} > >> + > >> /* > >> * Given add_efi_memmap defaults to 0 and there is no alternative > >> * e820 mechanism for soft-reserved memory, import the full EFI memory > >> diff --git a/drivers/firmware/efi/memattr.c b/drivers/firmware/efi/memattr.c > >> index d3bc161361fb..d131781e2d7b 100644 > >> --- a/drivers/firmware/efi/memattr.c > >> +++ b/drivers/firmware/efi/memattr.c > >> @@ -53,6 +53,7 @@ int __init efi_memattr_init(void) > >> size = tbl->num_entries * tbl->desc_size; > >> tbl_size = sizeof(*tbl) + size; > >> memblock_reserve(efi_mem_attr_table, tbl_size); > >> + arch_update_firmware_area(efi_mem_attr_table, tbl_size); > >> set_bit(EFI_MEM_ATTR, &efi.flags); > >> > >> unmap: > >> diff --git a/include/linux/efi.h b/include/linux/efi.h > >> index e5815867aba9..8eb9698bd6a4 100644 > >> --- a/include/linux/efi.h > >> +++ b/include/linux/efi.h > >> @@ -1358,4 +1358,11 @@ extern struct blocking_notifier_head efivar_ops_nh; > >> void efivars_generic_ops_register(void); > >> void efivars_generic_ops_unregister(void); > >> > >> +#ifdef CONFIG_X86_64 > >> +void __init arch_update_firmware_area(u64 addr, u64 size); > >> +#else > >> +static inline void __init arch_update_firmware_area(u64 addr, u64 size) > >> +{ > >> +} > >> +#endif > >> #endif /* _LINUX_EFI_H */ > >> -- > >> 2.43.5 > >> >
Hi Usama, On Thu, 9 Jan 2025 at 06:00, Usama Arif <usamaarif642@gmail.com> wrote: > > When this area is not reserved, it comes up as usable in > /sys/firmware/memmap. This means that kexec, which uses that memmap > to find usable memory regions, can select the region where > efi_mem_attr_table is and overwrite it and relocate_kernel. Is the attr table BOOT SERVICE DATA? If so, does efi_mem_reserve() work for you? Just refer to esrt.c. Thanks Dave
On Thu, 9 Jan 2025 at 17:32, Usama Arif <usamaarif642@gmail.com> wrote: > > > > On 09/01/2025 16:15, Ard Biesheuvel wrote: > > On Wed, 8 Jan 2025 at 23:00, Usama Arif <usamaarif642@gmail.com> wrote: > >> > >> When this area is not reserved, it comes up as usable in > >> /sys/firmware/memmap. This means that kexec, which uses that memmap > >> to find usable memory regions, can select the region where > >> efi_mem_attr_table is and overwrite it and relocate_kernel. > >> > >> Since the patch in [1] was merged, all boots after kexec > >> are producing the warning that it introduced. > >> > >> Having a fix in firmware can be difficult to get through. > > > > I don't follow. I don't think there is anything wrong with the > > firmware here. Could you elaborate? > > > > So the problem is, kexec sees this memory as System RAM, and decides > it can be used to place an image here. > > I guess the question is (and I actually don't know the answer here), > whose responsibility is it to mark this region as reserved so that > its not touched by anyone else. I would have thought it should be > firmware? > No, it is the OS. The firmware only reserves regions that are needed for its own correct operation at runtime. For informational tables such as this one, it is up to the OS whether it wants to reserve it and keep it in place, consume it and release the memory, or ignore it altogether. > Maybe its not the firmwares' job to mark it as reserved, but just pass > it to kernel and the kernel is supposed to make sure it gets reserved > in a proper way, even across kexecs. > Indeed. > I think in the end whoevers' responsibility it is, the easiest path forward > seems to be in kernel? (and not firmware or libstub) > Agreed. But as I pointed out in the other thread, the memory attributes table only augments the memory map with permission information, and can be disregarded, and given how badly we mangle the memory map on x86, maybe this is the right choice here. > > > >> The next ideal place would be in libstub. However, it looks like > >> InstallMemoryAttributesTable [2] is not available as a boot service > >> call option [3], [4], and install_configuration_table does not > >> seem to work as a valid substitute. > >> > > > > To do what, exactly? > > > > To change the memory type from System RAM to either reserved or > something more appropriate, i.e. any type that is not touched by > kexec or any other userspace. > > Basically the example code I attached at the end of the cover letter in > https://lore.kernel.org/all/20250108215957.3437660-1-usamaarif642@gmail.com/ > It could be EFI_ACPI_RECLAIM_MEMORY or EFI_RESERVED_TYPE, both of which aren't > touched by kexec. > This is a kexec problem (on x86 only) so let's fix it there.
On 10/01/2025 02:50, Dave Young wrote: > Hi Usama, > > On Thu, 9 Jan 2025 at 06:00, Usama Arif <usamaarif642@gmail.com> wrote: >> >> When this area is not reserved, it comes up as usable in >> /sys/firmware/memmap. This means that kexec, which uses that memmap >> to find usable memory regions, can select the region where >> efi_mem_attr_table is and overwrite it and relocate_kernel. > > Is the attr table BOOT SERVICE DATA? If so, does efi_mem_reserve() > work for you? > Just refer to esrt.c. > Hi Dave, Its a bit difficult to reproduce the problem and therefore test the fix, but we are seeing it a lot in production. Ard proposed the same thing in https://lore.kernel.org/all/6b4780a5-ada0-405e-9f0a-4d2186177f29@gmail.com/ but as I mentioned there, I dont think that efi_mem_reserve would help, as efi_mem_reserve changes e820_table, while kexec looks at /sys/firmware/memmap which uses e820_table_firmware. Thanks, Usama > Thanks > Dave >
On Fri, 10 Jan 2025 at 19:12, Usama Arif <usamaarif642@gmail.com> wrote: > > > > On 10/01/2025 02:50, Dave Young wrote: > > Hi Usama, > > > > On Thu, 9 Jan 2025 at 06:00, Usama Arif <usamaarif642@gmail.com> wrote: > >> > >> When this area is not reserved, it comes up as usable in > >> /sys/firmware/memmap. This means that kexec, which uses that memmap > >> to find usable memory regions, can select the region where > >> efi_mem_attr_table is and overwrite it and relocate_kernel. > > > > Is the attr table BOOT SERVICE DATA? If so, does efi_mem_reserve() > > work for you? > > Just refer to esrt.c. > > > > Hi Dave, > > Its a bit difficult to reproduce the problem and therefore test the fix, but > we are seeing it a lot in production. Ard proposed the same thing in > https://lore.kernel.org/all/6b4780a5-ada0-405e-9f0a-4d2186177f29@gmail.com/ > but as I mentioned there, I dont think that efi_mem_reserve would help, > as efi_mem_reserve changes e820_table, while kexec looks at > /sys/firmware/memmap which uses e820_table_firmware. I sent a question to pm people, if the sysfs memmap comes from e820_table then it will be fine. Let's see: https://lore.kernel.org/all/CALu+AoS-nk4u=9UYP7BLS=diOxjJRf+vfv7KHXG=uXozoYazsw@mail.gmail.com/ > > Thanks, > Usama > > > Thanks > > Dave > > >
On Fri, 10 Jan 2025 at 19:18, Dave Young <dyoung@redhat.com> wrote: > > On Fri, 10 Jan 2025 at 19:12, Usama Arif <usamaarif642@gmail.com> wrote: > > > > > > > > On 10/01/2025 02:50, Dave Young wrote: > > > Hi Usama, > > > > > > On Thu, 9 Jan 2025 at 06:00, Usama Arif <usamaarif642@gmail.com> wrote: > > >> > > >> When this area is not reserved, it comes up as usable in > > >> /sys/firmware/memmap. This means that kexec, which uses that memmap > > >> to find usable memory regions, can select the region where > > >> efi_mem_attr_table is and overwrite it and relocate_kernel. > > > > > > Is the attr table BOOT SERVICE DATA? If so, does efi_mem_reserve() > > > work for you? > > > Just refer to esrt.c. > > > > > > > Hi Dave, > > > > Its a bit difficult to reproduce the problem and therefore test the fix, but > > we are seeing it a lot in production. Ard proposed the same thing in > > https://lore.kernel.org/all/6b4780a5-ada0-405e-9f0a-4d2186177f29@gmail.com/ > > but as I mentioned there, I dont think that efi_mem_reserve would help, > > as efi_mem_reserve changes e820_table, while kexec looks at > > /sys/firmware/memmap which uses e820_table_firmware. > > I sent a question to pm people, if the sysfs memmap comes from > e820_table then it will be fine. Let's see: s/e820_table/e820_table_kexec > https://lore.kernel.org/all/CALu+AoS-nk4u=9UYP7BLS=diOxjJRf+vfv7KHXG=uXozoYazsw@mail.gmail.com/ > > > > > Thanks, > > Usama > > > > > Thanks > > > Dave > > > > >
Hello Ard, On Fri, Jan 10, 2025 at 08:32:08AM +0100, Ard Biesheuvel wrote: > On Thu, 9 Jan 2025 at 17:32, Usama Arif <usamaarif642@gmail.com> wrote: > > I think in the end whoevers' responsibility it is, the easiest path forward > > seems to be in kernel? (and not firmware or libstub) > > > > Agreed. But as I pointed out in the other thread, the memory > attributes table only augments the memory map with permission > information, and can be disregarded, and given how badly we mangle the > memory map on x86, maybe this is the right choice here. If this augmented memory is not preserved accross kexec, then the next kexec'ed kernel will be able to find the original table? I understand that the memattr region(s) need to be always (in each kexec instances) `memblocked_reserved` to protect it from being used as a System RAM, right? Thus, if it is not passed throught e820, kexec'ed kernel needs to fetch it again from original EFI table at kexec/boot time. This brings me another question. If the kexec'ed kernel sees the original memory, why can't it augment/update the RX permissions *again*, instead of passing the previous augmented version from previous kernel in this crazy dance. > This is a kexec problem (on x86 only) so let's fix it there. Would you mind explaining what kexec needs to be done differently? Should it preserve the augmented memattr table independently if it is mapped in e820? Thank you! --breno
On 10/01/2025 11:20, Dave Young wrote: > On Fri, 10 Jan 2025 at 19:18, Dave Young <dyoung@redhat.com> wrote: >> >> On Fri, 10 Jan 2025 at 19:12, Usama Arif <usamaarif642@gmail.com> wrote: >>> >>> >>> >>> On 10/01/2025 02:50, Dave Young wrote: >>>> Hi Usama, >>>> >>>> On Thu, 9 Jan 2025 at 06:00, Usama Arif <usamaarif642@gmail.com> wrote: >>>>> >>>>> When this area is not reserved, it comes up as usable in >>>>> /sys/firmware/memmap. This means that kexec, which uses that memmap >>>>> to find usable memory regions, can select the region where >>>>> efi_mem_attr_table is and overwrite it and relocate_kernel. >>>> >>>> Is the attr table BOOT SERVICE DATA? If so, does efi_mem_reserve() >>>> work for you? >>>> Just refer to esrt.c. >>>> >>> >>> Hi Dave, >>> >>> Its a bit difficult to reproduce the problem and therefore test the fix, but >>> we are seeing it a lot in production. Ard proposed the same thing in >>> https://lore.kernel.org/all/6b4780a5-ada0-405e-9f0a-4d2186177f29@gmail.com/ >>> but as I mentioned there, I dont think that efi_mem_reserve would help, >>> as efi_mem_reserve changes e820_table, while kexec looks at >>> /sys/firmware/memmap which uses e820_table_firmware. >> >> I sent a question to pm people, if the sysfs memmap comes from >> e820_table then it will be fine. Let's see: > s/e820_table/e820_table_kexec > Do you mean change /sys/firmware/memmap to point to e820_table_kexec instead of e820_table_firmware [1]? I thought of doing this when the first bug was encountered last year, but didn't do it as I thought it would be frowned upon to change what sysfs file exposes to userspace. [1] https://elixir.bootlin.com/linux/v6.12.6/source/arch/x86/kernel/e820.c#L31 >> https://lore.kernel.org/all/CALu+AoS-nk4u=9UYP7BLS=diOxjJRf+vfv7KHXG=uXozoYazsw@mail.gmail.com/ >> >>> >>> Thanks, >>> Usama >>> >>>> Thanks >>>> Dave >>>> >>> >
On 10/01/2025 07:32, Ard Biesheuvel wrote: > On Thu, 9 Jan 2025 at 17:32, Usama Arif <usamaarif642@gmail.com> wrote: >> >> >> >> On 09/01/2025 16:15, Ard Biesheuvel wrote: >> I think in the end whoevers' responsibility it is, the easiest path forward >> seems to be in kernel? (and not firmware or libstub) >> > > Agreed. But as I pointed out in the other thread, the memory > attributes table only augments the memory map with permission > information, and can be disregarded, and given how badly we mangle the > memory map on x86, maybe this is the right choice here. > >>> >>>> The next ideal place would be in libstub. However, it looks like >>>> InstallMemoryAttributesTable [2] is not available as a boot service >>>> call option [3], [4], and install_configuration_table does not >>>> seem to work as a valid substitute. >>>> >>> >>> To do what, exactly? >>> >> >> To change the memory type from System RAM to either reserved or >> something more appropriate, i.e. any type that is not touched by >> kexec or any other userspace. >> >> Basically the example code I attached at the end of the cover letter in >> https://lore.kernel.org/all/20250108215957.3437660-1-usamaarif642@gmail.com/ >> It could be EFI_ACPI_RECLAIM_MEMORY or EFI_RESERVED_TYPE, both of which aren't >> touched by kexec. >> > > This is a kexec problem (on x86 only) so let's fix it there. I don't believe we can accurately tell if we are booting from a cold boot or kexec. There is bootloader_type available for x86, but not sure if we should rely on that. I think a way forward would be to move it behind a Kconfig option, something like below, which defaults to n for x86. Anyone who needs it can enable it. What do you think? diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c index aa95f77d7a30..31deb0a5371e 100644 --- a/arch/x86/platform/efi/efi.c +++ b/arch/x86/platform/efi/efi.c @@ -83,7 +83,9 @@ static const unsigned long * const efi_tables[] = { &efi_config_table, &efi.esrt, &prop_phys, +#ifdef CONFIG_EFI_MEMATTR &efi_mem_attr_table, +#endif #ifdef CONFIG_EFI_RCI2_TABLE &rci2_table_phys, #endif diff --git a/drivers/firmware/efi/Kconfig b/drivers/firmware/efi/Kconfig index 72f2537d90ca..b8ecb318768c 100644 --- a/drivers/firmware/efi/Kconfig +++ b/drivers/firmware/efi/Kconfig @@ -287,6 +287,13 @@ config EFI_EMBEDDED_FIRMWARE bool select CRYPTO_LIB_SHA256 +config EFI_MEMATTR + bool "EFI Memory attributes table" + default n if X86_64 + help + EFI Memory Attributes table describes memory protections that may + be applied to the EFI Runtime code and data regions by the kernel. + endmenu config UEFI_CPER diff --git a/drivers/firmware/efi/Makefile b/drivers/firmware/efi/Makefile index a2d0009560d0..c593ec0d9940 100644 --- a/drivers/firmware/efi/Makefile +++ b/drivers/firmware/efi/Makefile @@ -11,7 +11,9 @@ KASAN_SANITIZE_runtime-wrappers.o := n obj-$(CONFIG_ACPI_BGRT) += efi-bgrt.o -obj-$(CONFIG_EFI) += efi.o vars.o reboot.o memattr.o tpm.o +obj-$(CONFIG_EFI) += efi.o vars.o reboot.o tpm.o +obj-$(CONFIG_EFI_MEMATTR) += memattr.o + obj-$(CONFIG_EFI) += memmap.o ifneq ($(CONFIG_EFI_CAPSULE_LOADER),) obj-$(CONFIG_EFI) += capsule.o diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c index fdf07dd6f459..f359179083d5 100644 --- a/drivers/firmware/efi/efi.c +++ b/drivers/firmware/efi/efi.c @@ -596,7 +596,9 @@ static const efi_config_table_type_t common_tables[] __initconst = { {SMBIOS_TABLE_GUID, &efi.smbios, "SMBIOS" }, {SMBIOS3_TABLE_GUID, &efi.smbios3, "SMBIOS 3.0" }, {EFI_SYSTEM_RESOURCE_TABLE_GUID, &efi.esrt, "ESRT" }, +#ifdef CONFIG_EFI_MEMATTR {EFI_MEMORY_ATTRIBUTES_TABLE_GUID, &efi_mem_attr_table, "MEMATTR" }, +#endif {LINUX_EFI_RANDOM_SEED_TABLE_GUID, &efi_rng_seed, "RNG" }, {LINUX_EFI_TPM_EVENT_LOG_GUID, &efi.tpm_log, "TPMEventLog" }, {EFI_TCG2_FINAL_EVENTS_TABLE_GUID, &efi.tpm_final_log, "TPMFinalLog" }, diff --git a/include/linux/efi.h b/include/linux/efi.h index 9c239cdff771..4cf5ebe014e2 100644 --- a/include/linux/efi.h +++ b/include/linux/efi.h @@ -783,9 +783,21 @@ extern unsigned long efi_mem_attr_table; */ typedef int (*efi_memattr_perm_setter)(struct mm_struct *, efi_memory_desc_t *, bool); +#ifdef CONFIG_EFI_MEMATTR extern int efi_memattr_init(void); extern int efi_memattr_apply_permissions(struct mm_struct *mm, efi_memattr_perm_setter fn); +#else +static inline int efi_memattr_init(void) +{ + return 0; +} +static inline int efi_memattr_apply_permissions(struct mm_struct *mm, + efi_memattr_perm_setter fn) +{ + return 0; +} +#endif /* * efi_memdesc_ptr - get the n-th EFI memmap descriptor
On 10/01/2025 14:31, Usama Arif wrote: > > > On 10/01/2025 07:32, Ard Biesheuvel wrote: >> On Thu, 9 Jan 2025 at 17:32, Usama Arif <usamaarif642@gmail.com> wrote: >>> >>> >>> >>> On 09/01/2025 16:15, Ard Biesheuvel wrote: > >>> I think in the end whoevers' responsibility it is, the easiest path forward >>> seems to be in kernel? (and not firmware or libstub) >>> >> >> Agreed. But as I pointed out in the other thread, the memory >> attributes table only augments the memory map with permission >> information, and can be disregarded, and given how badly we mangle the >> memory map on x86, maybe this is the right choice here. >> >>>> >>>>> The next ideal place would be in libstub. However, it looks like >>>>> InstallMemoryAttributesTable [2] is not available as a boot service >>>>> call option [3], [4], and install_configuration_table does not >>>>> seem to work as a valid substitute. >>>>> >>>> >>>> To do what, exactly? >>>> >>> >>> To change the memory type from System RAM to either reserved or >>> something more appropriate, i.e. any type that is not touched by >>> kexec or any other userspace. >>> >>> Basically the example code I attached at the end of the cover letter in >>> https://lore.kernel.org/all/20250108215957.3437660-1-usamaarif642@gmail.com/ >>> It could be EFI_ACPI_RECLAIM_MEMORY or EFI_RESERVED_TYPE, both of which aren't >>> touched by kexec. >>> >> >> This is a kexec problem (on x86 only) so let's fix it there. > > > I don't believe we can accurately tell if we are booting from a cold boot or kexec. > There is bootloader_type available for x86, but not sure if we should rely on > that. I think a way forward would be to move it behind a Kconfig option, something like > below, which defaults to n for x86. Anyone who needs it can enable it. What do you think? > Or we can do something like below? diff --git a/drivers/firmware/efi/memattr.c b/drivers/firmware/efi/memattr.c index d131781e2d7b..4add694b18d0 100644 --- a/drivers/firmware/efi/memattr.c +++ b/drivers/firmware/efi/memattr.c @@ -24,6 +24,15 @@ int __init efi_memattr_init(void) efi_memory_attributes_table_t *tbl; unsigned long size; +#ifdef CONFIG_X86_64 + /* + * On x86_64, do not initialize memory attributes table + * if booting from kexec + */ + if (bootloader_type >> 4 == 0xd) + return 0; +#endif + if (efi_mem_attr_table == EFI_INVALID_TABLE_ADDR) return 0;
On Fri, 10 Jan 2025 at 12:36, Breno Leitao <leitao@debian.org> wrote: > > Hello Ard, > > On Fri, Jan 10, 2025 at 08:32:08AM +0100, Ard Biesheuvel wrote: > > On Thu, 9 Jan 2025 at 17:32, Usama Arif <usamaarif642@gmail.com> wrote: > > > > I think in the end whoevers' responsibility it is, the easiest path forward > > > seems to be in kernel? (and not firmware or libstub) > > > > > > > Agreed. But as I pointed out in the other thread, the memory > > attributes table only augments the memory map with permission > > information, and can be disregarded, and given how badly we mangle the > > memory map on x86, maybe this is the right choice here. > > If this augmented memory is not preserved accross kexec, then the next > kexec'ed kernel will be able to find the original table? > > I understand that the memattr region(s) need to be always (in each kexec > instances) `memblocked_reserved` to protect it from being used as a > System RAM, right? > > Thus, if it is not passed throught e820, kexec'ed kernel needs to fetch > it again from original EFI table at kexec/boot time. > Not sure what 'fetching' means here. > This brings me another question. > > If the kexec'ed kernel sees the original memory, why can't it > augment/update the RX permissions *again*, instead of passing the > previous augmented version from previous kernel in this crazy dance. > I don't understand what original memory means. I think we're talking past each other tbh. > > This is a kexec problem (on x86 only) so let's fix it there. > > Would you mind explaining what kexec needs to be done differently? > Should it preserve the augmented memattr table independently if it is > mapped in e820? > I don't know what 'mapped in e820' means. Let's forget about what the memory attributes table actually contains, and just assume we can live without it, ok? So when booting x86 via kexec (which is already detected in arch/x86/platform/efi/efi.c), the kernel should pretend that the table does not exist.
diff --git a/arch/x86/include/asm/e820/api.h b/arch/x86/include/asm/e820/api.h index 2e74a7f0e935..4e9aa24f03bd 100644 --- a/arch/x86/include/asm/e820/api.h +++ b/arch/x86/include/asm/e820/api.h @@ -16,6 +16,8 @@ extern bool e820__mapped_all(u64 start, u64 end, enum e820_type type); extern void e820__range_add (u64 start, u64 size, enum e820_type type); extern u64 e820__range_update(u64 start, u64 size, enum e820_type old_type, enum e820_type new_type); +extern u64 e820__range_update_firmware(u64 start, u64 size, enum e820_type old_type, + enum e820_type new_type); extern u64 e820__range_remove(u64 start, u64 size, enum e820_type old_type, bool check_type); extern u64 e820__range_update_table(struct e820_table *t, u64 start, u64 size, enum e820_type old_type, enum e820_type new_type); diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c index 82b96ed9890a..01d7d3c0d299 100644 --- a/arch/x86/kernel/e820.c +++ b/arch/x86/kernel/e820.c @@ -538,6 +538,12 @@ u64 __init e820__range_update_table(struct e820_table *t, u64 start, u64 size, return __e820__range_update(t, start, size, old_type, new_type); } +u64 __init e820__range_update_firmware(u64 start, u64 size, enum e820_type old_type, + enum e820_type new_type) +{ + return __e820__range_update(e820_table_firmware, start, size, old_type, new_type); +} + /* Remove a range of memory from the E820 table: */ u64 __init e820__range_remove(u64 start, u64 size, enum e820_type old_type, bool check_type) { diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c index a7ff189421c3..13684c5d7c05 100644 --- a/arch/x86/platform/efi/efi.c +++ b/arch/x86/platform/efi/efi.c @@ -168,6 +168,15 @@ static void __init do_add_efi_memmap(void) e820__update_table(e820_table); } +/* Reserve firmware area if it was marked as RAM */ +void arch_update_firmware_area(u64 addr, u64 size) +{ + if (e820__get_entry_type(addr, addr + size) == E820_TYPE_RAM) { + e820__range_update_firmware(addr, size, E820_TYPE_RAM, E820_TYPE_RESERVED); + e820__update_table(e820_table_firmware); + } +} + /* * Given add_efi_memmap defaults to 0 and there is no alternative * e820 mechanism for soft-reserved memory, import the full EFI memory diff --git a/drivers/firmware/efi/memattr.c b/drivers/firmware/efi/memattr.c index d3bc161361fb..d131781e2d7b 100644 --- a/drivers/firmware/efi/memattr.c +++ b/drivers/firmware/efi/memattr.c @@ -53,6 +53,7 @@ int __init efi_memattr_init(void) size = tbl->num_entries * tbl->desc_size; tbl_size = sizeof(*tbl) + size; memblock_reserve(efi_mem_attr_table, tbl_size); + arch_update_firmware_area(efi_mem_attr_table, tbl_size); set_bit(EFI_MEM_ATTR, &efi.flags); unmap: diff --git a/include/linux/efi.h b/include/linux/efi.h index e5815867aba9..8eb9698bd6a4 100644 --- a/include/linux/efi.h +++ b/include/linux/efi.h @@ -1358,4 +1358,11 @@ extern struct blocking_notifier_head efivar_ops_nh; void efivars_generic_ops_register(void); void efivars_generic_ops_unregister(void); +#ifdef CONFIG_X86_64 +void __init arch_update_firmware_area(u64 addr, u64 size); +#else +static inline void __init arch_update_firmware_area(u64 addr, u64 size) +{ +} +#endif #endif /* _LINUX_EFI_H */
When this area is not reserved, it comes up as usable in /sys/firmware/memmap. This means that kexec, which uses that memmap to find usable memory regions, can select the region where efi_mem_attr_table is and overwrite it and relocate_kernel. Since the patch in [1] was merged, all boots after kexec are producing the warning that it introduced. Having a fix in firmware can be difficult to get through. The next ideal place would be in libstub. However, it looks like InstallMemoryAttributesTable [2] is not available as a boot service call option [3], [4], and install_configuration_table does not seem to work as a valid substitute. As a last option for a fix, this patch marks that region as reserved in e820_table_firmware if it is currently E820_TYPE_RAM so that kexec doesn't use it for kernel segments. [1] https://lore.kernel.org/all/20241031175822.2952471-2-ardb+git@google.com/ [2] https://github.com/tianocore/edk2/blob/master/MdeModulePkg/Core/Dxe/Misc/MemoryAttributesTable.c#L100 [3] https://github.com/tianocore/edk2/blob/42a141800c0c26a09d2344e84a89ce4097a263ae/MdeModulePkg/Core/Dxe/DxeMain/DxeMain.c#L41 [4] https://elixir.bootlin.com/linux/v6.12.6/source/drivers/firmware/efi/libstub/efistub.h#L327 Reported-by: Breno Leitao <leitao@debian.org> Signed-off-by: Usama Arif <usamaarif642@gmail.com> --- arch/x86/include/asm/e820/api.h | 2 ++ arch/x86/kernel/e820.c | 6 ++++++ arch/x86/platform/efi/efi.c | 9 +++++++++ drivers/firmware/efi/memattr.c | 1 + include/linux/efi.h | 7 +++++++ 5 files changed, 25 insertions(+)