Message ID | 20240620073205.1543145-1-ardb+git@google.com |
---|---|
State | New |
Headers | show |
Series | [RFC] x86/efi: Drop support for fake EFI memory maps | expand |
On Thu, Jun 20, 2024 at 09:32:05AM +0200, Ard Biesheuvel wrote: > From: Ard Biesheuvel <ardb@kernel.org> > > Between kexec and confidential VM support, handling the EFI memory maps > correctly on x86 is already proving to be rather difficult (as opposed > to other EFI architectures which manage to never modify the EFI memory > map to begin with) > > EFI fake memory map support is essentially a development hack (for > testing new support for the 'special purpose' and 'more reliable' EFI > memory attributes) that leaked into production code. The regions marked > in this manner are not actually recognized as such by the firmware > itself or the EFI stub (and never have), and marking memory as 'more > reliable' seems rather futile if the underlying memory is just ordinary > RAM. > > Marking memory as 'special purpose' in this way is also dubious, but may > be in use in production code nonetheless. However, the same should be > achievable by using the memmap= command line option with the ! operator. > > EFI fake memmap support is not enabled by any of the major distros > (Debian, Fedora, SUSE, Ubuntu) and does not exist on other > architectures, so let's drop support for it. > > Cc: Taku Izumi <izumi.taku@jp.fujitsu.com> > Cc: Dan Williams <dan.j.williams@intel.com> > Signed-off-by: Ard Biesheuvel <ardb@kernel.org> > --- > Documentation/admin-guide/kernel-parameters.txt | 21 --- > arch/x86/Kconfig | 20 -- > arch/x86/boot/compressed/kaslr.c | 43 +---- > arch/x86/include/asm/efi.h | 15 -- > arch/x86/kernel/setup.c | 1 - > arch/x86/platform/efi/efi.c | 2 - > arch/x86/platform/efi/fake_mem.c | 197 -------------------- > arch/x86/platform/efi/memmap.c | 1 + > drivers/firmware/efi/libstub/x86-stub.c | 2 +- > 9 files changed, 11 insertions(+), 291 deletions(-) I obviously like this: Acked-by: Borislav Petkov (AMD) <bp@alien8.de> I don't see the author or anyone else objecting, I guess queue it? Or if you feel like you wanna give folks a full cycle, you could queue it for the next MW... Thx.
On Mon, 1 Jul 2024 at 14:47, Borislav Petkov <bp@alien8.de> wrote: > > On Thu, Jun 20, 2024 at 09:32:05AM +0200, Ard Biesheuvel wrote: > > From: Ard Biesheuvel <ardb@kernel.org> > > > > Between kexec and confidential VM support, handling the EFI memory maps > > correctly on x86 is already proving to be rather difficult (as opposed > > to other EFI architectures which manage to never modify the EFI memory > > map to begin with) > > > > EFI fake memory map support is essentially a development hack (for > > testing new support for the 'special purpose' and 'more reliable' EFI > > memory attributes) that leaked into production code. The regions marked > > in this manner are not actually recognized as such by the firmware > > itself or the EFI stub (and never have), and marking memory as 'more > > reliable' seems rather futile if the underlying memory is just ordinary > > RAM. > > > > Marking memory as 'special purpose' in this way is also dubious, but may > > be in use in production code nonetheless. However, the same should be > > achievable by using the memmap= command line option with the ! operator. > > > > EFI fake memmap support is not enabled by any of the major distros > > (Debian, Fedora, SUSE, Ubuntu) and does not exist on other > > architectures, so let's drop support for it. > > > > Cc: Taku Izumi <izumi.taku@jp.fujitsu.com> > > Cc: Dan Williams <dan.j.williams@intel.com> > > Signed-off-by: Ard Biesheuvel <ardb@kernel.org> > > --- > > Documentation/admin-guide/kernel-parameters.txt | 21 --- > > arch/x86/Kconfig | 20 -- > > arch/x86/boot/compressed/kaslr.c | 43 +---- > > arch/x86/include/asm/efi.h | 15 -- > > arch/x86/kernel/setup.c | 1 - > > arch/x86/platform/efi/efi.c | 2 - > > arch/x86/platform/efi/fake_mem.c | 197 -------------------- > > arch/x86/platform/efi/memmap.c | 1 + > > drivers/firmware/efi/libstub/x86-stub.c | 2 +- > > 9 files changed, 11 insertions(+), 291 deletions(-) > > I obviously like this: > > Acked-by: Borislav Petkov (AMD) <bp@alien8.de> > > I don't see the author or anyone else objecting, I guess queue it? > Thanks. > Or if you feel like you wanna give folks a full cycle, you could queue it for > the next MW... > It's been in -next for ~10 days so I might just send it for the next cycle. We can always revert it if something gets broken.
[ add Tony who may care about the more-reliable removal ] Ard Biesheuvel wrote: > From: Ard Biesheuvel <ardb@kernel.org> > > Between kexec and confidential VM support, handling the EFI memory maps > correctly on x86 is already proving to be rather difficult (as opposed > to other EFI architectures which manage to never modify the EFI memory > map to begin with) > > EFI fake memory map support is essentially a development hack (for > testing new support for the 'special purpose' and 'more reliable' EFI > memory attributes) that leaked into production code. The regions marked > in this manner are not actually recognized as such by the firmware > itself or the EFI stub (and never have), and marking memory as 'more > reliable' seems rather futile if the underlying memory is just ordinary > RAM. > > Marking memory as 'special purpose' in this way is also dubious, but may > be in use in production code nonetheless. However, the same should be > achievable by using the memmap= command line option with the ! operator. Ugh, sorry I missed this earlier. I only now just saw this bubble up in my inbox from the reply from Boris, but I have concerns with this removal. The problem is that what is and is not "special purpose" is a platform *and* use case policy. BIOS can only make a rough guess at platform manufacturing time, and certainly does not have use case information. One of the main reasons to prefer efi_fake_mem over memmap= is all of the end user hand holding needed to correct broken usage of memmap= https://nvdimm.wiki.kernel.org/how_to_choose_the_correct_memmap_kernel_parameter_for_pmem_on_your_system ...if anything I have been wanting to rip out memmap= more than efi_fake_mem just to cut down on the PEBKAC reports from memmap= users from the PMEM days. Those reports only died down because Optane died, but CXL is here now to revive that "fun". However, that said, maybe the safety properties of efi_fake_mem=nn@ss:0x40000 can be brought over to memmap=. The main property that protects end users is that the attribute only applies to existing EFI Conventional Memory ranges. So unlike memmap= that blindly replaces the memory map whether it makes sense or not, efi_fake_mem= would fail gracefully if the attribute was applied to nonsense ranges. > EFI fake memmap support is not enabled by any of the major distros > (Debian, Fedora, SUSE, Ubuntu) and does not exist on other > architectures, so let's drop support for it. I think part of the problem here is that platform with differentiated memory, (PMEM, HBM, CXL, etc...) have remained somewhat boutique since the introduction of this facility. As they become more ubiquitous, as trends seem to indicate, I think the frustration with BIOS policy may become more widespread. So we can remove it and see who screams, but we may want to put a work-alike safe option in memmap= first just to keep our own sanity.
On Mon, Jul 01, 2024 at 01:17:01PM -0700, Dan Williams wrote:
> [ add Tony who may care about the more-reliable removal ]
I don't think I care about removal the fake options. I have had
systems that support real "more-reliable" memory for many years.
-Tony
Tony Luck wrote: > On Mon, Jul 01, 2024 at 01:17:01PM -0700, Dan Williams wrote: > > [ add Tony who may care about the more-reliable removal ] > > I don't think I care about removal the fake options. I have had > systems that support real "more-reliable" memory for many years. Ok, lets remove it and see what happens. I recall now that the major policy knob that users may care about is efi=nosoftreserve, and the ability to specify efi_fake_mem for testing purposes can move to memmap= on the assumption that kernel-devs are much less likely to specify broken settings, or at least can pick up the pieces. Ard, you can add: Acked-by: Dan Williams <dan.j.williams@intel.com>
On Mon, 1 Jul 2024 at 22:54, Dan Williams <dan.j.williams@intel.com> wrote: > > Tony Luck wrote: > > On Mon, Jul 01, 2024 at 01:17:01PM -0700, Dan Williams wrote: > > > [ add Tony who may care about the more-reliable removal ] > > > > I don't think I care about removal the fake options. I have had > > systems that support real "more-reliable" memory for many years. > > Ok, lets remove it and see what happens. I recall now that the major > policy knob that users may care about is efi=nosoftreserve, and the > ability to specify efi_fake_mem for testing purposes can move to memmap= > on the assumption that kernel-devs are much less likely to specify > broken settings, or at least can pick up the pieces. > > Ard, you can add: > > Acked-by: Dan Williams <dan.j.williams@intel.com> > Thanks all.
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index 11e57ba2985c..90c64525e7e1 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -1450,27 +1450,6 @@ you are really sure that your UEFI does sane gc and fulfills the spec otherwise your board may brick. - efi_fake_mem= nn[KMG]@ss[KMG]:aa[,nn[KMG]@ss[KMG]:aa,..] [EFI,X86,EARLY] - Add arbitrary attribute to specific memory range by - updating original EFI memory map. - Region of memory which aa attribute is added to is - from ss to ss+nn. - - If efi_fake_mem=2G@4G:0x10000,2G@0x10a0000000:0x10000 - is specified, EFI_MEMORY_MORE_RELIABLE(0x10000) - attribute is added to range 0x100000000-0x180000000 and - 0x10a0000000-0x1120000000. - - If efi_fake_mem=8G@9G:0x40000 is specified, the - EFI_MEMORY_SP(0x40000) attribute is added to - range 0x240000000-0x43fffffff. - - Using this parameter you can do debugging of EFI memmap - related features. For example, you can do debugging of - Address Range Mirroring feature even if your box - doesn't support it, or mark specific memory as - "soft reserved". - efivar_ssdt= [EFI; X86] Name of an EFI variable that contains an SSDT that is to be dynamically loaded by Linux. If there are multiple variables with the same name but with different diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index 1d7122a1883e..0c31f65aeb1d 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -2038,26 +2038,6 @@ config EFI_MIXED If unsure, say N. -config EFI_FAKE_MEMMAP - bool "Enable EFI fake memory map" - depends on EFI - help - Saying Y here will enable "efi_fake_mem" boot option. By specifying - this parameter, you can add arbitrary attribute to specific memory - range by updating original (firmware provided) EFI memmap. This is - useful for debugging of EFI memmap related feature, e.g., Address - Range Mirroring feature. - -config EFI_MAX_FAKE_MEM - int "maximum allowable number of ranges in efi_fake_mem boot option" - depends on EFI_FAKE_MEMMAP - range 1 128 - default 8 - help - Maximum allowable number of ranges in efi_fake_mem boot option. - Ranges can be set up to this value using comma-separated list. - The default value is 8. - config EFI_RUNTIME_MAP bool "Export EFI runtime maps to sysfs" if EXPERT depends on EFI diff --git a/arch/x86/boot/compressed/kaslr.c b/arch/x86/boot/compressed/kaslr.c index dec961c6d16a..f4d82379bf44 100644 --- a/arch/x86/boot/compressed/kaslr.c +++ b/arch/x86/boot/compressed/kaslr.c @@ -119,13 +119,8 @@ char *skip_spaces(const char *str) #include "../../../../lib/ctype.c" #include "../../../../lib/cmdline.c" -enum parse_mode { - PARSE_MEMMAP, - PARSE_EFI, -}; - static int -parse_memmap(char *p, u64 *start, u64 *size, enum parse_mode mode) +parse_memmap(char *p, u64 *start, u64 *size) { char *oldp; @@ -148,29 +143,11 @@ parse_memmap(char *p, u64 *start, u64 *size, enum parse_mode mode) *start = memparse(p + 1, &p); return 0; case '@': - if (mode == PARSE_MEMMAP) { - /* - * memmap=nn@ss specifies usable region, should - * be skipped - */ - *size = 0; - } else { - u64 flags; - - /* - * efi_fake_mem=nn@ss:attr the attr specifies - * flags that might imply a soft-reservation. - */ - *start = memparse(p + 1, &p); - if (p && *p == ':') { - p++; - if (kstrtoull(p, 0, &flags) < 0) - *size = 0; - else if (flags & EFI_MEMORY_SP) - return 0; - } - *size = 0; - } + /* + * memmap=nn@ss specifies usable region, should + * be skipped + */ + *size = 0; fallthrough; default: /* @@ -185,7 +162,7 @@ parse_memmap(char *p, u64 *start, u64 *size, enum parse_mode mode) return -EINVAL; } -static void mem_avoid_memmap(enum parse_mode mode, char *str) +static void mem_avoid_memmap(char *str) { static int i; @@ -200,7 +177,7 @@ static void mem_avoid_memmap(enum parse_mode mode, char *str) if (k) *k++ = 0; - rc = parse_memmap(str, &start, &size, mode); + rc = parse_memmap(str, &start, &size); if (rc < 0) break; str = k; @@ -281,7 +258,7 @@ static void handle_mem_options(void) break; if (!strcmp(param, "memmap")) { - mem_avoid_memmap(PARSE_MEMMAP, val); + mem_avoid_memmap(val); } else if (IS_ENABLED(CONFIG_X86_64) && strstr(param, "hugepages")) { parse_gb_huge_pages(param, val); } else if (!strcmp(param, "mem")) { @@ -295,8 +272,6 @@ static void handle_mem_options(void) if (mem_size < mem_limit) mem_limit = mem_size; - } else if (!strcmp(param, "efi_fake_mem")) { - mem_avoid_memmap(PARSE_EFI, val); } } diff --git a/arch/x86/include/asm/efi.h b/arch/x86/include/asm/efi.h index 481096177500..b3e4d999b913 100644 --- a/arch/x86/include/asm/efi.h +++ b/arch/x86/include/asm/efi.h @@ -384,23 +384,8 @@ static inline void efi_reserve_boot_services(void) } #endif /* CONFIG_EFI */ -#ifdef CONFIG_EFI_FAKE_MEMMAP -extern void __init efi_fake_memmap_early(void); -extern void __init efi_fake_memmap(void); -#else -static inline void efi_fake_memmap_early(void) -{ -} - -static inline void efi_fake_memmap(void) -{ -} -#endif - extern int __init efi_memmap_alloc(unsigned int num_entries, struct efi_memory_map_data *data); -extern void __efi_memmap_free(u64 phys, unsigned long size, - unsigned long flags); extern int __init efi_memmap_install(struct efi_memory_map_data *data); extern int __init efi_memmap_split_count(efi_memory_desc_t *md, diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c index 05c5aa951da7..282e8577ac7b 100644 --- a/arch/x86/kernel/setup.c +++ b/arch/x86/kernel/setup.c @@ -995,7 +995,6 @@ void __init setup_arch(char **cmdline_p) mem_encrypt_setup_arch(); cc_random_init(); - efi_fake_memmap(); efi_find_mirror(); efi_esrt_init(); efi_mokvar_table_init(); diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c index f090ec972d7b..88a96816de9a 100644 --- a/arch/x86/platform/efi/efi.c +++ b/arch/x86/platform/efi/efi.c @@ -226,8 +226,6 @@ int __init efi_memblock_x86_reserve_range(void) if (add_efi_memmap || do_efi_soft_reserve()) do_add_efi_memmap(); - efi_fake_memmap_early(); - WARN(efi.memmap.desc_version != 1, "Unexpected EFI_MEMORY_DESCRIPTOR version %ld", efi.memmap.desc_version); diff --git a/arch/x86/platform/efi/fake_mem.c b/arch/x86/platform/efi/fake_mem.c deleted file mode 100644 index 41d57cad3d84..000000000000 --- a/arch/x86/platform/efi/fake_mem.c +++ /dev/null @@ -1,197 +0,0 @@ -// SPDX-License-Identifier: GPL-2.0 -/* - * fake_mem.c - * - * Copyright (C) 2015 FUJITSU LIMITED - * Author: Taku Izumi <izumi.taku@jp.fujitsu.com> - * - * This code introduces new boot option named "efi_fake_mem" - * By specifying this parameter, you can add arbitrary attribute to - * specific memory range by updating original (firmware provided) EFI - * memmap. - */ - -#include <linux/kernel.h> -#include <linux/efi.h> -#include <linux/init.h> -#include <linux/memblock.h> -#include <linux/types.h> -#include <linux/sort.h> -#include <asm/e820/api.h> -#include <asm/efi.h> - -#define EFI_MAX_FAKEMEM CONFIG_EFI_MAX_FAKE_MEM - -static struct efi_mem_range efi_fake_mems[EFI_MAX_FAKEMEM]; -static int nr_fake_mem; - -static int __init cmp_fake_mem(const void *x1, const void *x2) -{ - const struct efi_mem_range *m1 = x1; - const struct efi_mem_range *m2 = x2; - - if (m1->range.start < m2->range.start) - return -1; - if (m1->range.start > m2->range.start) - return 1; - return 0; -} - -static void __init efi_fake_range(struct efi_mem_range *efi_range) -{ - struct efi_memory_map_data data = { 0 }; - int new_nr_map = efi.memmap.nr_map; - efi_memory_desc_t *md; - void *new_memmap; - - /* count up the number of EFI memory descriptor */ - for_each_efi_memory_desc(md) - new_nr_map += efi_memmap_split_count(md, &efi_range->range); - - /* allocate memory for new EFI memmap */ - if (efi_memmap_alloc(new_nr_map, &data) != 0) - return; - - /* create new EFI memmap */ - new_memmap = early_memremap(data.phys_map, data.size); - if (!new_memmap) { - __efi_memmap_free(data.phys_map, data.size, data.flags); - return; - } - - efi_memmap_insert(&efi.memmap, new_memmap, efi_range); - - /* swap into new EFI memmap */ - early_memunmap(new_memmap, data.size); - - efi_memmap_install(&data); -} - -void __init efi_fake_memmap(void) -{ - int i; - - if (!efi_enabled(EFI_MEMMAP) || !nr_fake_mem) - return; - - for (i = 0; i < nr_fake_mem; i++) - efi_fake_range(&efi_fake_mems[i]); - - /* print new EFI memmap */ - efi_print_memmap(); -} - -static int __init setup_fake_mem(char *p) -{ - u64 start = 0, mem_size = 0, attribute = 0; - int i; - - if (!p) - return -EINVAL; - - while (*p != '\0') { - mem_size = memparse(p, &p); - if (*p == '@') - start = memparse(p+1, &p); - else - break; - - if (*p == ':') - attribute = simple_strtoull(p+1, &p, 0); - else - break; - - if (nr_fake_mem >= EFI_MAX_FAKEMEM) - break; - - efi_fake_mems[nr_fake_mem].range.start = start; - efi_fake_mems[nr_fake_mem].range.end = start + mem_size - 1; - efi_fake_mems[nr_fake_mem].attribute = attribute; - nr_fake_mem++; - - if (*p == ',') - p++; - } - - sort(efi_fake_mems, nr_fake_mem, sizeof(struct efi_mem_range), - cmp_fake_mem, NULL); - - for (i = 0; i < nr_fake_mem; i++) - pr_info("efi_fake_mem: add attr=0x%016llx to [mem 0x%016llx-0x%016llx]", - efi_fake_mems[i].attribute, efi_fake_mems[i].range.start, - efi_fake_mems[i].range.end); - - return *p == '\0' ? 0 : -EINVAL; -} - -early_param("efi_fake_mem", setup_fake_mem); - -void __init efi_fake_memmap_early(void) -{ - int i; - - /* - * The late efi_fake_mem() call can handle all requests if - * EFI_MEMORY_SP support is disabled. - */ - if (!efi_soft_reserve_enabled()) - return; - - if (!efi_enabled(EFI_MEMMAP) || !nr_fake_mem) - return; - - /* - * Given that efi_fake_memmap() needs to perform memblock - * allocations it needs to run after e820__memblock_setup(). - * However, if efi_fake_mem specifies EFI_MEMORY_SP for a given - * address range that potentially needs to mark the memory as - * reserved prior to e820__memblock_setup(). Update e820 - * directly if EFI_MEMORY_SP is specified for an - * EFI_CONVENTIONAL_MEMORY descriptor. - */ - for (i = 0; i < nr_fake_mem; i++) { - struct efi_mem_range *mem = &efi_fake_mems[i]; - efi_memory_desc_t *md; - u64 m_start, m_end; - - if ((mem->attribute & EFI_MEMORY_SP) == 0) - continue; - - m_start = mem->range.start; - m_end = mem->range.end; - for_each_efi_memory_desc(md) { - u64 start, end, size; - - if (md->type != EFI_CONVENTIONAL_MEMORY) - continue; - - start = md->phys_addr; - end = md->phys_addr + (md->num_pages << EFI_PAGE_SHIFT) - 1; - - if (m_start <= end && m_end >= start) - /* fake range overlaps descriptor */; - else - continue; - - /* - * Trim the boundary of the e820 update to the - * descriptor in case the fake range overlaps - * !EFI_CONVENTIONAL_MEMORY - */ - start = max(start, m_start); - end = min(end, m_end); - size = end - start + 1; - - if (end <= start) - continue; - - /* - * Ensure each efi_fake_mem instance results in - * a unique e820 resource - */ - e820__range_remove(start, size, E820_TYPE_RAM, 1); - e820__range_add(start, size, E820_TYPE_SOFT_RESERVED); - e820__update_table(e820_table); - } - } -} diff --git a/arch/x86/platform/efi/memmap.c b/arch/x86/platform/efi/memmap.c index 6ed1935504b9..061b8ecc71a1 100644 --- a/arch/x86/platform/efi/memmap.c +++ b/arch/x86/platform/efi/memmap.c @@ -30,6 +30,7 @@ static phys_addr_t __init __efi_memmap_alloc_late(unsigned long size) return PFN_PHYS(page_to_pfn(p)); } +static void __init __efi_memmap_free(u64 phys, unsigned long size, unsigned long flags) { if (flags & EFI_MEMMAP_MEMBLOCK) { diff --git a/drivers/firmware/efi/libstub/x86-stub.c b/drivers/firmware/efi/libstub/x86-stub.c index 1983fd3bf392..68df27bd71c9 100644 --- a/drivers/firmware/efi/libstub/x86-stub.c +++ b/drivers/firmware/efi/libstub/x86-stub.c @@ -781,7 +781,7 @@ static const char *cmdline_memmap_override; static efi_status_t parse_options(const char *cmdline) { static const char opts[][14] = { - "mem=", "memmap=", "efi_fake_mem=", "hugepages=" + "mem=", "memmap=", "hugepages=" }; for (int i = 0; i < ARRAY_SIZE(opts); i++) {