From patchwork Mon Apr 7 06:11:34 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ard Biesheuvel X-Patchwork-Id: 879094 Received: from mail-wr1-f74.google.com (mail-wr1-f74.google.com [209.85.221.74]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 2511D224AF3 for ; Mon, 7 Apr 2025 06:11:43 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.221.74 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1744006306; cv=none; b=X+xfiyIfeCa7sB8IPgd8Mn7804nA+yUHc733iosx2eR5oApzve6aNSqfdyyZCcRZM3Q1+4Qrve8/l7L9BN8Snp5KRw3WWPP79qSoXtl4uOY2C/J4S+VzEuTDO+Q/ErDWBNgFXyO7m7k/XjqiLxQPQ0Htna/fO2+Dw5ESRJNESLA= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1744006306; c=relaxed/simple; bh=29wEvP3mo2FgQbpZl2dIbWEAUdhjtBmuI+j9Jhz91AM=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Cc:Content-Type; b=BppilFInkYEdofsWi6mEsGd5lmPxSkLtUulkrd7VYHQlWTBf8uDyDNT4QvYV1OUFC7L3ylFRZxAk9UetMApMgZSkglHuFwsoeeWIsoKzFAkyKk4OqjI/cwC/yNFmbqT3WXgnZx0N/MXw1bJBwMxlEFxoGDflAPFhwe2fc51zeug= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--ardb.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=xne6OdtT; arc=none smtp.client-ip=209.85.221.74 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--ardb.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="xne6OdtT" Received: by mail-wr1-f74.google.com with SMTP id ffacd0b85a97d-3912e4e2033so1757014f8f.0 for ; Sun, 06 Apr 2025 23:11:43 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1744006302; x=1744611102; darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=BMxiBFcThX7tgGvKaVxziX0eJsMo+maZ3LTI6Ameq7o=; b=xne6OdtTKOwxe8Z43OxJooYgvXvMTJ56cz0wQCE1RsDOUgGR8Aznxn2AbDg9Pwkud1 MJSQlr/6DwkRW+ClNHsOM+9n5ebpeUYJDcGevz641xz5WRnlURwCq1tpN+o9rR7fYsj5 C9ZwAT7GPoWTehckoxP5aRe09tT2uD2EUtekDGLvf3HBpFyJQMpbiHgPk55L8xm0s37V GnFxX36/khd40A3xHSDI10+RDNL9Z/NlJEfbwPqoTgMyFFL3u9xt3POTu2V5fmfv7pma v59wL30jGqaaWpt2QyIrW/PMEgQhfvibX4tUb0MRTUBgL1pzTGe3l7RyDqG6sWeKdY0j G4xg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1744006302; x=1744611102; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=BMxiBFcThX7tgGvKaVxziX0eJsMo+maZ3LTI6Ameq7o=; b=QgcRgEXOoqUGJa+jnQTgk01m4DSLOAJLtmVceJ/J+vbHCnufVya2n3WzjEYQu2d5DB mVDKJ5MadUw2bqXEeyRvqgfH4y8G/BtT+5poJWzElkDuzzVmiDch/N++eKvt2I9NqR8k uxhehIfQVvmxaFmrgipyF+jeeJKoBUUGQWp0DwGadWSncLBnGukUlDCCR++ptT7DCpp4 WovDpq8s6O5PUbqIYkQPzgJYeC8JoJxH0+WhGYaZdVujcLLzYzmqJwJWeXbnQsS+a59c lBL5tARkYOiFSfJegxjrbbX7W8yz+vekWH/yREV/VmYppABwvk1Ehd74/CoiRF9wxvCA dZ/Q== X-Forwarded-Encrypted: i=1; AJvYcCWrEbydK3V05FS0jZe3oDIAj4ZOT8uNwauoDs+ysHSbLuiqNTXBeYQHQgwjTI1X3yz8o7xE0ih24zQ=@vger.kernel.org X-Gm-Message-State: AOJu0YymH5gPGddyaouhuJfJiLv0vGIiG07F4tqeiQL3avOwVVXBiwZ8 tFrxpJals5TLmOkpttaVITd6h2jygBOjHfFRfe6HxDykptnYv3HEaT+Vg1yoIpNhR9FLZw== X-Google-Smtp-Source: AGHT+IHRTs8Q23SbKVrOZW+vwKXToI+ggMkxg+UmNCECbwh8wFZga2Ek6mwzvgSVHQQ/vis+fOpRvZEU X-Received: from wrbcg11.prod.google.com ([2002:a5d:5ccb:0:b0:39a:bfe3:f904]) (user=ardb job=prod-delivery.src-stubby-dispatcher) by 2002:a5d:584c:0:b0:390:fb37:1ca with SMTP id ffacd0b85a97d-39cba93cf5dmr9460146f8f.53.1744006302441; Sun, 06 Apr 2025 23:11:42 -0700 (PDT) Date: Mon, 7 Apr 2025 08:11:34 +0200 In-Reply-To: <20250407061132.69315-4-ardb+git@google.com> Precedence: bulk X-Mailing-List: linux-efi@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 References: <20250407061132.69315-4-ardb+git@google.com> X-Developer-Key: i=ardb@kernel.org; a=openpgp; fpr=F43D03328115A198C90016883D200E9CA6329909 X-Developer-Signature: v=1; a=openpgp-sha256; l=16201; i=ardb@kernel.org; h=from:subject; bh=bFcOQFipnneenYlvZh5Ml236kuSq+b0rnH02R3Sld2s=; b=owGbwMvMwCFmkMcZplerG8N4Wi2JIf1zznSXw4sX1yy+ezVQPl6l9f70Lz0PndZqxUhUbWm7G SKR8Xl7RykLgxgHg6yYIovA7L/vdp6eKFXrPEsWZg4rE8gQBi5OAZhIyzxGhp8miokTU9iMDj/+ tIXzxlXJv2HXH2Ybzd4QNNk/g6Mk5TXD/yyzO5NlVCqmcJzYu+BYmPv2G9Wnr+evNauO6pmizZP qxA8A X-Mailer: git-send-email 2.49.0.504.g3bcea36a83-goog Message-ID: <20250407061132.69315-5-ardb+git@google.com> Subject: [PATCH v2 1/2] x86/boot: Move early kernel mapping code into startup/ From: Ard Biesheuvel To: mingo@kernel.org Cc: x86@kernel.org, linux-kernel@vger.kernel.org, linux-efi@vger.kernel.org, Ard Biesheuvel From: Ard Biesheuvel The startup code that constructs the kernel virtual mapping runs from the 1:1 mapping of memory itself, and therefore, cannot use absolute symbol references. Move this code into a separate source file under arch/x86/boot/startup/ where all such code will be kept from now on. Since all code here is constructed in a manner that ensures that it tolerates running from the 1:1 mapping of memory, any uses of the RIP_REL_REF() macro can be dropped, along with __head annotations for placing this code in a dedicated startup section. Signed-off-by: Ard Biesheuvel --- arch/x86/boot/startup/Makefile | 2 +- arch/x86/boot/startup/map_kernel.c | 215 ++++++++++++++++++++ arch/x86/kernel/head64.c | 211 +------------------ 3 files changed, 217 insertions(+), 211 deletions(-) diff --git a/arch/x86/boot/startup/Makefile b/arch/x86/boot/startup/Makefile index 34b324cbd5a4..01423063fec2 100644 --- a/arch/x86/boot/startup/Makefile +++ b/arch/x86/boot/startup/Makefile @@ -15,7 +15,7 @@ KMSAN_SANITIZE := n UBSAN_SANITIZE := n KCOV_INSTRUMENT := n -obj-$(CONFIG_X86_64) += gdt_idt.o +obj-$(CONFIG_X86_64) += gdt_idt.o map_kernel.o lib-$(CONFIG_X86_64) += la57toggle.o lib-$(CONFIG_EFI_MIXED) += efi-mixed.o diff --git a/arch/x86/boot/startup/map_kernel.c b/arch/x86/boot/startup/map_kernel.c new file mode 100644 index 000000000000..1cf57c03c319 --- /dev/null +++ b/arch/x86/boot/startup/map_kernel.c @@ -0,0 +1,215 @@ +// SPDX-License-Identifier: GPL-2.0 + +#include +#include +#include +#include +#include + +#include +#include +#include + +extern pmd_t early_dynamic_pgts[EARLY_DYNAMIC_PAGE_TABLES][PTRS_PER_PMD]; +extern unsigned int next_early_pgt; + +static inline bool check_la57_support(void) +{ + if (!IS_ENABLED(CONFIG_X86_5LEVEL)) + return false; + + /* + * 5-level paging is detected and enabled at kernel decompression + * stage. Only check if it has been enabled there. + */ + if (!(native_read_cr4() & X86_CR4_LA57)) + return false; + + __pgtable_l5_enabled = 1; + pgdir_shift = 48; + ptrs_per_p4d = 512; + page_offset_base = __PAGE_OFFSET_BASE_L5; + vmalloc_base = __VMALLOC_BASE_L5; + vmemmap_base = __VMEMMAP_BASE_L5; + + return true; +} + +static unsigned long sme_postprocess_startup(struct boot_params *bp, + pmdval_t *pmd, + unsigned long p2v_offset) +{ + unsigned long paddr, paddr_end; + int i; + + /* Encrypt the kernel and related (if SME is active) */ + sme_encrypt_kernel(bp); + + /* + * Clear the memory encryption mask from the .bss..decrypted section. + * The bss section will be memset to zero later in the initialization so + * there is no need to zero it after changing the memory encryption + * attribute. + */ + if (sme_get_me_mask()) { + paddr = (unsigned long)__start_bss_decrypted; + paddr_end = (unsigned long)__end_bss_decrypted; + + for (; paddr < paddr_end; paddr += PMD_SIZE) { + /* + * On SNP, transition the page to shared in the RMP table so that + * it is consistent with the page table attribute change. + * + * __start_bss_decrypted has a virtual address in the high range + * mapping (kernel .text). PVALIDATE, by way of + * early_snp_set_memory_shared(), requires a valid virtual + * address but the kernel is currently running off of the identity + * mapping so use the PA to get a *currently* valid virtual address. + */ + early_snp_set_memory_shared(paddr, paddr, PTRS_PER_PMD); + + i = pmd_index(paddr - p2v_offset); + pmd[i] -= sme_get_me_mask(); + } + } + + /* + * Return the SME encryption mask (if SME is active) to be used as a + * modifier for the initial pgdir entry programmed into CR3. + */ + return sme_get_me_mask(); +} + +unsigned long __init __startup_64(unsigned long p2v_offset, + struct boot_params *bp) +{ + pmd_t (*early_pgts)[PTRS_PER_PMD] = early_dynamic_pgts; + unsigned long physaddr = (unsigned long)_text; + unsigned long va_text, va_end; + unsigned long pgtable_flags; + unsigned long load_delta; + pgdval_t *pgd; + p4dval_t *p4d; + pudval_t *pud; + pmdval_t *pmd, pmd_entry; + bool la57; + int i; + + la57 = check_la57_support(); + + /* Is the address too large? */ + if (physaddr >> MAX_PHYSMEM_BITS) + for (;;); + + /* + * Compute the delta between the address I am compiled to run at + * and the address I am actually running at. + */ + phys_base = load_delta = __START_KERNEL_map + p2v_offset; + + /* Is the address not 2M aligned? */ + if (load_delta & ~PMD_MASK) + for (;;); + + va_text = physaddr - p2v_offset; + va_end = (unsigned long)_end - p2v_offset; + + /* Include the SME encryption mask in the fixup value */ + load_delta += sme_get_me_mask(); + + /* Fixup the physical addresses in the page table */ + + pgd = &early_top_pgt[0].pgd; + pgd[pgd_index(__START_KERNEL_map)] += load_delta; + + if (IS_ENABLED(CONFIG_X86_5LEVEL) && la57) { + p4d = (p4dval_t *)level4_kernel_pgt; + p4d[MAX_PTRS_PER_P4D - 1] += load_delta; + + pgd[pgd_index(__START_KERNEL_map)] = (pgdval_t)p4d | _PAGE_TABLE; + } + + level3_kernel_pgt[PTRS_PER_PUD - 2].pud += load_delta; + level3_kernel_pgt[PTRS_PER_PUD - 1].pud += load_delta; + + for (i = FIXMAP_PMD_TOP; i > FIXMAP_PMD_TOP - FIXMAP_PMD_NUM; i--) + level2_fixmap_pgt[i].pmd += load_delta; + + /* + * Set up the identity mapping for the switchover. These + * entries should *NOT* have the global bit set! This also + * creates a bunch of nonsense entries but that is fine -- + * it avoids problems around wraparound. + */ + + pud = &early_pgts[0]->pmd; + pmd = &early_pgts[1]->pmd; + next_early_pgt = 2; + + pgtable_flags = _KERNPG_TABLE_NOENC + sme_get_me_mask(); + + if (la57) { + p4d = &early_pgts[next_early_pgt++]->pmd; + + i = (physaddr >> PGDIR_SHIFT) % PTRS_PER_PGD; + pgd[i + 0] = (pgdval_t)p4d + pgtable_flags; + pgd[i + 1] = (pgdval_t)p4d + pgtable_flags; + + i = physaddr >> P4D_SHIFT; + p4d[(i + 0) % PTRS_PER_P4D] = (pgdval_t)pud + pgtable_flags; + p4d[(i + 1) % PTRS_PER_P4D] = (pgdval_t)pud + pgtable_flags; + } else { + i = (physaddr >> PGDIR_SHIFT) % PTRS_PER_PGD; + pgd[i + 0] = (pgdval_t)pud + pgtable_flags; + pgd[i + 1] = (pgdval_t)pud + pgtable_flags; + } + + i = physaddr >> PUD_SHIFT; + pud[(i + 0) % PTRS_PER_PUD] = (pudval_t)pmd + pgtable_flags; + pud[(i + 1) % PTRS_PER_PUD] = (pudval_t)pmd + pgtable_flags; + + pmd_entry = __PAGE_KERNEL_LARGE_EXEC & ~_PAGE_GLOBAL; + /* Filter out unsupported __PAGE_KERNEL_* bits: */ + pmd_entry &= __supported_pte_mask; + pmd_entry += sme_get_me_mask(); + pmd_entry += physaddr; + + for (i = 0; i < DIV_ROUND_UP(va_end - va_text, PMD_SIZE); i++) { + int idx = i + (physaddr >> PMD_SHIFT); + + pmd[idx % PTRS_PER_PMD] = pmd_entry + i * PMD_SIZE; + } + + /* + * Fixup the kernel text+data virtual addresses. Note that + * we might write invalid pmds, when the kernel is relocated + * cleanup_highmap() fixes this up along with the mappings + * beyond _end. + * + * Only the region occupied by the kernel image has so far + * been checked against the table of usable memory regions + * provided by the firmware, so invalidate pages outside that + * region. A page table entry that maps to a reserved area of + * memory would allow processor speculation into that area, + * and on some hardware (particularly the UV platform) even + * speculative access to some reserved areas is caught as an + * error, causing the BIOS to halt the system. + */ + + pmd = &level2_kernel_pgt[0].pmd; + + /* invalidate pages before the kernel image */ + for (i = 0; i < pmd_index(va_text); i++) + pmd[i] &= ~_PAGE_PRESENT; + + /* fixup pages that are part of the kernel image */ + for (; i <= pmd_index(va_end); i++) + if (pmd[i] & _PAGE_PRESENT) + pmd[i] += load_delta; + + /* invalidate pages after the kernel image */ + for (; i < PTRS_PER_PMD; i++) + pmd[i] &= ~_PAGE_PRESENT; + + return sme_postprocess_startup(bp, pmd, p2v_offset); +} diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c index 5b993b545c7e..6b68a206fa7f 100644 --- a/arch/x86/kernel/head64.c +++ b/arch/x86/kernel/head64.c @@ -47,7 +47,7 @@ * Manage page tables very early on. */ extern pmd_t early_dynamic_pgts[EARLY_DYNAMIC_PAGE_TABLES][PTRS_PER_PMD]; -static unsigned int __initdata next_early_pgt; +unsigned int __initdata next_early_pgt; pmdval_t early_pmd_flags = __PAGE_KERNEL_LARGE & ~(_PAGE_GLOBAL | _PAGE_NX); #ifdef CONFIG_X86_5LEVEL @@ -67,215 +67,6 @@ unsigned long vmemmap_base __ro_after_init = __VMEMMAP_BASE_L4; EXPORT_SYMBOL(vmemmap_base); #endif -static inline bool check_la57_support(void) -{ - if (!IS_ENABLED(CONFIG_X86_5LEVEL)) - return false; - - /* - * 5-level paging is detected and enabled at kernel decompression - * stage. Only check if it has been enabled there. - */ - if (!(native_read_cr4() & X86_CR4_LA57)) - return false; - - RIP_REL_REF(__pgtable_l5_enabled) = 1; - RIP_REL_REF(pgdir_shift) = 48; - RIP_REL_REF(ptrs_per_p4d) = 512; - RIP_REL_REF(page_offset_base) = __PAGE_OFFSET_BASE_L5; - RIP_REL_REF(vmalloc_base) = __VMALLOC_BASE_L5; - RIP_REL_REF(vmemmap_base) = __VMEMMAP_BASE_L5; - - return true; -} - -static unsigned long __head sme_postprocess_startup(struct boot_params *bp, - pmdval_t *pmd, - unsigned long p2v_offset) -{ - unsigned long paddr, paddr_end; - int i; - - /* Encrypt the kernel and related (if SME is active) */ - sme_encrypt_kernel(bp); - - /* - * Clear the memory encryption mask from the .bss..decrypted section. - * The bss section will be memset to zero later in the initialization so - * there is no need to zero it after changing the memory encryption - * attribute. - */ - if (sme_get_me_mask()) { - paddr = (unsigned long)&RIP_REL_REF(__start_bss_decrypted); - paddr_end = (unsigned long)&RIP_REL_REF(__end_bss_decrypted); - - for (; paddr < paddr_end; paddr += PMD_SIZE) { - /* - * On SNP, transition the page to shared in the RMP table so that - * it is consistent with the page table attribute change. - * - * __start_bss_decrypted has a virtual address in the high range - * mapping (kernel .text). PVALIDATE, by way of - * early_snp_set_memory_shared(), requires a valid virtual - * address but the kernel is currently running off of the identity - * mapping so use the PA to get a *currently* valid virtual address. - */ - early_snp_set_memory_shared(paddr, paddr, PTRS_PER_PMD); - - i = pmd_index(paddr - p2v_offset); - pmd[i] -= sme_get_me_mask(); - } - } - - /* - * Return the SME encryption mask (if SME is active) to be used as a - * modifier for the initial pgdir entry programmed into CR3. - */ - return sme_get_me_mask(); -} - -/* Code in __startup_64() can be relocated during execution, but the compiler - * doesn't have to generate PC-relative relocations when accessing globals from - * that function. Clang actually does not generate them, which leads to - * boot-time crashes. To work around this problem, every global pointer must - * be accessed using RIP_REL_REF(). Kernel virtual addresses can be determined - * by subtracting p2v_offset from the RIP-relative address. - */ -unsigned long __head __startup_64(unsigned long p2v_offset, - struct boot_params *bp) -{ - pmd_t (*early_pgts)[PTRS_PER_PMD] = RIP_REL_REF(early_dynamic_pgts); - unsigned long physaddr = (unsigned long)&RIP_REL_REF(_text); - unsigned long va_text, va_end; - unsigned long pgtable_flags; - unsigned long load_delta; - pgdval_t *pgd; - p4dval_t *p4d; - pudval_t *pud; - pmdval_t *pmd, pmd_entry; - bool la57; - int i; - - la57 = check_la57_support(); - - /* Is the address too large? */ - if (physaddr >> MAX_PHYSMEM_BITS) - for (;;); - - /* - * Compute the delta between the address I am compiled to run at - * and the address I am actually running at. - */ - load_delta = __START_KERNEL_map + p2v_offset; - RIP_REL_REF(phys_base) = load_delta; - - /* Is the address not 2M aligned? */ - if (load_delta & ~PMD_MASK) - for (;;); - - va_text = physaddr - p2v_offset; - va_end = (unsigned long)&RIP_REL_REF(_end) - p2v_offset; - - /* Include the SME encryption mask in the fixup value */ - load_delta += sme_get_me_mask(); - - /* Fixup the physical addresses in the page table */ - - pgd = &RIP_REL_REF(early_top_pgt)->pgd; - pgd[pgd_index(__START_KERNEL_map)] += load_delta; - - if (IS_ENABLED(CONFIG_X86_5LEVEL) && la57) { - p4d = (p4dval_t *)&RIP_REL_REF(level4_kernel_pgt); - p4d[MAX_PTRS_PER_P4D - 1] += load_delta; - - pgd[pgd_index(__START_KERNEL_map)] = (pgdval_t)p4d | _PAGE_TABLE; - } - - RIP_REL_REF(level3_kernel_pgt)[PTRS_PER_PUD - 2].pud += load_delta; - RIP_REL_REF(level3_kernel_pgt)[PTRS_PER_PUD - 1].pud += load_delta; - - for (i = FIXMAP_PMD_TOP; i > FIXMAP_PMD_TOP - FIXMAP_PMD_NUM; i--) - RIP_REL_REF(level2_fixmap_pgt)[i].pmd += load_delta; - - /* - * Set up the identity mapping for the switchover. These - * entries should *NOT* have the global bit set! This also - * creates a bunch of nonsense entries but that is fine -- - * it avoids problems around wraparound. - */ - - pud = &early_pgts[0]->pmd; - pmd = &early_pgts[1]->pmd; - RIP_REL_REF(next_early_pgt) = 2; - - pgtable_flags = _KERNPG_TABLE_NOENC + sme_get_me_mask(); - - if (la57) { - p4d = &early_pgts[RIP_REL_REF(next_early_pgt)++]->pmd; - - i = (physaddr >> PGDIR_SHIFT) % PTRS_PER_PGD; - pgd[i + 0] = (pgdval_t)p4d + pgtable_flags; - pgd[i + 1] = (pgdval_t)p4d + pgtable_flags; - - i = physaddr >> P4D_SHIFT; - p4d[(i + 0) % PTRS_PER_P4D] = (pgdval_t)pud + pgtable_flags; - p4d[(i + 1) % PTRS_PER_P4D] = (pgdval_t)pud + pgtable_flags; - } else { - i = (physaddr >> PGDIR_SHIFT) % PTRS_PER_PGD; - pgd[i + 0] = (pgdval_t)pud + pgtable_flags; - pgd[i + 1] = (pgdval_t)pud + pgtable_flags; - } - - i = physaddr >> PUD_SHIFT; - pud[(i + 0) % PTRS_PER_PUD] = (pudval_t)pmd + pgtable_flags; - pud[(i + 1) % PTRS_PER_PUD] = (pudval_t)pmd + pgtable_flags; - - pmd_entry = __PAGE_KERNEL_LARGE_EXEC & ~_PAGE_GLOBAL; - /* Filter out unsupported __PAGE_KERNEL_* bits: */ - pmd_entry &= RIP_REL_REF(__supported_pte_mask); - pmd_entry += sme_get_me_mask(); - pmd_entry += physaddr; - - for (i = 0; i < DIV_ROUND_UP(va_end - va_text, PMD_SIZE); i++) { - int idx = i + (physaddr >> PMD_SHIFT); - - pmd[idx % PTRS_PER_PMD] = pmd_entry + i * PMD_SIZE; - } - - /* - * Fixup the kernel text+data virtual addresses. Note that - * we might write invalid pmds, when the kernel is relocated - * cleanup_highmap() fixes this up along with the mappings - * beyond _end. - * - * Only the region occupied by the kernel image has so far - * been checked against the table of usable memory regions - * provided by the firmware, so invalidate pages outside that - * region. A page table entry that maps to a reserved area of - * memory would allow processor speculation into that area, - * and on some hardware (particularly the UV platform) even - * speculative access to some reserved areas is caught as an - * error, causing the BIOS to halt the system. - */ - - pmd = &RIP_REL_REF(level2_kernel_pgt)->pmd; - - /* invalidate pages before the kernel image */ - for (i = 0; i < pmd_index(va_text); i++) - pmd[i] &= ~_PAGE_PRESENT; - - /* fixup pages that are part of the kernel image */ - for (; i <= pmd_index(va_end); i++) - if (pmd[i] & _PAGE_PRESENT) - pmd[i] += load_delta; - - /* invalidate pages after the kernel image */ - for (; i < PTRS_PER_PMD; i++) - pmd[i] &= ~_PAGE_PRESENT; - - return sme_postprocess_startup(bp, pmd, p2v_offset); -} - /* Wipe all early page tables except for the kernel symbol map */ static void __init reset_early_page_tables(void) { From patchwork Mon Apr 7 06:11:35 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ard Biesheuvel X-Patchwork-Id: 878776 Received: from mail-wr1-f74.google.com (mail-wr1-f74.google.com [209.85.221.74]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 5834122538F for ; Mon, 7 Apr 2025 06:11:46 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.221.74 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1744006308; cv=none; b=WsiPotx5PMCEyuvABL6LL9520PMk4hImTNfsu9cCDqjDH4lcV9B0hUXPlIyzqkFQbXdeazxSyWUBFaj7mkr+8IJtcGWQ5V2CjfLn8dO92BE6+7TYJ8FRl4bpogxvFXAHR+o5iXYzRuFJOelEdI4SJzfcreVG61rVA+fvfo+tHfg= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1744006308; c=relaxed/simple; bh=a4UOxoM75WIQaqNEDX2cAYA7eHrzdSLznlT5GRnSf4E=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Cc:Content-Type; b=XJBw9q7E9uLgZMb5BQ8BP8xAzg1C6Tcnq1VhIYtkn29KYRXz1cBbBvjJiifa1AftoSTfTVGws+0lF9PzuhFMPa2xZsZFWfMsLVObwYcgBF7yvYEoMezjZDLpOkhe1scZ/4zYr2ZVKPZ+OPnUeYEje802Gs71BJ4wCFwoKE/inAs= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--ardb.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=YNfNlIkm; arc=none smtp.client-ip=209.85.221.74 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--ardb.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="YNfNlIkm" Received: by mail-wr1-f74.google.com with SMTP id ffacd0b85a97d-391492acb59so2306332f8f.3 for ; Sun, 06 Apr 2025 23:11:46 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1744006304; x=1744611104; darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=pbfnJnCDmQrrChz5QAx7ox2nJHFPzvTjb5gL5epTEnc=; b=YNfNlIkmvHJdd6QgTmmX1EtX2Otd7iLf66SpmKMBmTNsh+dorMixFXRS6sZ0VuQlsa Ik09F1gLAF3pKXF57AEBXSHfWSDdGIayzt4KHIj7PW205d5AsHpEVRb3HfsUuPwJfipc 901Z3jUAKCobXz0brKae7/oqbREuBbumZK15Ws4EC8k28ttaY2zh0KdaHmFMQulS+Lo/ QSzo8NcdXqHgx7ECaJFb+lTPnkRiLoeTNb1ydKcns+b8SKsa0GF3x61A3z3zwPhI0cAd RIrsNfIaBrJwnBsRt1ri9VfHzN/6Pzykwk3Xjny8246tWnitePq63vzDZaeQbSQyVSpn htUw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1744006304; x=1744611104; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=pbfnJnCDmQrrChz5QAx7ox2nJHFPzvTjb5gL5epTEnc=; b=nfxDGTuANP8/kvMRSqyrq5t1O/ez7Lp/jMDAsTc3jEHvR6cusa6CK4kOyWImXRn7Hn v8h2eFcf+BtNuV6k/6HzXFUtgM8T+3TWR1KbsejGvNKgs9ech8qzPRR6v1IcokXtbkHT dr2lFFoIwjtln7REXpyBpD6GEX1amwbbNK96kqt3yBylYHcfAZwsRWtEiojR+rfhvxkC eZq1pJ49BrzmcQKyDizYdkngrczQ76sLBv0lTAuvtIlRyEQSHSU+elQGxCFVv+DUqdqH 70VD3clyPxbpdOwpdOvnpgBkJtPWeyGnjH6mCMEDcPyOJUZKDGSp2qh77AiuOyZAWxdQ SOsQ== X-Forwarded-Encrypted: i=1; AJvYcCXENXfbqUVO/H0sAokbodoWA+AbKlkBdhV/iUYT8RGBI8O62Elgty4OAD2rDRfV9JQT/LmAOPVzdvM=@vger.kernel.org X-Gm-Message-State: AOJu0Ywdu6yTHW2gBv7TuAZx5EGjLeGywasUOULD59TkdCoPGFcoNtP6 tk6yQyGnqUUGKRdboGl3c/iRNz4fKFYJAPxI2czoXU4zFusS+QP5R4cun9YjAG8uR8PGeQ== X-Google-Smtp-Source: AGHT+IH0iTHKoSnExWD38wMno/X063/6ygT8RIzLClPsv9h9+b1uVjMhb1uquURy2RnxO8O0FC8QoAQi X-Received: from wmsp14.prod.google.com ([2002:a05:600c:1d8e:b0:43d:b71:a576]) (user=ardb job=prod-delivery.src-stubby-dispatcher) by 2002:a5d:5f96:0:b0:39c:cc7:3c62 with SMTP id ffacd0b85a97d-39d147577famr8969577f8f.51.1744006304822; Sun, 06 Apr 2025 23:11:44 -0700 (PDT) Date: Mon, 7 Apr 2025 08:11:35 +0200 In-Reply-To: <20250407061132.69315-4-ardb+git@google.com> Precedence: bulk X-Mailing-List: linux-efi@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 References: <20250407061132.69315-4-ardb+git@google.com> X-Developer-Key: i=ardb@kernel.org; a=openpgp; fpr=F43D03328115A198C90016883D200E9CA6329909 X-Developer-Signature: v=1; a=openpgp-sha256; l=9644; i=ardb@kernel.org; h=from:subject; bh=rLLOUVelgqK5VOtLrSMJgnsMJeUEIBCp5YJLNfqkZe8=; b=owGbwMvMwCFmkMcZplerG8N4Wi2JIf1zzgyGh6aevt9TThfd95I22LzC+W9FWXbBjpSLrzwLf MUD5TQ6SlkYxDgYZMUUWQRm/3238/REqVrnWbIwc1iZwIZwcQrAREpbGBnWhGRWf3/of0v7Br/A bkmtE5eDrupM7Ju2ftICe+kHDTpyDP+zRdL5Tl6ZYCVQ+73RaIUzy5f527j2rNPe3lfBeW/byZV sAA== X-Mailer: git-send-email 2.49.0.504.g3bcea36a83-goog Message-ID: <20250407061132.69315-6-ardb+git@google.com> Subject: [PATCH v2 2/2] x86/boot: Move early SME init code into startup/ From: Ard Biesheuvel To: mingo@kernel.org Cc: x86@kernel.org, linux-kernel@vger.kernel.org, linux-efi@vger.kernel.org, Ard Biesheuvel From: Ard Biesheuvel Move the SME initialization code, which runs from the 1:1 mapping of memory as it operates on the kernel virtual mapping, into the new sub-directory arch/x86/boot/startup/ where all startup code will reside that needs to tolerate executing from the 1:1 mapping. This allows RIP_REL_REF() macro invocations and __head annotations to be dropped. Signed-off-by: Ard Biesheuvel --- arch/x86/boot/startup/Makefile | 1 + arch/x86/{mm/mem_encrypt_identity.c => boot/startup/sme.c} | 45 +++++++++----------- arch/x86/include/asm/mem_encrypt.h | 2 +- arch/x86/mm/Makefile | 6 --- 4 files changed, 23 insertions(+), 31 deletions(-) diff --git a/arch/x86/boot/startup/Makefile b/arch/x86/boot/startup/Makefile index 01423063fec2..480c2d2063a0 100644 --- a/arch/x86/boot/startup/Makefile +++ b/arch/x86/boot/startup/Makefile @@ -16,6 +16,7 @@ UBSAN_SANITIZE := n KCOV_INSTRUMENT := n obj-$(CONFIG_X86_64) += gdt_idt.o map_kernel.o +obj-$(CONFIG_AMD_MEM_ENCRYPT) += sme.o lib-$(CONFIG_X86_64) += la57toggle.o lib-$(CONFIG_EFI_MIXED) += efi-mixed.o diff --git a/arch/x86/mm/mem_encrypt_identity.c b/arch/x86/boot/startup/sme.c similarity index 92% rename from arch/x86/mm/mem_encrypt_identity.c rename to arch/x86/boot/startup/sme.c index 5eecdd92da10..85bd39652535 100644 --- a/arch/x86/mm/mem_encrypt_identity.c +++ b/arch/x86/boot/startup/sme.c @@ -45,8 +45,6 @@ #include #include -#include "mm_internal.h" - #define PGD_FLAGS _KERNPG_TABLE_NOENC #define P4D_FLAGS _KERNPG_TABLE_NOENC #define PUD_FLAGS _KERNPG_TABLE_NOENC @@ -93,7 +91,7 @@ struct sme_populate_pgd_data { */ static char sme_workarea[2 * PMD_SIZE] __section(".init.scratch"); -static void __head sme_clear_pgd(struct sme_populate_pgd_data *ppd) +static void __init sme_clear_pgd(struct sme_populate_pgd_data *ppd) { unsigned long pgd_start, pgd_end, pgd_size; pgd_t *pgd_p; @@ -108,7 +106,7 @@ static void __head sme_clear_pgd(struct sme_populate_pgd_data *ppd) memset(pgd_p, 0, pgd_size); } -static pud_t __head *sme_prepare_pgd(struct sme_populate_pgd_data *ppd) +static pud_t __init *sme_prepare_pgd(struct sme_populate_pgd_data *ppd) { pgd_t *pgd; p4d_t *p4d; @@ -145,7 +143,7 @@ static pud_t __head *sme_prepare_pgd(struct sme_populate_pgd_data *ppd) return pud; } -static void __head sme_populate_pgd_large(struct sme_populate_pgd_data *ppd) +static void __init sme_populate_pgd_large(struct sme_populate_pgd_data *ppd) { pud_t *pud; pmd_t *pmd; @@ -161,7 +159,7 @@ static void __head sme_populate_pgd_large(struct sme_populate_pgd_data *ppd) set_pmd(pmd, __pmd(ppd->paddr | ppd->pmd_flags)); } -static void __head sme_populate_pgd(struct sme_populate_pgd_data *ppd) +static void __init sme_populate_pgd(struct sme_populate_pgd_data *ppd) { pud_t *pud; pmd_t *pmd; @@ -187,7 +185,7 @@ static void __head sme_populate_pgd(struct sme_populate_pgd_data *ppd) set_pte(pte, __pte(ppd->paddr | ppd->pte_flags)); } -static void __head __sme_map_range_pmd(struct sme_populate_pgd_data *ppd) +static void __init __sme_map_range_pmd(struct sme_populate_pgd_data *ppd) { while (ppd->vaddr < ppd->vaddr_end) { sme_populate_pgd_large(ppd); @@ -197,7 +195,7 @@ static void __head __sme_map_range_pmd(struct sme_populate_pgd_data *ppd) } } -static void __head __sme_map_range_pte(struct sme_populate_pgd_data *ppd) +static void __init __sme_map_range_pte(struct sme_populate_pgd_data *ppd) { while (ppd->vaddr < ppd->vaddr_end) { sme_populate_pgd(ppd); @@ -207,7 +205,7 @@ static void __head __sme_map_range_pte(struct sme_populate_pgd_data *ppd) } } -static void __head __sme_map_range(struct sme_populate_pgd_data *ppd, +static void __init __sme_map_range(struct sme_populate_pgd_data *ppd, pmdval_t pmd_flags, pteval_t pte_flags) { unsigned long vaddr_end; @@ -231,22 +229,22 @@ static void __head __sme_map_range(struct sme_populate_pgd_data *ppd, __sme_map_range_pte(ppd); } -static void __head sme_map_range_encrypted(struct sme_populate_pgd_data *ppd) +static void __init sme_map_range_encrypted(struct sme_populate_pgd_data *ppd) { __sme_map_range(ppd, PMD_FLAGS_ENC, PTE_FLAGS_ENC); } -static void __head sme_map_range_decrypted(struct sme_populate_pgd_data *ppd) +static void __init sme_map_range_decrypted(struct sme_populate_pgd_data *ppd) { __sme_map_range(ppd, PMD_FLAGS_DEC, PTE_FLAGS_DEC); } -static void __head sme_map_range_decrypted_wp(struct sme_populate_pgd_data *ppd) +static void __init sme_map_range_decrypted_wp(struct sme_populate_pgd_data *ppd) { __sme_map_range(ppd, PMD_FLAGS_DEC_WP, PTE_FLAGS_DEC_WP); } -static unsigned long __head sme_pgtable_calc(unsigned long len) +static unsigned long __init sme_pgtable_calc(unsigned long len) { unsigned long entries = 0, tables = 0; @@ -283,7 +281,7 @@ static unsigned long __head sme_pgtable_calc(unsigned long len) return entries + tables; } -void __head sme_encrypt_kernel(struct boot_params *bp) +void __init sme_encrypt_kernel(struct boot_params *bp) { unsigned long workarea_start, workarea_end, workarea_len; unsigned long execute_start, execute_end, execute_len; @@ -299,8 +297,7 @@ void __head sme_encrypt_kernel(struct boot_params *bp) * instrumentation or checking boot_cpu_data in the cc_platform_has() * function. */ - if (!sme_get_me_mask() || - RIP_REL_REF(sev_status) & MSR_AMD64_SEV_ENABLED) + if (!sme_get_me_mask() || sev_status & MSR_AMD64_SEV_ENABLED) return; /* @@ -318,8 +315,8 @@ void __head sme_encrypt_kernel(struct boot_params *bp) * memory from being cached. */ - kernel_start = (unsigned long)RIP_REL_REF(_text); - kernel_end = ALIGN((unsigned long)RIP_REL_REF(_end), PMD_SIZE); + kernel_start = (unsigned long)_text; + kernel_end = ALIGN((unsigned long)_end, PMD_SIZE); kernel_len = kernel_end - kernel_start; initrd_start = 0; @@ -345,7 +342,7 @@ void __head sme_encrypt_kernel(struct boot_params *bp) * pagetable structures for the encryption of the kernel * pagetable structures for workarea (in case not currently mapped) */ - execute_start = workarea_start = (unsigned long)RIP_REL_REF(sme_workarea); + execute_start = workarea_start = (unsigned long)sme_workarea; execute_end = execute_start + (PAGE_SIZE * 2) + PMD_SIZE; execute_len = execute_end - execute_start; @@ -488,7 +485,7 @@ void __head sme_encrypt_kernel(struct boot_params *bp) native_write_cr3(__native_read_cr3()); } -void __head sme_enable(struct boot_params *bp) +void __init sme_enable(struct boot_params *bp) { unsigned int eax, ebx, ecx, edx; unsigned long feature_mask; @@ -526,7 +523,7 @@ void __head sme_enable(struct boot_params *bp) me_mask = 1UL << (ebx & 0x3f); /* Check the SEV MSR whether SEV or SME is enabled */ - RIP_REL_REF(sev_status) = msr = __rdmsr(MSR_AMD64_SEV); + sev_status = msr = __rdmsr(MSR_AMD64_SEV); feature_mask = (msr & MSR_AMD64_SEV_ENABLED) ? AMD_SEV_BIT : AMD_SME_BIT; /* @@ -562,8 +559,8 @@ void __head sme_enable(struct boot_params *bp) return; } - RIP_REL_REF(sme_me_mask) = me_mask; - RIP_REL_REF(physical_mask) &= ~me_mask; - RIP_REL_REF(cc_vendor) = CC_VENDOR_AMD; + sme_me_mask = me_mask; + physical_mask &= ~me_mask; + cc_vendor = CC_VENDOR_AMD; cc_set_mask(me_mask); } diff --git a/arch/x86/include/asm/mem_encrypt.h b/arch/x86/include/asm/mem_encrypt.h index 1530ee301dfe..ea6494628cb0 100644 --- a/arch/x86/include/asm/mem_encrypt.h +++ b/arch/x86/include/asm/mem_encrypt.h @@ -61,7 +61,7 @@ void __init sev_es_init_vc_handling(void); static inline u64 sme_get_me_mask(void) { - return RIP_REL_REF(sme_me_mask); + return sme_me_mask; } #define __bss_decrypted __section(".bss..decrypted") diff --git a/arch/x86/mm/Makefile b/arch/x86/mm/Makefile index 32035d5be5a0..3faa60f13a61 100644 --- a/arch/x86/mm/Makefile +++ b/arch/x86/mm/Makefile @@ -3,12 +3,10 @@ KCOV_INSTRUMENT_tlb.o := n KCOV_INSTRUMENT_mem_encrypt.o := n KCOV_INSTRUMENT_mem_encrypt_amd.o := n -KCOV_INSTRUMENT_mem_encrypt_identity.o := n KCOV_INSTRUMENT_pgprot.o := n KASAN_SANITIZE_mem_encrypt.o := n KASAN_SANITIZE_mem_encrypt_amd.o := n -KASAN_SANITIZE_mem_encrypt_identity.o := n KASAN_SANITIZE_pgprot.o := n # Disable KCSAN entirely, because otherwise we get warnings that some functions @@ -16,12 +14,10 @@ KASAN_SANITIZE_pgprot.o := n KCSAN_SANITIZE := n # Avoid recursion by not calling KMSAN hooks for CEA code. KMSAN_SANITIZE_cpu_entry_area.o := n -KMSAN_SANITIZE_mem_encrypt_identity.o := n ifdef CONFIG_FUNCTION_TRACER CFLAGS_REMOVE_mem_encrypt.o = -pg CFLAGS_REMOVE_mem_encrypt_amd.o = -pg -CFLAGS_REMOVE_mem_encrypt_identity.o = -pg CFLAGS_REMOVE_pgprot.o = -pg endif @@ -32,7 +28,6 @@ obj-y += pat/ # Make sure __phys_addr has no stackprotector CFLAGS_physaddr.o := -fno-stack-protector -CFLAGS_mem_encrypt_identity.o := -fno-stack-protector CFLAGS_fault.o := -I $(src)/../include/asm/trace @@ -63,5 +58,4 @@ obj-$(CONFIG_MITIGATION_PAGE_TABLE_ISOLATION) += pti.o obj-$(CONFIG_X86_MEM_ENCRYPT) += mem_encrypt.o obj-$(CONFIG_AMD_MEM_ENCRYPT) += mem_encrypt_amd.o -obj-$(CONFIG_AMD_MEM_ENCRYPT) += mem_encrypt_identity.o obj-$(CONFIG_AMD_MEM_ENCRYPT) += mem_encrypt_boot.o