From patchwork Tue Jan 11 11:33:08 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Kirill A. Shutemov" X-Patchwork-Id: 532376 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3F310C433F5 for ; Tue, 11 Jan 2022 11:33:15 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1349528AbiAKLdO (ORCPT ); Tue, 11 Jan 2022 06:33:14 -0500 Received: from mga03.intel.com ([134.134.136.65]:45407 "EHLO mga03.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1349517AbiAKLdN (ORCPT ); Tue, 11 Jan 2022 06:33:13 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1641900793; x=1673436793; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=eAlNpFgI2dj8DL18robIsH5Fta03JBWQ7Hr8f4oKmC0=; b=GgvM2vC8JOcXzhsSPBh4hKwPoPykEoWj9VD05I+7KvTc3PYRoPgkvn+I e+8CbiQTUdfClAmXH5wktn9jqxuRhbwOJ1EIhSicGiSf4tXadF4snPo3b A1m28YbYNkTmZvNXzwx5IwQqtFNuQi3IkWUh/itFgXQoGgKTj4csNvIAv sxaLK/c+gHzZgFqP2TI1hfu8Yt+RBK4/iAbLD0dibaouc/jrt/VvoQKa1 X/Qgno9ocZcmgEGDHKpmKNlUdEk0W/s5FGR37TQBFeWXlLJuO8IwjqwCd fANotLjufnHEncI15+5Fr8Z6VbiZUQeA9Uz08/A7V352h6hzEXv8m3YVh Q==; X-IronPort-AV: E=McAfee;i="6200,9189,10223"; a="243419234" X-IronPort-AV: E=Sophos;i="5.88,279,1635231600"; d="scan'208";a="243419234" Received: from orsmga007.jf.intel.com ([10.7.209.58]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 11 Jan 2022 03:33:13 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.88,279,1635231600"; d="scan'208";a="515063272" Received: from black.fi.intel.com ([10.237.72.28]) by orsmga007.jf.intel.com with ESMTP; 11 Jan 2022 03:33:08 -0800 Received: by black.fi.intel.com (Postfix, from userid 1000) id 20859125; Tue, 11 Jan 2022 13:33:19 +0200 (EET) From: "Kirill A. Shutemov" To: Borislav Petkov , Andy Lutomirski , Sean Christopherson , Andrew Morton , Joerg Roedel , Ard Biesheuvel Cc: Andi Kleen , Kuppuswamy Sathyanarayanan , David Rientjes , Vlastimil Babka , Tom Lendacky , Thomas Gleixner , Peter Zijlstra , Paolo Bonzini , Ingo Molnar , Varad Gautam , Dario Faggioli , x86@kernel.org, linux-mm@kvack.org, linux-coco@lists.linux.dev, linux-efi@vger.kernel.org, linux-kernel@vger.kernel.org, "Kirill A. Shutemov" Subject: [PATCHv2 1/7] mm: Add support for unaccepted memory Date: Tue, 11 Jan 2022 14:33:08 +0300 Message-Id: <20220111113314.27173-2-kirill.shutemov@linux.intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20220111113314.27173-1-kirill.shutemov@linux.intel.com> References: <20220111113314.27173-1-kirill.shutemov@linux.intel.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-efi@vger.kernel.org UEFI Specification version 2.9 introduces the concept of memory acceptance. Some Virtual Machine platforms, such as Intel TDX or AMD SEV-SNP, require memory to be accepted before it can be used by the guest. Accepting happens via a protocol specific to the Virtual Machine platform. Accepting memory is costly and it makes the VMM allocate memory for the accepted guest physical address range. It's better to postpone memory acceptance until memory is needed. It lowers boot time and reduces memory overhead. 
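As a rough standalone illustration of the lazy-acceptance idea (a toy userspace model, not the kernel code in this series): each page carries an "offline" marker standing in for PageOffline(), and the allocator pays the acceptance cost only the first time a page is handed out.

/* Toy model of accept-on-first-allocation; not kernel code. */
#include <stdbool.h>
#include <stdio.h>

#define NR_PAGES 8

struct toy_page {
	bool offline;	/* stands in for PageOffline(): not yet accepted */
	bool in_use;
};

static struct toy_page pages[NR_PAGES];

/* Stands in for the costly, platform-specific acceptance protocol. */
static void accept_page(int idx)
{
	printf("accepting page %d (slow, happens only once)\n", idx);
}

/* Accept lazily, on the first allocation of the page. */
static struct toy_page *alloc_page(void)
{
	for (int i = 0; i < NR_PAGES; i++) {
		if (pages[i].in_use)
			continue;
		if (pages[i].offline) {
			accept_page(i);
			pages[i].offline = false;
		}
		pages[i].in_use = true;
		return &pages[i];
	}
	return NULL;
}

int main(void)
{
	for (int i = 0; i < NR_PAGES; i++)
		pages[i].offline = true;	/* everything starts unaccepted */

	alloc_page();	/* first allocation pays the acceptance cost */
	alloc_page();
	return 0;
}

Because only memory that actually gets allocated is accepted, boot does not have to wait for the whole guest address space to be accepted up front.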
Support for such memory requires a few changes in core-mm code: - memblock has to accept memory on allocation; - page allocator has to accept memory on the first allocation of the page; The memblock change is trivial. The page allocator is modified to accept pages on the first allocation. PageOffline() is used to indicate that the page requires acceptance. The flag is currently used by hotplug and ballooning. Such pages are not available to the page allocator. An architecture has to provide three helpers if it wants to support unaccepted memory: - accept_memory() makes a range of physical addresses accepted. - maybe_set_page_offline() marks a page PageOffline() if it requires acceptance. Used during boot to put pages on free lists. - accept_and_clear_page_offline() makes a page accepted and clears PageOffline(). Signed-off-by: Kirill A. Shutemov --- include/linux/page-flags.h | 4 ++++ mm/internal.h | 15 +++++++++++++++ mm/memblock.c | 1 + mm/page_alloc.c | 21 ++++++++++++++++++++- 4 files changed, 40 insertions(+), 1 deletion(-) diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h index 52ec4b5e5615..281f70da329c 100644 --- a/include/linux/page-flags.h +++ b/include/linux/page-flags.h @@ -887,6 +887,10 @@ PAGE_TYPE_OPS(Buddy, buddy) * any further access to page content. PFN walkers that read content of random * pages should check PageOffline() and synchronize with such drivers using * page_offline_freeze()/page_offline_thaw(). + * + * If a PageOffline() page is encountered on a buddy allocator's free list, it + * has to be "accepted" before it can be used. + * See accept_and_clear_page_offline() and CONFIG_UNACCEPTED_MEMORY. */ PAGE_TYPE_OPS(Offline, offline) diff --git a/mm/internal.h b/mm/internal.h index 3b79a5c9427a..1738a4e2a27e 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -713,4 +713,19 @@ void vunmap_range_noflush(unsigned long start, unsigned long end); int numa_migrate_prep(struct page *page, struct vm_area_struct *vma, unsigned long addr, int page_nid, int *flags); +#ifndef CONFIG_UNACCEPTED_MEMORY +static inline void maybe_set_page_offline(struct page *page, unsigned int order) +{ +} + +static inline void accept_and_clear_page_offline(struct page *page, + unsigned int order) +{ +} + +static inline void accept_memory(phys_addr_t start, phys_addr_t end) +{ +} +#endif + #endif /* __MM_INTERNAL_H */ diff --git a/mm/memblock.c b/mm/memblock.c index 1018e50566f3..6dfa594192de 100644 --- a/mm/memblock.c +++ b/mm/memblock.c @@ -1400,6 +1400,7 @@ phys_addr_t __init memblock_alloc_range_nid(phys_addr_t size, */ kmemleak_alloc_phys(found, size, 0, 0); + accept_memory(found, found + size); return found; } diff --git a/mm/page_alloc.c b/mm/page_alloc.c index c5952749ad40..5707b4b5f774 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -1064,6 +1064,7 @@ static inline void __free_one_page(struct page *page, unsigned int max_order; struct page *buddy; bool to_tail; + bool offline = PageOffline(page); max_order = min_t(unsigned int, MAX_ORDER - 1, pageblock_order); @@ -1097,6 +1098,10 @@ static inline void __free_one_page(struct page *page, clear_page_guard(zone, buddy, order, migratetype); else del_page_from_free_list(buddy, zone, order); + + if (PageOffline(buddy)) + offline = true; + combined_pfn = buddy_pfn & pfn; page = page + (combined_pfn - pfn); pfn = combined_pfn; @@ -1130,6 +1135,9 @@ static inline void __free_one_page(struct page *page, done_merging: set_buddy_order(page, order); + if (offline) + __SetPageOffline(page); + if (fpi_flags & FPI_TO_TAIL) to_tail = true; else 
if (is_shuffle_order(order)) @@ -1155,7 +1163,8 @@ static inline void __free_one_page(struct page *page, static inline bool page_expected_state(struct page *page, unsigned long check_flags) { - if (unlikely(atomic_read(&page->_mapcount) != -1)) + if (unlikely(atomic_read(&page->_mapcount) != -1) && + !PageOffline(page)) return false; if (unlikely((unsigned long)page->mapping | @@ -1734,6 +1743,8 @@ void __init memblock_free_pages(struct page *page, unsigned long pfn, { if (early_page_uninitialised(pfn)) return; + + maybe_set_page_offline(page, order); __free_pages_core(page, order); } @@ -1823,10 +1834,12 @@ static void __init deferred_free_range(unsigned long pfn, if (nr_pages == pageblock_nr_pages && (pfn & (pageblock_nr_pages - 1)) == 0) { set_pageblock_migratetype(page, MIGRATE_MOVABLE); + maybe_set_page_offline(page, pageblock_order); __free_pages_core(page, pageblock_order); return; } + accept_memory(pfn << PAGE_SHIFT, (pfn + nr_pages) << PAGE_SHIFT); for (i = 0; i < nr_pages; i++, page++, pfn++) { if ((pfn & (pageblock_nr_pages - 1)) == 0) set_pageblock_migratetype(page, MIGRATE_MOVABLE); @@ -2297,6 +2310,9 @@ static inline void expand(struct zone *zone, struct page *page, if (set_page_guard(zone, &page[size], high, migratetype)) continue; + if (PageOffline(page)) + __SetPageOffline(&page[size]); + add_to_free_list(&page[size], zone, high, migratetype); set_buddy_order(&page[size], high); } @@ -2393,6 +2409,9 @@ inline void post_alloc_hook(struct page *page, unsigned int order, */ kernel_unpoison_pages(page, 1 << order); + if (PageOffline(page)) + accept_and_clear_page_offline(page, order); + /* * As memory initialization might be integrated into KASAN, * kasan_alloc_pages and kernel_init_free_pages must be From patchwork Tue Jan 11 11:33:09 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Kirill A. 
Shutemov" X-Patchwork-Id: 531260 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 61E34C433FE for ; Tue, 11 Jan 2022 11:33:16 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1349477AbiAKLdP (ORCPT ); Tue, 11 Jan 2022 06:33:15 -0500 Received: from mga09.intel.com ([134.134.136.24]:37638 "EHLO mga09.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1349514AbiAKLdN (ORCPT ); Tue, 11 Jan 2022 06:33:13 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1641900793; x=1673436793; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=HaLasq89QiEz30QdU5oVDEDYHaeFFEnWUHw6RN/QsG4=; b=AQcgThPwdGjGO25u1z+I7GA4b1Ah6Ux/1ZcDVbL6un2TeWf7nI1C8T7z qxfAAQmKEldc5GoHkPrlbzBTCTGxocFEwzgli5NOSO43ORr1ULZanMGbB Duti1bqJTDog4p7jg5J80nzgCXI2XQlOjfJPGljGa8GNPEy10+cZBkkgL 8yN3L5kp75BAHknN6UC11SCnfBxnEs6suOLbpNn41znaq5H8l+k1sEUtf 9Mtggpo4tLrT0HO8otlTkPpXbTKWrDeNJ8ZxK88RCLyN/u5+UdjiZkldt i4eE8SAoqYrWs4csHRsM81WZPP4J4Oz+2lXjQ0wGNxm79Wa8rRNdGncOm w==; X-IronPort-AV: E=McAfee;i="6200,9189,10223"; a="243261607" X-IronPort-AV: E=Sophos;i="5.88,279,1635231600"; d="scan'208";a="243261607" Received: from orsmga002.jf.intel.com ([10.7.209.21]) by orsmga102.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 11 Jan 2022 03:33:13 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.88,279,1635231600"; d="scan'208";a="490351596" Received: from black.fi.intel.com ([10.237.72.28]) by orsmga002.jf.intel.com with ESMTP; 11 Jan 2022 03:33:07 -0800 Received: by black.fi.intel.com (Postfix, from userid 1000) id 35429232; Tue, 11 Jan 2022 13:33:19 +0200 (EET) From: "Kirill A. Shutemov" To: Borislav Petkov , Andy Lutomirski , Sean Christopherson , Andrew Morton , Joerg Roedel , Ard Biesheuvel Cc: Andi Kleen , Kuppuswamy Sathyanarayanan , David Rientjes , Vlastimil Babka , Tom Lendacky , Thomas Gleixner , Peter Zijlstra , Paolo Bonzini , Ingo Molnar , Varad Gautam , Dario Faggioli , x86@kernel.org, linux-mm@kvack.org, linux-coco@lists.linux.dev, linux-efi@vger.kernel.org, linux-kernel@vger.kernel.org, "Kirill A. Shutemov" Subject: [PATCHv2 2/7] efi/x86: Get full memory map in allocate_e820() Date: Tue, 11 Jan 2022 14:33:09 +0300 Message-Id: <20220111113314.27173-3-kirill.shutemov@linux.intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20220111113314.27173-1-kirill.shutemov@linux.intel.com> References: <20220111113314.27173-1-kirill.shutemov@linux.intel.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-efi@vger.kernel.org Currently allocate_e820() only interested in the size of map and size of memory descriptor to determine how many e820 entries the kernel needs. UEFI Specification version 2.9 introduces a new memory type -- unaccepted memory. To track unaccepted memory kernel needs to allocate a bitmap. The size of the bitmap is dependent on the maximum physical address present in the system. A full memory map is required to find the maximum address. Modify allocate_e820() to get a full memory map. This is preparation for the next patch that implements handling of unaccepted memory in EFI stub. Signed-off-by: Kirill A. 
Shutemov --- drivers/firmware/efi/libstub/x86-stub.c | 28 ++++++++++++------------- 1 file changed, 13 insertions(+), 15 deletions(-) diff --git a/drivers/firmware/efi/libstub/x86-stub.c b/drivers/firmware/efi/libstub/x86-stub.c index f14c4ff5839f..a0b946182b5e 100644 --- a/drivers/firmware/efi/libstub/x86-stub.c +++ b/drivers/firmware/efi/libstub/x86-stub.c @@ -569,30 +569,28 @@ static efi_status_t alloc_e820ext(u32 nr_desc, struct setup_data **e820ext, } static efi_status_t allocate_e820(struct boot_params *params, + struct efi_boot_memmap *map, struct setup_data **e820ext, u32 *e820ext_size) { - unsigned long map_size, desc_size, map_key; efi_status_t status; - __u32 nr_desc, desc_version; + __u32 nr_desc; - /* Only need the size of the mem map and size of each mem descriptor */ - map_size = 0; - status = efi_bs_call(get_memory_map, &map_size, NULL, &map_key, - &desc_size, &desc_version); - if (status != EFI_BUFFER_TOO_SMALL) - return (status != EFI_SUCCESS) ? status : EFI_UNSUPPORTED; - - nr_desc = map_size / desc_size + EFI_MMAP_NR_SLACK_SLOTS; + status = efi_get_memory_map(map); + if (status != EFI_SUCCESS) + return status; - if (nr_desc > ARRAY_SIZE(params->e820_table)) { - u32 nr_e820ext = nr_desc - ARRAY_SIZE(params->e820_table); + nr_desc = *map->map_size / *map->desc_size; + if (nr_desc > ARRAY_SIZE(params->e820_table) - EFI_MMAP_NR_SLACK_SLOTS) { + u32 nr_e820ext = nr_desc - ARRAY_SIZE(params->e820_table) + + EFI_MMAP_NR_SLACK_SLOTS; status = alloc_e820ext(nr_e820ext, e820ext, e820ext_size); if (status != EFI_SUCCESS) - return status; + goto out; } - +out: + efi_bs_call(free_pool, *map->map); return EFI_SUCCESS; } @@ -642,7 +640,7 @@ static efi_status_t exit_boot(struct boot_params *boot_params, void *handle) priv.boot_params = boot_params; priv.efi = &boot_params->efi_info; - status = allocate_e820(boot_params, &e820ext, &e820ext_size); + status = allocate_e820(boot_params, &map, &e820ext, &e820ext_size); if (status != EFI_SUCCESS) return status; From patchwork Tue Jan 11 11:33:10 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Kirill A. 
Shutemov" X-Patchwork-Id: 532375 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 66D43C43219 for ; Tue, 11 Jan 2022 11:33:17 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1349540AbiAKLdQ (ORCPT ); Tue, 11 Jan 2022 06:33:16 -0500 Received: from mga05.intel.com ([192.55.52.43]:64739 "EHLO mga05.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1349523AbiAKLdO (ORCPT ); Tue, 11 Jan 2022 06:33:14 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1641900794; x=1673436794; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=r8ybxKYOCLta3T20mMeFLRdqIJU8SN1mkWkRljEEK/k=; b=SzEZ3x3xSj0cE46RkKm8YAK6BzEk/566HyTg0ixrFXp2iEVlVOz8d+iE 9KS2lu6WB7uG6pdDQ41pQzcVKgdciFLrzb9IKbWmU6czgm1d4qtqvjYD+ XidjdN2OKe1Ppztk1SaB8qkGwsFP/xnMRqcJGETP6BLD2b1Nbn1Omyblf q++dENtq2DBczsTtZQDVrCIs/RaZ3vzX+09ZJr9Jf5pbNTUnCXAk/aO1h 8346W1Fz9qGr1onakGc+IZxCzZ1ncGKDCRBTtHCY8i1LGXMHvu/kh+RgD oUy2Zp8S391SbX08V3FMxFufxF+HLlTtW7Kd1Aiz1fpYIdJUymHz1/wnQ Q==; X-IronPort-AV: E=McAfee;i="6200,9189,10223"; a="329807861" X-IronPort-AV: E=Sophos;i="5.88,279,1635231600"; d="scan'208";a="329807861" Received: from orsmga001.jf.intel.com ([10.7.209.18]) by fmsmga105.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 11 Jan 2022 03:33:13 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.88,279,1635231600"; d="scan'208";a="558334285" Received: from black.fi.intel.com ([10.237.72.28]) by orsmga001.jf.intel.com with ESMTP; 11 Jan 2022 03:33:08 -0800 Received: by black.fi.intel.com (Postfix, from userid 1000) id 3AF3E2C7; Tue, 11 Jan 2022 13:33:19 +0200 (EET) From: "Kirill A. Shutemov" To: Borislav Petkov , Andy Lutomirski , Sean Christopherson , Andrew Morton , Joerg Roedel , Ard Biesheuvel Cc: Andi Kleen , Kuppuswamy Sathyanarayanan , David Rientjes , Vlastimil Babka , Tom Lendacky , Thomas Gleixner , Peter Zijlstra , Paolo Bonzini , Ingo Molnar , Varad Gautam , Dario Faggioli , x86@kernel.org, linux-mm@kvack.org, linux-coco@lists.linux.dev, linux-efi@vger.kernel.org, linux-kernel@vger.kernel.org, "Kirill A. Shutemov" Subject: [PATCHv2 3/7] efi/x86: Implement support for unaccepted memory Date: Tue, 11 Jan 2022 14:33:10 +0300 Message-Id: <20220111113314.27173-4-kirill.shutemov@linux.intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20220111113314.27173-1-kirill.shutemov@linux.intel.com> References: <20220111113314.27173-1-kirill.shutemov@linux.intel.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-efi@vger.kernel.org UEFI Specification version 2.9 introduces the concept of memory acceptance: Some Virtual Machine platforms, such as Intel TDX or AMD SEV-SNP, requiring memory to be accepted before it can be used by the guest. Accepting happens via a protocol specific for the Virtual Machine platform. Accepting memory is costly and it makes VMM allocate memory for the accepted guest physical address range. It's better to postpone memory acceptance until memory is needed. It lowers boot time and reduces memory overhead. The kernel needs to know what memory has been accepted. Firmware communicates this information via memory map: a new memory type -- EFI_UNACCEPTED_MEMORY -- indicates such memory. 
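The following paragraphs switch from the firmware's range-based view to a bitmap. As a standalone sketch of that tracking scheme (a toy model with made-up descriptor and helper names, not the EFI stub code added below): walk a memory map and, for every unaccepted range, set one bit per 2MiB region.

/* Toy model of bitmap tracking for unaccepted memory; not patch code. */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define PMD_SIZE (2ULL << 20)	/* 2MiB: one bitmap bit covers this much */

enum toy_mem_type { TOY_CONVENTIONAL, TOY_UNACCEPTED };

struct toy_mem_desc {		/* simplified stand-in for an EFI descriptor */
	enum toy_mem_type type;
	uint64_t phys_addr;
	uint64_t num_bytes;
};

static void set_bit_in(uint8_t *bitmap, uint64_t bit)
{
	bitmap[bit / 8] |= 1u << (bit % 8);
}

int main(void)
{
	/* 1GiB of address space -> 512 bits -> 64 bytes of bitmap. */
	uint8_t bitmap[64];
	struct toy_mem_desc map[] = {
		{ TOY_CONVENTIONAL, 0, 256 * PMD_SIZE },
		{ TOY_UNACCEPTED, 256 * PMD_SIZE, 256 * PMD_SIZE },
	};
	int bits = 0;

	memset(bitmap, 0, sizeof(bitmap));

	for (size_t i = 0; i < sizeof(map) / sizeof(map[0]); i++) {
		uint64_t first, last;

		if (map[i].type != TOY_UNACCEPTED)
			continue;
		/* Ranges assumed 2MiB-aligned here; unaligned edges would
		 * simply be accepted up front, as the next paragraph notes. */
		first = map[i].phys_addr / PMD_SIZE;
		last = (map[i].phys_addr + map[i].num_bytes) / PMD_SIZE;
		for (uint64_t b = first; b < last; b++)
			set_bit_in(bitmap, b);
	}

	for (size_t i = 0; i < sizeof(bitmap); i++)
		for (int b = 0; b < 8; b++)
			bits += (bitmap[i] >> b) & 1;

	printf("%d bits set -> %llu MiB tracked as unaccepted\n",
	       bits, (unsigned long long)(bits * PMD_SIZE >> 20));
	return 0;
}

At this granularity one 4k page of bitmap covers 4096 * 8 * 2MiB = 64GiB of physical address space, which is where the worst-case figure of 256MiB of bitmap for 4PiB of address space quoted below comes from.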
Range-based tracking works fine for firmware, but it gets bulky for the kernel: e820 has to be modified on every page acceptance. It leads to table fragmentation, and there's only a limited number of entries in the e820 table. Another option is to mark such memory as usable in e820 and track if the range has been accepted in a bitmap. One bit in the bitmap represents 2MiB in the address space: one 4k page is enough to track 64GiB of physical address space. In the worst-case scenario -- a huge hole in the middle of the address space -- it needs 256MiB to handle 4PiB of the address space. Any unaccepted memory that is not aligned to 2M gets accepted upfront. The bitmap is allocated and constructed in the EFI stub and passed down to the kernel via boot_params. allocate_e820() allocates the bitmap if unaccepted memory is present, according to the maximum address in the memory map. The same boot_params.unaccepted_memory can be used to pass the bitmap between two kernels on kexec, but the use-case is not yet implemented. Signed-off-by: Kirill A. Shutemov --- Documentation/x86/zero-page.rst | 1 + arch/x86/boot/compressed/Makefile | 1 + arch/x86/boot/compressed/bitmap.c | 24 ++++++++ arch/x86/boot/compressed/unaccepted_memory.c | 45 +++++++++++++++ arch/x86/include/asm/unaccepted_memory.h | 12 ++++ arch/x86/include/uapi/asm/bootparam.h | 3 +- drivers/firmware/efi/Kconfig | 14 +++++ drivers/firmware/efi/efi.c | 1 + drivers/firmware/efi/libstub/x86-stub.c | 60 +++++++++++++++++++- include/linux/efi.h | 3 +- 10 files changed, 161 insertions(+), 3 deletions(-) create mode 100644 arch/x86/boot/compressed/bitmap.c create mode 100644 arch/x86/boot/compressed/unaccepted_memory.c create mode 100644 arch/x86/include/asm/unaccepted_memory.h diff --git a/Documentation/x86/zero-page.rst b/Documentation/x86/zero-page.rst index f088f5881666..8e3447a4b373 100644 --- a/Documentation/x86/zero-page.rst +++ b/Documentation/x86/zero-page.rst @@ -42,4 +42,5 @@ Offset/Size Proto Name Meaning 2D0/A00 ALL e820_table E820 memory map table (array of struct e820_entry) D00/1EC ALL eddbuf EDD data (array of struct edd_info) +ECC/008 ALL unaccepted_memory Bitmap of unaccepted memory (1bit == 2M) =========== ===== ======================= ================================================= diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile index 1bfe30ebadbe..f5b49e74d728 100644 --- a/arch/x86/boot/compressed/Makefile +++ b/arch/x86/boot/compressed/Makefile @@ -100,6 +100,7 @@ endif vmlinux-objs-$(CONFIG_ACPI) += $(obj)/acpi.o vmlinux-objs-$(CONFIG_INTEL_TDX_GUEST) += $(obj)/tdx.o vmlinux-objs-$(CONFIG_INTEL_TDX_GUEST) += $(obj)/tdcall.o +vmlinux-objs-$(CONFIG_UNACCEPTED_MEMORY) += $(obj)/bitmap.o $(obj)/unaccepted_memory.o vmlinux-objs-$(CONFIG_EFI_MIXED) += $(obj)/efi_thunk_$(BITS).o efi-obj-$(CONFIG_EFI_STUB) = $(objtree)/drivers/firmware/efi/libstub/lib.a diff --git a/arch/x86/boot/compressed/bitmap.c b/arch/x86/boot/compressed/bitmap.c new file mode 100644 index 000000000000..bf58b259380a --- /dev/null +++ b/arch/x86/boot/compressed/bitmap.c @@ -0,0 +1,24 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* Taken from lib/string.c */ + +#include + +void __bitmap_set(unsigned long *map, unsigned int start, int len) +{ + unsigned long *p = map + BIT_WORD(start); + const unsigned int size = start + len; + int bits_to_set = BITS_PER_LONG - (start % BITS_PER_LONG); + unsigned long mask_to_set = BITMAP_FIRST_WORD_MASK(start); + + while (len - bits_to_set >= 0) { + *p |= mask_to_set; + len -= bits_to_set; 
bits_to_set = BITS_PER_LONG; + mask_to_set = ~0UL; + p++; + } + if (len) { + mask_to_set &= BITMAP_LAST_WORD_MASK(size); + *p |= mask_to_set; + } +} diff --git a/arch/x86/boot/compressed/unaccepted_memory.c b/arch/x86/boot/compressed/unaccepted_memory.c new file mode 100644 index 000000000000..d8081cde0eed --- /dev/null +++ b/arch/x86/boot/compressed/unaccepted_memory.c @@ -0,0 +1,45 @@ +// SPDX-License-Identifier: GPL-2.0-only + +#include "error.h" +#include "misc.h" + +static inline void __accept_memory(phys_addr_t start, phys_addr_t end) +{ + /* Platform-specific memory-acceptance call goes here */ + error("Cannot accept memory"); +} + +void mark_unaccepted(struct boot_params *params, u64 start, u64 end) +{ + /* + * The accepted memory bitmap only works at PMD_SIZE granularity. + * If a request comes in to mark memory as unaccepted which is not + * PMD_SIZE-aligned, simply accept the memory now since it can not be + * *marked* as unaccepted. + */ + + /* Immediately accept whole range if it is within a PMD_SIZE block: */ + if ((start & PMD_MASK) == (end & PMD_MASK)) { + npages = (end - start) / PAGE_SIZE; + __accept_memory(start, start + npages * PAGE_SIZE); + return; + } + + /* Immediately accept a <PMD_SIZE piece at the start: */ + if (start & ~PMD_MASK) { + __accept_memory(start, round_up(start, PMD_SIZE)); + start = round_up(start, PMD_SIZE); + } + + /* Immediately accept a <PMD_SIZE piece at the end: */ + if (end & ~PMD_MASK) { + __accept_memory(round_down(end, PMD_SIZE), end); + end = round_down(end, PMD_SIZE); + } + + bitmap_set((unsigned long *)params->unaccepted_memory, + start / PMD_SIZE, (end - start) / PMD_SIZE); +} diff --git a/arch/x86/include/asm/unaccepted_memory.h b/arch/x86/include/asm/unaccepted_memory.h new file mode 100644 index 000000000000..cbc24040b853 --- /dev/null +++ b/arch/x86/include/asm/unaccepted_memory.h @@ -0,0 +1,12 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* Copyright (C) 2020 Intel Corporation */ +#ifndef _ASM_X86_UNACCEPTED_MEMORY_H +#define _ASM_X86_UNACCEPTED_MEMORY_H + +#include + +struct boot_params; + +void mark_unaccepted(struct boot_params *params, u64 start, u64 num); + +#endif diff --git a/arch/x86/include/uapi/asm/bootparam.h b/arch/x86/include/uapi/asm/bootparam.h index b25d3f82c2f3..16bc686a198d 100644 --- a/arch/x86/include/uapi/asm/bootparam.h +++ b/arch/x86/include/uapi/asm/bootparam.h @@ -217,7 +217,8 @@ struct boot_params { struct boot_e820_entry e820_table[E820_MAX_ENTRIES_ZEROPAGE]; /* 0x2d0 */ __u8 _pad8[48]; /* 0xcd0 */ struct edd_info eddbuf[EDDMAXNR]; /* 0xd00 */ - __u8 _pad9[276]; /* 0xeec */ + __u64 unaccepted_memory; /* 0xeec */ + __u8 _pad9[268]; /* 0xef4 */ } __attribute__((packed)); /** diff --git a/drivers/firmware/efi/Kconfig b/drivers/firmware/efi/Kconfig index 2c3dac5ecb36..36c1bf33f112 100644 --- a/drivers/firmware/efi/Kconfig +++ b/drivers/firmware/efi/Kconfig @@ -243,6 +243,20 @@ config EFI_DISABLE_PCI_DMA options "efi=disable_early_pci_dma" or "efi=no_disable_early_pci_dma" may be used to override this option. +config UNACCEPTED_MEMORY + bool + depends on EFI_STUB + help + Some Virtual Machine platforms, such as Intel TDX, introduce + the concept of memory acceptance, requiring memory to be accepted + before it can be used by the guest. This protects against a class of + attacks by the virtual machine platform. + + UEFI specification v2.9 introduced the EFI_UNACCEPTED_MEMORY memory type. + + This option adds support for unaccepted memory and makes such memory + usable by the kernel. 
+ endmenu config EFI_EMBEDDED_FIRMWARE diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c index ae79c3300129..abe862c381b6 100644 --- a/drivers/firmware/efi/efi.c +++ b/drivers/firmware/efi/efi.c @@ -740,6 +740,7 @@ static __initdata char memory_type_name[][13] = { "MMIO Port", "PAL Code", "Persistent", + "Unaccepted", }; char * __init efi_md_typeattr_format(char *buf, size_t size, diff --git a/drivers/firmware/efi/libstub/x86-stub.c b/drivers/firmware/efi/libstub/x86-stub.c index a0b946182b5e..346b12d6f1b2 100644 --- a/drivers/firmware/efi/libstub/x86-stub.c +++ b/drivers/firmware/efi/libstub/x86-stub.c @@ -9,12 +9,14 @@ #include #include #include +#include #include #include #include #include #include +#include #include "efistub.h" @@ -504,6 +506,13 @@ setup_e820(struct boot_params *params, struct setup_data *e820ext, u32 e820ext_s e820_type = E820_TYPE_PMEM; break; + case EFI_UNACCEPTED_MEMORY: + if (!IS_ENABLED(CONFIG_UNACCEPTED_MEMORY)) + continue; + e820_type = E820_TYPE_RAM; + mark_unaccepted(params, d->phys_addr, + d->phys_addr + PAGE_SIZE * d->num_pages); + break; default: continue; } @@ -575,6 +584,9 @@ static efi_status_t allocate_e820(struct boot_params *params, { efi_status_t status; __u32 nr_desc; + bool unaccepted_memory_present = false; + u64 max_addr = 0; + int i; status = efi_get_memory_map(map); if (status != EFI_SUCCESS) @@ -589,9 +601,55 @@ static efi_status_t allocate_e820(struct boot_params *params, if (status != EFI_SUCCESS) goto out; } + + if (!IS_ENABLED(CONFIG_UNACCEPTED_MEMORY)) + goto out; + + /* Check if there's any unaccepted memory and find the max address */ + for (i = 0; i < nr_desc; i++) { + efi_memory_desc_t *d; + + d = efi_early_memdesc_ptr(*map->map, *map->desc_size, i); + if (d->type == EFI_UNACCEPTED_MEMORY) + unaccepted_memory_present = true; + if (d->phys_addr + d->num_pages * PAGE_SIZE > max_addr) + max_addr = d->phys_addr + d->num_pages * PAGE_SIZE; + } + + /* + * If unaccepted memory present allocate a bitmap to track what memory + * has to be accepted before access. + * + * One bit in the bitmap represents 2MiB in the address space: one 4k + * page is enough to track 64GiB or physical address space. + * + * In the worst case scenario -- a huge hole in the middle of the + * address space -- It needs 256MiB to handle 4PiB of the address + * space. + * + * TODO: handle situation if params->unaccepted_memory has already set. + * It's required to deal with kexec. + * + * The bitmap will be populated in setup_e820() according to the memory + * map after efi_exit_boot_services(). 
+ */ + if (unaccepted_memory_present) { + unsigned long *unaccepted_memory = NULL; + u64 size = DIV_ROUND_UP(max_addr, PMD_SIZE * BITS_PER_BYTE); + + status = efi_allocate_pages(size, + (unsigned long *)&unaccepted_memory, + ULONG_MAX); + if (status != EFI_SUCCESS) + goto out; + memset(unaccepted_memory, 0, size); + params->unaccepted_memory = (u64)unaccepted_memory; + } + out: efi_bs_call(free_pool, *map->map); - return EFI_SUCCESS; + return status; + } struct exit_boot_struct { diff --git a/include/linux/efi.h b/include/linux/efi.h index dbd39b20e034..270333b9b94d 100644 --- a/include/linux/efi.h +++ b/include/linux/efi.h @@ -108,7 +108,8 @@ typedef struct { #define EFI_MEMORY_MAPPED_IO_PORT_SPACE 12 #define EFI_PAL_CODE 13 #define EFI_PERSISTENT_MEMORY 14 -#define EFI_MAX_MEMORY_TYPE 15 +#define EFI_UNACCEPTED_MEMORY 15 +#define EFI_MAX_MEMORY_TYPE 16 /* Attribute values: */ #define EFI_MEMORY_UC ((u64)0x0000000000000001ULL) /* uncached */ From patchwork Tue Jan 11 11:33:11 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Kirill A. Shutemov" X-Patchwork-Id: 531259 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id B095DC433FE for ; Tue, 11 Jan 2022 11:33:19 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1349544AbiAKLdR (ORCPT ); Tue, 11 Jan 2022 06:33:17 -0500 Received: from mga04.intel.com ([192.55.52.120]:11112 "EHLO mga04.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1349532AbiAKLdO (ORCPT ); Tue, 11 Jan 2022 06:33:14 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1641900794; x=1673436794; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=5dqBdfMq7ze0AdxX7ODzhLdYdgNTaNVDweYGFgnO8Es=; b=fqoEHlDCCQe2QHOEKfRvAvLrrnb97qnJafGb/6KgtNaRISYqwnlAmOno 5NDqUuBvwT349+DIDVrE7bLqfe5cbvlox7hGnyrSyAr2PB6eHG9VG0X2i vmZ0SvpulU2Ik9Kt1I8Mwyhs5YnQO3muRq2cAkFpVlAUhTl/g37W8DjVl BMAf7HQjgaZ676qazBoboKqBG61fQNhO0SLa53+BfeQlvXoX3Drr++O55 lyoGDS9833NSdqOrO7nYSitTAtqA3OsTZpFDYZxoNPFiRyhfe2u077UWw GoOAMQSNOrH3mEd0FSBwjs3ZtmUql5ZtPU00vyW6CQPs2r95smEIEYXXa Q==; X-IronPort-AV: E=McAfee;i="6200,9189,10223"; a="242277580" X-IronPort-AV: E=Sophos;i="5.88,279,1635231600"; d="scan'208";a="242277580" Received: from orsmga003.jf.intel.com ([10.7.209.27]) by fmsmga104.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 11 Jan 2022 03:33:14 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.88,279,1635231600"; d="scan'208";a="472431125" Received: from black.fi.intel.com ([10.237.72.28]) by orsmga003.jf.intel.com with ESMTP; 11 Jan 2022 03:33:08 -0800 Received: by black.fi.intel.com (Postfix, from userid 1000) id 5228C346; Tue, 11 Jan 2022 13:33:19 +0200 (EET) From: "Kirill A. Shutemov" To: Borislav Petkov , Andy Lutomirski , Sean Christopherson , Andrew Morton , Joerg Roedel , Ard Biesheuvel Cc: Andi Kleen , Kuppuswamy Sathyanarayanan , David Rientjes , Vlastimil Babka , Tom Lendacky , Thomas Gleixner , Peter Zijlstra , Paolo Bonzini , Ingo Molnar , Varad Gautam , Dario Faggioli , x86@kernel.org, linux-mm@kvack.org, linux-coco@lists.linux.dev, linux-efi@vger.kernel.org, linux-kernel@vger.kernel.org, "Kirill A. 
Shutemov" Subject: [PATCHv2 4/7] x86/boot/compressed: Handle unaccepted memory Date: Tue, 11 Jan 2022 14:33:11 +0300 Message-Id: <20220111113314.27173-5-kirill.shutemov@linux.intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20220111113314.27173-1-kirill.shutemov@linux.intel.com> References: <20220111113314.27173-1-kirill.shutemov@linux.intel.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-efi@vger.kernel.org Firmware is responsible for accepting memory where compressed kernel image and initrd land. But kernel has to accept memory for decompression buffer: accept memory just before decompression starts. KASLR is allowed to use unaccepted memory for the output buffer. Signed-off-by: Kirill A. Shutemov --- arch/x86/boot/compressed/bitmap.c | 62 ++++++++++++++++++++ arch/x86/boot/compressed/kaslr.c | 14 ++++- arch/x86/boot/compressed/misc.c | 9 +++ arch/x86/boot/compressed/unaccepted_memory.c | 13 ++++ arch/x86/include/asm/unaccepted_memory.h | 2 + 5 files changed, 98 insertions(+), 2 deletions(-) diff --git a/arch/x86/boot/compressed/bitmap.c b/arch/x86/boot/compressed/bitmap.c index bf58b259380a..ba2de61c0823 100644 --- a/arch/x86/boot/compressed/bitmap.c +++ b/arch/x86/boot/compressed/bitmap.c @@ -2,6 +2,48 @@ /* Taken from lib/string.c */ #include +#include +#include + +unsigned long _find_next_bit(const unsigned long *addr1, + const unsigned long *addr2, unsigned long nbits, + unsigned long start, unsigned long invert, unsigned long le) +{ + unsigned long tmp, mask; + + if (unlikely(start >= nbits)) + return nbits; + + tmp = addr1[start / BITS_PER_LONG]; + if (addr2) + tmp &= addr2[start / BITS_PER_LONG]; + tmp ^= invert; + + /* Handle 1st word. */ + mask = BITMAP_FIRST_WORD_MASK(start); + if (le) + mask = swab(mask); + + tmp &= mask; + + start = round_down(start, BITS_PER_LONG); + + while (!tmp) { + start += BITS_PER_LONG; + if (start >= nbits) + return nbits; + + tmp = addr1[start / BITS_PER_LONG]; + if (addr2) + tmp &= addr2[start / BITS_PER_LONG]; + tmp ^= invert; + } + + if (le) + tmp = swab(tmp); + + return min(start + __ffs(tmp), nbits); +} void __bitmap_set(unsigned long *map, unsigned int start, int len) { @@ -22,3 +64,23 @@ void __bitmap_set(unsigned long *map, unsigned int start, int len) *p |= mask_to_set; } } + +void __bitmap_clear(unsigned long *map, unsigned int start, int len) +{ + unsigned long *p = map + BIT_WORD(start); + const unsigned int size = start + len; + int bits_to_clear = BITS_PER_LONG - (start % BITS_PER_LONG); + unsigned long mask_to_clear = BITMAP_FIRST_WORD_MASK(start); + + while (len - bits_to_clear >= 0) { + *p &= ~mask_to_clear; + len -= bits_to_clear; + bits_to_clear = BITS_PER_LONG; + mask_to_clear = ~0UL; + p++; + } + if (len) { + mask_to_clear &= BITMAP_LAST_WORD_MASK(size); + *p &= ~mask_to_clear; + } +} diff --git a/arch/x86/boot/compressed/kaslr.c b/arch/x86/boot/compressed/kaslr.c index 411b268bc0a2..59db90626042 100644 --- a/arch/x86/boot/compressed/kaslr.c +++ b/arch/x86/boot/compressed/kaslr.c @@ -725,10 +725,20 @@ process_efi_entries(unsigned long minimum, unsigned long image_size) * but in practice there's firmware where using that memory leads * to crashes. * - * Only EFI_CONVENTIONAL_MEMORY is guaranteed to be free. + * Only EFI_CONVENTIONAL_MEMORY and EFI_UNACCEPTED_MEMORY (if + * supported) are guaranteed to be free. 
*/ - if (md->type != EFI_CONVENTIONAL_MEMORY) + + switch (md->type) { + case EFI_CONVENTIONAL_MEMORY: + break; + case EFI_UNACCEPTED_MEMORY: + if (IS_ENABLED(CONFIG_UNACCEPTED_MEMORY)) + break; continue; + default: + continue; + } if (efi_soft_reserve_enabled() && (md->attribute & EFI_MEMORY_SP)) diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c index d8373d766672..1e3efd0a8e11 100644 --- a/arch/x86/boot/compressed/misc.c +++ b/arch/x86/boot/compressed/misc.c @@ -18,6 +18,7 @@ #include "../string.h" #include "../voffset.h" #include +#include /* * WARNING!! @@ -446,6 +447,14 @@ asmlinkage __visible void *extract_kernel(void *rmode, memptr heap, #endif debug_putstr("\nDecompressing Linux... "); + + if (IS_ENABLED(CONFIG_UNACCEPTED_MEMORY) && + boot_params->unaccepted_memory) { + debug_putstr("Accepting memory... "); + accept_memory((phys_addr_t)output, + (phys_addr_t)output + needed_size); + } + __decompress(input_data, input_len, NULL, NULL, output, output_len, NULL, error); parse_elf(output); diff --git a/arch/x86/boot/compressed/unaccepted_memory.c b/arch/x86/boot/compressed/unaccepted_memory.c index d8081cde0eed..91db800d5f5e 100644 --- a/arch/x86/boot/compressed/unaccepted_memory.c +++ b/arch/x86/boot/compressed/unaccepted_memory.c @@ -43,3 +43,16 @@ void mark_unaccepted(struct boot_params *params, u64 start, u64 end) bitmap_set((unsigned long *)params->unaccepted_memory, start / PMD_SIZE, (end - start) / PMD_SIZE); } + +void accept_memory(phys_addr_t start, phys_addr_t end) +{ + unsigned long *unaccepted_memory; + unsigned int rs, re; + + unaccepted_memory = (unsigned long *)boot_params->unaccepted_memory; + bitmap_for_each_set_region(unaccepted_memory, rs, re, + start / PMD_SIZE, end / PMD_SIZE) { + __accept_memory(rs * PMD_SIZE, re * PMD_SIZE); + bitmap_clear(unaccepted_memory, rs, re - rs); + } +} diff --git a/arch/x86/include/asm/unaccepted_memory.h b/arch/x86/include/asm/unaccepted_memory.h index cbc24040b853..f1f835d3cd78 100644 --- a/arch/x86/include/asm/unaccepted_memory.h +++ b/arch/x86/include/asm/unaccepted_memory.h @@ -9,4 +9,6 @@ struct boot_params; void mark_unaccepted(struct boot_params *params, u64 start, u64 num); +void accept_memory(phys_addr_t start, phys_addr_t end); + #endif From patchwork Tue Jan 11 11:33:12 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Kirill A. 
Shutemov" X-Patchwork-Id: 532374 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0428BC433EF for ; Tue, 11 Jan 2022 11:33:38 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1349630AbiAKLdh (ORCPT ); Tue, 11 Jan 2022 06:33:37 -0500 Received: from mga09.intel.com ([134.134.136.24]:37650 "EHLO mga09.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1349553AbiAKLdT (ORCPT ); Tue, 11 Jan 2022 06:33:19 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1641900799; x=1673436799; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=hyOD0iOzQ3OdcyuSvPELW4e2F6ocBQ3s5Blwr9IqDVU=; b=UQ9mjWbR+pCIH+8JdoyKKi7x42HyQAKrR+0etELonfdVlW/4rFPKfU4N 03O2y3zRnHVbNDtLNNAD1kghcxpOdmaSQf9WeimTgM/RVWa7RWSzy2CFQ DxLwylAjGgccAMZQRgWm5/N705cwTeKXjufzseDVgHQ2psTzUpiWvQnCX geQ8byxKOeROVl33pRrZIXXL3o9fg4vqZ14l7ZIc1lk9ZajUw/EltX2SA Vy8XHf7MtPUJZFfqnxJqhIkj/S95cFKOjbH0Wx4qDBOsePsVhMoiX1us2 52eOJ6nZBTlAJlxIQF/z6UaKewStCuA1hXDnAvEDfmS91f+rXaWZu4FL3 w==; X-IronPort-AV: E=McAfee;i="6200,9189,10223"; a="243261638" X-IronPort-AV: E=Sophos;i="5.88,279,1635231600"; d="scan'208";a="243261638" Received: from orsmga004.jf.intel.com ([10.7.209.38]) by orsmga102.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 11 Jan 2022 03:33:19 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.88,279,1635231600"; d="scan'208";a="623042894" Received: from black.fi.intel.com ([10.237.72.28]) by orsmga004.jf.intel.com with ESMTP; 11 Jan 2022 03:33:14 -0800 Received: by black.fi.intel.com (Postfix, from userid 1000) id 5F7974AC; Tue, 11 Jan 2022 13:33:19 +0200 (EET) From: "Kirill A. Shutemov" To: Borislav Petkov , Andy Lutomirski , Sean Christopherson , Andrew Morton , Joerg Roedel , Ard Biesheuvel Cc: Andi Kleen , Kuppuswamy Sathyanarayanan , David Rientjes , Vlastimil Babka , Tom Lendacky , Thomas Gleixner , Peter Zijlstra , Paolo Bonzini , Ingo Molnar , Varad Gautam , Dario Faggioli , x86@kernel.org, linux-mm@kvack.org, linux-coco@lists.linux.dev, linux-efi@vger.kernel.org, linux-kernel@vger.kernel.org, "Kirill A. Shutemov" Subject: [PATCHv2 5/7] x86/mm: Reserve unaccepted memory bitmap Date: Tue, 11 Jan 2022 14:33:12 +0300 Message-Id: <20220111113314.27173-6-kirill.shutemov@linux.intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20220111113314.27173-1-kirill.shutemov@linux.intel.com> References: <20220111113314.27173-1-kirill.shutemov@linux.intel.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-efi@vger.kernel.org The unaccepted memory bitmap is allocated during the decompression stage and handed over to the main kernel image via boot_params. The bitmap is used to track if memory has been accepted. The bitmap has to be reserved so that the memory backing it is not reallocated for other purposes. Signed-off-by: Kirill A. 
Shutemov --- arch/x86/kernel/e820.c | 10 ++++++++++ 1 file changed, 10 insertions(+) diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c index bc0657f0deed..dc9048e2d421 100644 --- a/arch/x86/kernel/e820.c +++ b/arch/x86/kernel/e820.c @@ -1290,6 +1290,16 @@ void __init e820__memory_setup(void) pr_info("BIOS-provided physical RAM map:\n"); e820__print_table(who); + + /* Mark unaccepted memory bitmap reserved */ + if (boot_params.unaccepted_memory) { + unsigned long size; + + /* One bit per 2MB */ + size = DIV_ROUND_UP(e820__end_of_ram_pfn() * PAGE_SIZE, + PMD_SIZE * BITS_PER_BYTE); + memblock_reserve(boot_params.unaccepted_memory, size); + } } void __init e820__memblock_setup(void) From patchwork Tue Jan 11 11:33:13 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Kirill A. Shutemov" X-Patchwork-Id: 532373 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 18808C433EF for ; Tue, 11 Jan 2022 11:33:43 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1349638AbiAKLdk (ORCPT ); Tue, 11 Jan 2022 06:33:40 -0500 Received: from mga06.intel.com ([134.134.136.31]:29988 "EHLO mga06.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1349569AbiAKLdY (ORCPT ); Tue, 11 Jan 2022 06:33:24 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1641900803; x=1673436803; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=XXiNVM5MD77OC0PKAJ7Qh/8iyqoz0w1CDeuRwsZQWGI=; b=LiKE8RG22ZL9qXwub7zmaAvo44gGeblGhsYrdrL2F0qRBhpyUJasboT0 iIN0yKzPaFGULMSokTqnqMFx0t5YGXtmKMr2Q+UVXBIoV42zx8euMcfjH f1riuPx4TB8hleF1fY62c6NgDxL8hARsYYk++QeGaJ829Rv93F0/q+4e2 Y21dL8QsKKpYgqgarS0QD/VCOR7u6Qag+WY6aEX+NkrRmmlz4UPqVFpD/ /GrznyrNFZp8RQM6NIfbcymVb/+Ap5CI0A5rKA0VNgeZqujC0oPidkDUS AQo5Q5J5OivZf4xCr+vLvtd0ej/NLdbEAcO71HTFDfpDDzXlWjpSPTMjj w==; X-IronPort-AV: E=McAfee;i="6200,9189,10223"; a="304202781" X-IronPort-AV: E=Sophos;i="5.88,279,1635231600"; d="scan'208";a="304202781" Received: from orsmga002.jf.intel.com ([10.7.209.21]) by orsmga104.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 11 Jan 2022 03:33:19 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.88,279,1635231600"; d="scan'208";a="490351608" Received: from black.fi.intel.com ([10.237.72.28]) by orsmga002.jf.intel.com with ESMTP; 11 Jan 2022 03:33:14 -0800 Received: by black.fi.intel.com (Postfix, from userid 1000) id 6C8D9651; Tue, 11 Jan 2022 13:33:19 +0200 (EET) From: "Kirill A. Shutemov" To: Borislav Petkov , Andy Lutomirski , Sean Christopherson , Andrew Morton , Joerg Roedel , Ard Biesheuvel Cc: Andi Kleen , Kuppuswamy Sathyanarayanan , David Rientjes , Vlastimil Babka , Tom Lendacky , Thomas Gleixner , Peter Zijlstra , Paolo Bonzini , Ingo Molnar , Varad Gautam , Dario Faggioli , x86@kernel.org, linux-mm@kvack.org, linux-coco@lists.linux.dev, linux-efi@vger.kernel.org, linux-kernel@vger.kernel.org, "Kirill A. 
Shutemov" Subject: [PATCHv2 6/7] x86/mm: Provide helpers for unaccepted memory Date: Tue, 11 Jan 2022 14:33:13 +0300 Message-Id: <20220111113314.27173-7-kirill.shutemov@linux.intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20220111113314.27173-1-kirill.shutemov@linux.intel.com> References: <20220111113314.27173-1-kirill.shutemov@linux.intel.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-efi@vger.kernel.org Core-mm requires few helpers to support unaccepted memory: - accept_memory() checks the range of addresses against the bitmap and accept memory if needed; - maybe_set_page_offline() checks the bitmap and marks a page with PageOffline() if memory acceptance required on the first allocation of the page. - accept_and_clear_page_offline() accepts memory for the page and clears PageOffline(). Signed-off-by: Kirill A. Shutemov --- arch/x86/boot/compressed/unaccepted_memory.c | 3 +- arch/x86/include/asm/page.h | 5 ++ arch/x86/include/asm/unaccepted_memory.h | 3 + arch/x86/mm/Makefile | 2 + arch/x86/mm/unaccepted_memory.c | 90 ++++++++++++++++++++ 5 files changed, 101 insertions(+), 2 deletions(-) create mode 100644 arch/x86/mm/unaccepted_memory.c diff --git a/arch/x86/boot/compressed/unaccepted_memory.c b/arch/x86/boot/compressed/unaccepted_memory.c index 91db800d5f5e..b6caca4d3d22 100644 --- a/arch/x86/boot/compressed/unaccepted_memory.c +++ b/arch/x86/boot/compressed/unaccepted_memory.c @@ -20,8 +20,7 @@ void mark_unaccepted(struct boot_params *params, u64 start, u64 end) /* Immediately accept whole range if it is within a PMD_SIZE block: */ if ((start & PMD_MASK) == (end & PMD_MASK)) { - npages = (end - start) / PAGE_SIZE; - __accept_memory(start, start + npages * PAGE_SIZE); + __accept_memory(start, end); return; } diff --git a/arch/x86/include/asm/page.h b/arch/x86/include/asm/page.h index 4d5810c8fab7..1e56d76ca474 100644 --- a/arch/x86/include/asm/page.h +++ b/arch/x86/include/asm/page.h @@ -19,6 +19,11 @@ struct page; #include + +#ifdef CONFIG_UNACCEPTED_MEMORY +#include +#endif + extern struct range pfn_mapped[]; extern int nr_pfn_mapped; diff --git a/arch/x86/include/asm/unaccepted_memory.h b/arch/x86/include/asm/unaccepted_memory.h index f1f835d3cd78..8a06ac8fc9e9 100644 --- a/arch/x86/include/asm/unaccepted_memory.h +++ b/arch/x86/include/asm/unaccepted_memory.h @@ -6,9 +6,12 @@ #include struct boot_params; +struct page; void mark_unaccepted(struct boot_params *params, u64 start, u64 num); void accept_memory(phys_addr_t start, phys_addr_t end); +void maybe_set_page_offline(struct page *page, unsigned int order); +void accept_and_clear_page_offline(struct page *page, unsigned int order); #endif diff --git a/arch/x86/mm/Makefile b/arch/x86/mm/Makefile index fe3d3061fc11..e327f83e6bbf 100644 --- a/arch/x86/mm/Makefile +++ b/arch/x86/mm/Makefile @@ -60,3 +60,5 @@ obj-$(CONFIG_AMD_MEM_ENCRYPT) += mem_encrypt_amd.o obj-$(CONFIG_AMD_MEM_ENCRYPT) += mem_encrypt_identity.o obj-$(CONFIG_AMD_MEM_ENCRYPT) += mem_encrypt_boot.o + +obj-$(CONFIG_UNACCEPTED_MEMORY) += unaccepted_memory.o diff --git a/arch/x86/mm/unaccepted_memory.c b/arch/x86/mm/unaccepted_memory.c new file mode 100644 index 000000000000..984eaead0b11 --- /dev/null +++ b/arch/x86/mm/unaccepted_memory.c @@ -0,0 +1,90 @@ +#include +#include +#include +#include + +#include +#include +#include + +static DEFINE_SPINLOCK(unaccepted_memory_lock); + +#define PMD_ORDER (PMD_SHIFT - PAGE_SHIFT) + +static void __accept_memory(phys_addr_t start, phys_addr_t end) +{ + unsigned long *unaccepted_memory; + unsigned 
int rs, re; + + unaccepted_memory = __va(boot_params.unaccepted_memory); + bitmap_for_each_set_region(unaccepted_memory, rs, re, + start / PMD_SIZE, + DIV_ROUND_UP(end, PMD_SIZE)) { + /* Platform-specific memory-acceptance call goes here */ + panic("Cannot accept memory"); + bitmap_clear(unaccepted_memory, rs, re - rs); + } +} + +void accept_memory(phys_addr_t start, phys_addr_t end) +{ + unsigned long flags; + if (!boot_params.unaccepted_memory) + return; + + spin_lock_irqsave(&unaccepted_memory_lock, flags); + __accept_memory(start, end); + spin_unlock_irqrestore(&unaccepted_memory_lock, flags); +} + +void __init maybe_set_page_offline(struct page *page, unsigned int order) +{ + unsigned long *unaccepted_memory; + phys_addr_t addr = page_to_phys(page); + unsigned long flags; + bool unaccepted = false; + unsigned int i; + + if (!boot_params.unaccepted_memory) + return; + + unaccepted_memory = __va(boot_params.unaccepted_memory); + spin_lock_irqsave(&unaccepted_memory_lock, flags); + if (order < PMD_ORDER) { + BUG_ON(test_bit(addr / PMD_SIZE, unaccepted_memory)); + goto out; + } + + for (i = 0; i < (1 << (order - PMD_ORDER)); i++) { + if (test_bit(addr / PMD_SIZE + i, unaccepted_memory)) { + unaccepted = true; + break; + } + } + + /* At least part of page is uneccepted */ + if (unaccepted) + __SetPageOffline(page); +out: + spin_unlock_irqrestore(&unaccepted_memory_lock, flags); +} + +void accept_and_clear_page_offline(struct page *page, unsigned int order) +{ + phys_addr_t addr = round_down(page_to_phys(page), PMD_SIZE); + int i; + + /* PageOffline() page on a free list, but no unaccepted memory? Hm. */ + WARN_ON_ONCE(!boot_params.unaccepted_memory); + + page = pfn_to_page(addr >> PAGE_SHIFT); + if (order < PMD_ORDER) + order = PMD_ORDER; + + accept_memory(addr, addr + (PAGE_SIZE << order)); + + for (i = 0; i < (1 << order); i++) { + if (PageOffline(page + i)) + __ClearPageOffline(page + i); + } +} From patchwork Tue Jan 11 11:33:14 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Kirill A. 
Shutemov" X-Patchwork-Id: 531258 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 87642C433FE for ; Tue, 11 Jan 2022 11:33:38 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1349633AbiAKLdh (ORCPT ); Tue, 11 Jan 2022 06:33:37 -0500 Received: from mga04.intel.com ([192.55.52.120]:11112 "EHLO mga04.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1349556AbiAKLdT (ORCPT ); Tue, 11 Jan 2022 06:33:19 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1641900799; x=1673436799; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=ix1Lln08Q+6dMYxeRkLCXlqDI7dwftevgrEgpLe26DE=; b=EOCPcVVvhyRtHbK7HrePAxpd/dSDdH9MuzZtKMbASPpxXrEe+zUbG6aL NNOR/XJPncU3SCyvQRd1bIS+PiueDzpisCmSlqKQl9g1LDWgZqF3aFAME 0QjIbF5i7hXlXeZh4Q791/X487ySA2KGdsmtkOHOEdInyQNlbegc+Ywik IHATTdIhHNThrB7Jchkd6vAjXSO5j+5KMkLuq9ZEn+JfgiQF/rw7E8xAz y7DYAB9iTNMCynRs2YuW6JMYEKGRL9IBBgaXdOZwUoL2w5c5E4AYh7gab 0wOUi3gG2fBmsTxYvoEzJFlxf4reb8fNj6vUM2C+np9DLyAl8Qw7RBUJT g==; X-IronPort-AV: E=McAfee;i="6200,9189,10223"; a="242277611" X-IronPort-AV: E=Sophos;i="5.88,279,1635231600"; d="scan'208";a="242277611" Received: from fmsmga008.fm.intel.com ([10.253.24.58]) by fmsmga104.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 11 Jan 2022 03:33:19 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.88,279,1635231600"; d="scan'208";a="576179347" Received: from black.fi.intel.com ([10.237.72.28]) by fmsmga008.fm.intel.com with ESMTP; 11 Jan 2022 03:33:14 -0800 Received: by black.fi.intel.com (Postfix, from userid 1000) id 79EB7699; Tue, 11 Jan 2022 13:33:19 +0200 (EET) From: "Kirill A. Shutemov" To: Borislav Petkov , Andy Lutomirski , Sean Christopherson , Andrew Morton , Joerg Roedel , Ard Biesheuvel Cc: Andi Kleen , Kuppuswamy Sathyanarayanan , David Rientjes , Vlastimil Babka , Tom Lendacky , Thomas Gleixner , Peter Zijlstra , Paolo Bonzini , Ingo Molnar , Varad Gautam , Dario Faggioli , x86@kernel.org, linux-mm@kvack.org, linux-coco@lists.linux.dev, linux-efi@vger.kernel.org, linux-kernel@vger.kernel.org, "Kirill A. Shutemov" Subject: [PATCHv2 7/7] x86/tdx: Unaccepted memory support Date: Tue, 11 Jan 2022 14:33:14 +0300 Message-Id: <20220111113314.27173-8-kirill.shutemov@linux.intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20220111113314.27173-1-kirill.shutemov@linux.intel.com> References: <20220111113314.27173-1-kirill.shutemov@linux.intel.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-efi@vger.kernel.org All preparation is complete. Hookup TDX-specific code to accept memory. There are two tdx_accept_memory() implementations: one in main kernel and one in the decompressor. The implementation in core kernel uses tdx_hcall_gpa_intent(). The helper is not available in the decompressor, self-contained implementation added there instead. Signed-off-by: Kirill A. 
Shutemov --- arch/x86/Kconfig | 1 + arch/x86/boot/compressed/tdx.c | 67 ++++++++++++++++++++ arch/x86/boot/compressed/unaccepted_memory.c | 9 ++- arch/x86/include/asm/tdx.h | 2 + arch/x86/kernel/tdx.c | 7 ++ arch/x86/mm/unaccepted_memory.c | 6 +- 6 files changed, 90 insertions(+), 2 deletions(-) diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index e2ed1684f399..5d0f99bd3538 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -879,6 +879,7 @@ config INTEL_TDX_GUEST select ARCH_HAS_CC_PLATFORM select X86_MCE select X86_MEM_ENCRYPT + select UNACCEPTED_MEMORY help Support running as a guest under Intel TDX. Without this support, the guest kernel can not boot or run under TDX. diff --git a/arch/x86/boot/compressed/tdx.c b/arch/x86/boot/compressed/tdx.c index 50c8145bd0f3..587e6d948953 100644 --- a/arch/x86/boot/compressed/tdx.c +++ b/arch/x86/boot/compressed/tdx.c @@ -5,12 +5,54 @@ #include "../cpuflags.h" #include "../string.h" +#include "error.h" +#include + +#define TDX_HYPERCALL_STANDARD 0 #define TDX_CPUID_LEAF_ID 0x21 #define TDX_IDENT "IntelTDX " +/* + * Used in __tdx_module_call() helper function to gather the + * output registers' values of TDCALL instruction when requesting + * services from the TDX module. This is software only structure + * and not related to TDX module/VMM. + */ +struct tdx_module_output { + u64 rcx; + u64 rdx; + u64 r8; + u64 r9; + u64 r10; + u64 r11; +}; + +/* + * Used in __tdx_hypercall() helper function to gather the + * output registers' values of TDCALL instruction when requesting + * services from the VMM. This is software only structure + * and not related to TDX module/VMM. + */ +struct tdx_hypercall_output { + u64 r10; + u64 r11; + u64 r12; + u64 r13; + u64 r14; + u64 r15; +}; + static bool tdx_guest_detected; +/* Helper function used to communicate with the TDX module */ +u64 __tdx_module_call(u64 fn, u64 rcx, u64 rdx, u64 r8, u64 r9, + struct tdx_module_output *out); + +/* Helper function used to request services from VMM */ +u64 __tdx_hypercall(u64 type, u64 fn, u64 r12, u64 r13, u64 r14, + u64 r15, struct tdx_hypercall_output *out); + void early_tdx_detect(void) { u32 eax, sig[3]; @@ -28,3 +70,28 @@ bool early_is_tdx_guest(void) { return tdx_guest_detected; } + +#define TDACCEPTPAGE 6 +#define TDVMCALL_MAP_GPA 0x10001 + +void tdx_accept_memory(phys_addr_t start, phys_addr_t end) +{ + struct tdx_hypercall_output outl = {0}; + int i; + + if (__tdx_hypercall(TDX_HYPERCALL_STANDARD, TDVMCALL_MAP_GPA, + start, end, 0, 0, &outl)) { + error("Cannot accept memory: MapGPA failed\n"); + } + + /* + * For shared->private conversion, accept the page using TDACCEPTPAGE + * TDX module call. 
+ */ + for (i = 0; i < (end - start) / PAGE_SIZE; i++) { + if (__tdx_module_call(TDACCEPTPAGE, start + i * PAGE_SIZE, + 0, 0, 0, NULL)) { + error("Cannot accept memory: page accept failed\n"); + } + } +} diff --git a/arch/x86/boot/compressed/unaccepted_memory.c b/arch/x86/boot/compressed/unaccepted_memory.c index b6caca4d3d22..c23526c25e50 100644 --- a/arch/x86/boot/compressed/unaccepted_memory.c +++ b/arch/x86/boot/compressed/unaccepted_memory.c @@ -2,11 +2,15 @@ #include "error.h" #include "misc.h" +#include "tdx.h" static inline void __accept_memory(phys_addr_t start, phys_addr_t end) { /* Platform-specific memory-acceptance call goes here */ - error("Cannot accept memory"); + if (early_is_tdx_guest()) + tdx_accept_memory(start, end); + else + error("Cannot accept memory"); } void mark_unaccepted(struct boot_params *params, u64 start, u64 end) @@ -18,6 +22,9 @@ void mark_unaccepted(struct boot_params *params, u64 start, u64 end) * *marked* as unaccepted. */ + /* __accept_memory() needs to know if kernel runs in TDX environment */ + early_tdx_detect(); + /* Immediately accept whole range if it is within a PMD_SIZE block: */ if ((start & PMD_MASK) == (end & PMD_MASK)) { __accept_memory(start, end); diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h index 6d901cb6d607..fbbe4644cc7b 100644 --- a/arch/x86/include/asm/tdx.h +++ b/arch/x86/include/asm/tdx.h @@ -90,6 +90,8 @@ phys_addr_t tdx_shared_mask(void); int tdx_hcall_request_gpa_type(phys_addr_t start, phys_addr_t end, enum tdx_map_type map_type); +extern void tdx_accept_memory(phys_addr_t start, phys_addr_t end); + #else static inline void tdx_early_init(void) { }; diff --git a/arch/x86/kernel/tdx.c b/arch/x86/kernel/tdx.c index 0f8f7285c05b..a0ff720425d8 100644 --- a/arch/x86/kernel/tdx.c +++ b/arch/x86/kernel/tdx.c @@ -162,6 +162,13 @@ int tdx_hcall_request_gpa_type(phys_addr_t start, phys_addr_t end, return 0; } +void tdx_accept_memory(phys_addr_t start, phys_addr_t end) +{ + if (tdx_hcall_request_gpa_type(start, end, TDX_MAP_PRIVATE)) { + panic("Accepting memory failed\n"); + } +} + static u64 __cpuidle _tdx_halt(const bool irq_disabled, const bool do_sti) { /* diff --git a/arch/x86/mm/unaccepted_memory.c b/arch/x86/mm/unaccepted_memory.c index 984eaead0b11..9f468d58d51f 100644 --- a/arch/x86/mm/unaccepted_memory.c +++ b/arch/x86/mm/unaccepted_memory.c @@ -5,6 +5,7 @@ #include #include +#include #include static DEFINE_SPINLOCK(unaccepted_memory_lock); @@ -21,7 +22,10 @@ static void __accept_memory(phys_addr_t start, phys_addr_t end) start / PMD_SIZE, DIV_ROUND_UP(end, PMD_SIZE)) { /* Platform-specific memory-acceptance call goes here */ - panic("Cannot accept memory"); + if (cc_platform_has(CC_ATTR_GUEST_TDX)) + tdx_accept_memory(rs * PMD_SIZE, re * PMD_SIZE); + else + panic("Cannot accept memory"); bitmap_clear(unaccepted_memory, rs, re - rs); } }