diff mbox series

x86/mm: Make e820_end_ram_pfn() cover E820_TYPE_ACPI ranges

Message ID 20230816212418.25069-1-kirill.shutemov@linux.intel.com
State Accepted
Commit 06fa48d85b09b3e67afeda220bc19f7102b53beb
Headers show
Series x86/mm: Make e820_end_ram_pfn() cover E820_TYPE_ACPI ranges | expand

Commit Message

Kirill A. Shutemov Aug. 16, 2023, 9:24 p.m. UTC
e820__end_of_ram_pfn() is used to calculate max_pfn which, among other
things, guides where direct mapping ends. Any memory above max_pfn is
not going to be present in the direct mapping.

e820__end_of_ram_pfn() finds the end of the ram based on the highest
E820_TYPE_RAM range. But it doesn't includes E820_TYPE_ACPI ranges into
calculation.

Despite the name, E820_TYPE_ACPI covers not only ACPI data, but also EFI
tables and might be required by kernel to function properly.

Usually the problem is hidden because there is some E820_TYPE_RAM memory
above E820_TYPE_ACPI. But crashkernel only presents pre-allocated crash
memory as E820_TYPE_RAM on boot. If the preallocated range is small, it
can fit under the last E820_TYPE_ACPI range.

Modify e820__end_of_ram_pfn() and e820__end_of_low_ram_pfn() to cover
E820_TYPE_ACPI memory.

The problem was discovered during debugging kexec for TDX guest. TDX
guest uses E820_TYPE_ACPI to store the unaccepted memory bitmap and pass
it between the kernels on kexec.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 arch/x86/kernel/e820.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

Comments

Ard Biesheuvel Aug. 17, 2023, 8:25 p.m. UTC | #1
On Wed, 16 Aug 2023 at 23:24, Kirill A. Shutemov
<kirill.shutemov@linux.intel.com> wrote:
>
> e820__end_of_ram_pfn() is used to calculate max_pfn which, among other
> things, guides where direct mapping ends. Any memory above max_pfn is
> not going to be present in the direct mapping.
>
> e820__end_of_ram_pfn() finds the end of the ram based on the highest
> E820_TYPE_RAM range. But it doesn't includes E820_TYPE_ACPI ranges into
> calculation.
>
> Despite the name, E820_TYPE_ACPI covers not only ACPI data, but also EFI
> tables and might be required by kernel to function properly.
>
> Usually the problem is hidden because there is some E820_TYPE_RAM memory
> above E820_TYPE_ACPI. But crashkernel only presents pre-allocated crash
> memory as E820_TYPE_RAM on boot. If the preallocated range is small, it
> can fit under the last E820_TYPE_ACPI range.
>
> Modify e820__end_of_ram_pfn() and e820__end_of_low_ram_pfn() to cover
> E820_TYPE_ACPI memory.
>
> The problem was discovered during debugging kexec for TDX guest. TDX
> guest uses E820_TYPE_ACPI to store the unaccepted memory bitmap and pass
> it between the kernels on kexec.
>
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

No objections to this, but we might also simply drop E820_TYPE_ACPI
altogether: it is only used for EFI_ACPI_RECLAIM_MEMORY, which is
memory that can be used by the OS as ordinary RAM if it is not
interested in the contents (or has already consumed them). So this
could arguably be classified as E820_TYPE_RAM too.

> ---
>  arch/x86/kernel/e820.c | 9 +++++----
>  1 file changed, 5 insertions(+), 4 deletions(-)
>
> diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
> index fb8cf953380d..99c80680dc9e 100644
> --- a/arch/x86/kernel/e820.c
> +++ b/arch/x86/kernel/e820.c
> @@ -827,7 +827,7 @@ u64 __init e820__memblock_alloc_reserved(u64 size, u64 align)
>  /*
>   * Find the highest page frame number we have available
>   */
> -static unsigned long __init e820_end_pfn(unsigned long limit_pfn, enum e820_type type)
> +static unsigned long __init e820_end_ram_pfn(unsigned long limit_pfn)
>  {
>         int i;
>         unsigned long last_pfn = 0;
> @@ -838,7 +838,8 @@ static unsigned long __init e820_end_pfn(unsigned long limit_pfn, enum e820_type
>                 unsigned long start_pfn;
>                 unsigned long end_pfn;
>
> -               if (entry->type != type)
> +               if (entry->type != E820_TYPE_RAM &&
> +                   entry->type != E820_TYPE_ACPI)
>                         continue;
>
>                 start_pfn = entry->addr >> PAGE_SHIFT;
> @@ -864,12 +865,12 @@ static unsigned long __init e820_end_pfn(unsigned long limit_pfn, enum e820_type
>
>  unsigned long __init e820__end_of_ram_pfn(void)
>  {
> -       return e820_end_pfn(MAX_ARCH_PFN, E820_TYPE_RAM);
> +       return e820_end_ram_pfn(MAX_ARCH_PFN);
>  }
>
>  unsigned long __init e820__end_of_low_ram_pfn(void)
>  {
> -       return e820_end_pfn(1UL << (32 - PAGE_SHIFT), E820_TYPE_RAM);
> +       return e820_end_ram_pfn(1UL << (32 - PAGE_SHIFT));
>  }
>
>  static void __init early_panic(char *msg)
> --
> 2.41.0
>
Kirill A. Shutemov Aug. 18, 2023, 3:16 p.m. UTC | #2
On Thu, Aug 17, 2023 at 10:25:56PM +0200, Ard Biesheuvel wrote:
> On Wed, 16 Aug 2023 at 23:24, Kirill A. Shutemov
> <kirill.shutemov@linux.intel.com> wrote:
> >
> > e820__end_of_ram_pfn() is used to calculate max_pfn which, among other
> > things, guides where direct mapping ends. Any memory above max_pfn is
> > not going to be present in the direct mapping.
> >
> > e820__end_of_ram_pfn() finds the end of the ram based on the highest
> > E820_TYPE_RAM range. But it doesn't includes E820_TYPE_ACPI ranges into
> > calculation.
> >
> > Despite the name, E820_TYPE_ACPI covers not only ACPI data, but also EFI
> > tables and might be required by kernel to function properly.
> >
> > Usually the problem is hidden because there is some E820_TYPE_RAM memory
> > above E820_TYPE_ACPI. But crashkernel only presents pre-allocated crash
> > memory as E820_TYPE_RAM on boot. If the preallocated range is small, it
> > can fit under the last E820_TYPE_ACPI range.
> >
> > Modify e820__end_of_ram_pfn() and e820__end_of_low_ram_pfn() to cover
> > E820_TYPE_ACPI memory.
> >
> > The problem was discovered during debugging kexec for TDX guest. TDX
> > guest uses E820_TYPE_ACPI to store the unaccepted memory bitmap and pass
> > it between the kernels on kexec.
> >
> > Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> 
> No objections to this, but we might also simply drop E820_TYPE_ACPI
> altogether: it is only used for EFI_ACPI_RECLAIM_MEMORY, which is
> memory that can be used by the OS as ordinary RAM if it is not
> interested in the contents (or has already consumed them). So this
> could arguably be classified as E820_TYPE_RAM too.

Hm. I'm not sure about this. E820_TYPE_ACPI also get tracked as
IORES_DESC_ACPI_TABLES resource and get passed to the next kernel on
kexec, regardless if it is crash kernel or not. I'm not sure we would not
break anything.
Ard Biesheuvel Sept. 6, 2023, 9:17 a.m. UTC | #3
On Fri, 18 Aug 2023 at 17:16, Kirill A. Shutemov
<kirill.shutemov@linux.intel.com> wrote:
>
> On Thu, Aug 17, 2023 at 10:25:56PM +0200, Ard Biesheuvel wrote:
> > On Wed, 16 Aug 2023 at 23:24, Kirill A. Shutemov
> > <kirill.shutemov@linux.intel.com> wrote:
> > >
> > > e820__end_of_ram_pfn() is used to calculate max_pfn which, among other
> > > things, guides where direct mapping ends. Any memory above max_pfn is
> > > not going to be present in the direct mapping.
> > >
> > > e820__end_of_ram_pfn() finds the end of the ram based on the highest
> > > E820_TYPE_RAM range. But it doesn't includes E820_TYPE_ACPI ranges into
> > > calculation.
> > >
> > > Despite the name, E820_TYPE_ACPI covers not only ACPI data, but also EFI
> > > tables and might be required by kernel to function properly.
> > >
> > > Usually the problem is hidden because there is some E820_TYPE_RAM memory
> > > above E820_TYPE_ACPI. But crashkernel only presents pre-allocated crash
> > > memory as E820_TYPE_RAM on boot. If the preallocated range is small, it
> > > can fit under the last E820_TYPE_ACPI range.
> > >
> > > Modify e820__end_of_ram_pfn() and e820__end_of_low_ram_pfn() to cover
> > > E820_TYPE_ACPI memory.
> > >
> > > The problem was discovered during debugging kexec for TDX guest. TDX
> > > guest uses E820_TYPE_ACPI to store the unaccepted memory bitmap and pass
> > > it between the kernels on kexec.
> > >
> > > Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> >
> > No objections to this, but we might also simply drop E820_TYPE_ACPI
> > altogether: it is only used for EFI_ACPI_RECLAIM_MEMORY, which is
> > memory that can be used by the OS as ordinary RAM if it is not
> > interested in the contents (or has already consumed them). So this
> > could arguably be classified as E820_TYPE_RAM too.
>
> Hm. I'm not sure about this. E820_TYPE_ACPI also get tracked as
> IORES_DESC_ACPI_TABLES resource and get passed to the next kernel on
> kexec, regardless if it is crash kernel or not. I'm not sure we would not
> break anything.
>

Yeah, you're right. So this patch is necessary in any case.

Do we also need the EFI side patch then?
Kirill A. Shutemov Sept. 7, 2023, 10:49 a.m. UTC | #4
On Wed, Sep 06, 2023 at 11:17:12AM +0200, Ard Biesheuvel wrote:
> On Fri, 18 Aug 2023 at 17:16, Kirill A. Shutemov
> <kirill.shutemov@linux.intel.com> wrote:
> >
> > On Thu, Aug 17, 2023 at 10:25:56PM +0200, Ard Biesheuvel wrote:
> > > On Wed, 16 Aug 2023 at 23:24, Kirill A. Shutemov
> > > <kirill.shutemov@linux.intel.com> wrote:
> > > >
> > > > e820__end_of_ram_pfn() is used to calculate max_pfn which, among other
> > > > things, guides where direct mapping ends. Any memory above max_pfn is
> > > > not going to be present in the direct mapping.
> > > >
> > > > e820__end_of_ram_pfn() finds the end of the ram based on the highest
> > > > E820_TYPE_RAM range. But it doesn't includes E820_TYPE_ACPI ranges into
> > > > calculation.
> > > >
> > > > Despite the name, E820_TYPE_ACPI covers not only ACPI data, but also EFI
> > > > tables and might be required by kernel to function properly.
> > > >
> > > > Usually the problem is hidden because there is some E820_TYPE_RAM memory
> > > > above E820_TYPE_ACPI. But crashkernel only presents pre-allocated crash
> > > > memory as E820_TYPE_RAM on boot. If the preallocated range is small, it
> > > > can fit under the last E820_TYPE_ACPI range.
> > > >
> > > > Modify e820__end_of_ram_pfn() and e820__end_of_low_ram_pfn() to cover
> > > > E820_TYPE_ACPI memory.
> > > >
> > > > The problem was discovered during debugging kexec for TDX guest. TDX
> > > > guest uses E820_TYPE_ACPI to store the unaccepted memory bitmap and pass
> > > > it between the kernels on kexec.
> > > >
> > > > Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> > >
> > > No objections to this, but we might also simply drop E820_TYPE_ACPI
> > > altogether: it is only used for EFI_ACPI_RECLAIM_MEMORY, which is
> > > memory that can be used by the OS as ordinary RAM if it is not
> > > interested in the contents (or has already consumed them). So this
> > > could arguably be classified as E820_TYPE_RAM too.
> >
> > Hm. I'm not sure about this. E820_TYPE_ACPI also get tracked as
> > IORES_DESC_ACPI_TABLES resource and get passed to the next kernel on
> > kexec, regardless if it is crash kernel or not. I'm not sure we would not
> > break anything.
> >
> 
> Yeah, you're right. So this patch is necessary in any case.
> 
> Do we also need the EFI side patch then?

Yes, we need it to get it mapped into the crashkernel direct mapping.
Kirill A. Shutemov Sept. 17, 2023, 5:06 p.m. UTC | #5
On Thu, Sep 07, 2023 at 01:49:14PM +0300, Kirill A. Shutemov wrote:
> On Wed, Sep 06, 2023 at 11:17:12AM +0200, Ard Biesheuvel wrote:
> > On Fri, 18 Aug 2023 at 17:16, Kirill A. Shutemov
> > <kirill.shutemov@linux.intel.com> wrote:
> > >
> > > On Thu, Aug 17, 2023 at 10:25:56PM +0200, Ard Biesheuvel wrote:
> > > > On Wed, 16 Aug 2023 at 23:24, Kirill A. Shutemov
> > > > <kirill.shutemov@linux.intel.com> wrote:
> > > > >
> > > > > e820__end_of_ram_pfn() is used to calculate max_pfn which, among other
> > > > > things, guides where direct mapping ends. Any memory above max_pfn is
> > > > > not going to be present in the direct mapping.
> > > > >
> > > > > e820__end_of_ram_pfn() finds the end of the ram based on the highest
> > > > > E820_TYPE_RAM range. But it doesn't includes E820_TYPE_ACPI ranges into
> > > > > calculation.
> > > > >
> > > > > Despite the name, E820_TYPE_ACPI covers not only ACPI data, but also EFI
> > > > > tables and might be required by kernel to function properly.
> > > > >
> > > > > Usually the problem is hidden because there is some E820_TYPE_RAM memory
> > > > > above E820_TYPE_ACPI. But crashkernel only presents pre-allocated crash
> > > > > memory as E820_TYPE_RAM on boot. If the preallocated range is small, it
> > > > > can fit under the last E820_TYPE_ACPI range.
> > > > >
> > > > > Modify e820__end_of_ram_pfn() and e820__end_of_low_ram_pfn() to cover
> > > > > E820_TYPE_ACPI memory.
> > > > >
> > > > > The problem was discovered during debugging kexec for TDX guest. TDX
> > > > > guest uses E820_TYPE_ACPI to store the unaccepted memory bitmap and pass
> > > > > it between the kernels on kexec.
> > > > >
> > > > > Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> > > >
> > > > No objections to this, but we might also simply drop E820_TYPE_ACPI
> > > > altogether: it is only used for EFI_ACPI_RECLAIM_MEMORY, which is
> > > > memory that can be used by the OS as ordinary RAM if it is not
> > > > interested in the contents (or has already consumed them). So this
> > > > could arguably be classified as E820_TYPE_RAM too.
> > >
> > > Hm. I'm not sure about this. E820_TYPE_ACPI also get tracked as
> > > IORES_DESC_ACPI_TABLES resource and get passed to the next kernel on
> > > kexec, regardless if it is crash kernel or not. I'm not sure we would not
> > > break anything.
> > >
> > 
> > Yeah, you're right. So this patch is necessary in any case.
> > 
> > Do we also need the EFI side patch then?
> 
> Yes, we need it to get it mapped into the crashkernel direct mapping.

Ughh. The patch alone causes crash as EFI_ACPI_RELACLAIM_MEMORY is not
mapped into direct mapping during memory init.

The patch below fixes the issue.

I should have noticed it before, but I had essentially the same patch in
my tree for a different reason. Sorry for that :/

Please apply the patch below.
Ard Biesheuvel Sept. 18, 2023, 7:01 a.m. UTC | #6
On Sun, 17 Sept 2023 at 19:06, Kirill A. Shutemov
<kirill.shutemov@linux.intel.com> wrote:
>
> On Thu, Sep 07, 2023 at 01:49:14PM +0300, Kirill A. Shutemov wrote:
> > On Wed, Sep 06, 2023 at 11:17:12AM +0200, Ard Biesheuvel wrote:
> > > On Fri, 18 Aug 2023 at 17:16, Kirill A. Shutemov
> > > <kirill.shutemov@linux.intel.com> wrote:
> > > >
> > > > On Thu, Aug 17, 2023 at 10:25:56PM +0200, Ard Biesheuvel wrote:
> > > > > On Wed, 16 Aug 2023 at 23:24, Kirill A. Shutemov
> > > > > <kirill.shutemov@linux.intel.com> wrote:
> > > > > >
> > > > > > e820__end_of_ram_pfn() is used to calculate max_pfn which, among other
> > > > > > things, guides where direct mapping ends. Any memory above max_pfn is
> > > > > > not going to be present in the direct mapping.
> > > > > >
> > > > > > e820__end_of_ram_pfn() finds the end of the ram based on the highest
> > > > > > E820_TYPE_RAM range. But it doesn't includes E820_TYPE_ACPI ranges into
> > > > > > calculation.
> > > > > >
> > > > > > Despite the name, E820_TYPE_ACPI covers not only ACPI data, but also EFI
> > > > > > tables and might be required by kernel to function properly.
> > > > > >
> > > > > > Usually the problem is hidden because there is some E820_TYPE_RAM memory
> > > > > > above E820_TYPE_ACPI. But crashkernel only presents pre-allocated crash
> > > > > > memory as E820_TYPE_RAM on boot. If the preallocated range is small, it
> > > > > > can fit under the last E820_TYPE_ACPI range.
> > > > > >
> > > > > > Modify e820__end_of_ram_pfn() and e820__end_of_low_ram_pfn() to cover
> > > > > > E820_TYPE_ACPI memory.
> > > > > >
> > > > > > The problem was discovered during debugging kexec for TDX guest. TDX
> > > > > > guest uses E820_TYPE_ACPI to store the unaccepted memory bitmap and pass
> > > > > > it between the kernels on kexec.
> > > > > >
> > > > > > Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> > > > >
> > > > > No objections to this, but we might also simply drop E820_TYPE_ACPI
> > > > > altogether: it is only used for EFI_ACPI_RECLAIM_MEMORY, which is
> > > > > memory that can be used by the OS as ordinary RAM if it is not
> > > > > interested in the contents (or has already consumed them). So this
> > > > > could arguably be classified as E820_TYPE_RAM too.
> > > >
> > > > Hm. I'm not sure about this. E820_TYPE_ACPI also get tracked as
> > > > IORES_DESC_ACPI_TABLES resource and get passed to the next kernel on
> > > > kexec, regardless if it is crash kernel or not. I'm not sure we would not
> > > > break anything.
> > > >
> > >
> > > Yeah, you're right. So this patch is necessary in any case.
> > >
> > > Do we also need the EFI side patch then?
> >
> > Yes, we need it to get it mapped into the crashkernel direct mapping.
>
> Ughh. The patch alone causes crash as EFI_ACPI_RELACLAIM_MEMORY is not
> mapped into direct mapping during memory init.
>

It would be good if this boot path could be covered by lkp@ as well -
currently, you are the only person testing this manually. The same
applies to SEV-SNP by the way - there is zero coverage except for the
manual testing that Boris or Tom Lendacky might do.

> The patch below fixes the issue.
>
> I should have noticed it before, but I had essentially the same patch in
> my tree for a different reason. Sorry for that :/
>

No worries - it shouldn't be only up to you to test this stuff. I
would have tested it myself if I had access to TDX hardware.

> Please apply the patch below.
>

OK, queued up now.



> From b5d1faf9e515195c58cc5e34132284894fca17f2 Mon Sep 17 00:00:00 2001
> From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
> Date: Mon, 14 Aug 2023 19:12:47 +0300
> Subject: [PATCH] efi/unaccepted: Make sure unaccepted table is mapped in
>  crashkernel case
>
> Unaccepted table is now allocated from EFI_ACPI_RELACLAIM_MEMORY. It
> translates into E820_TYPE_ACPI, which is not added to memblock and
> therefore not mapped in the direct mapping.
>
> It causes crash on the first touch of the table.
>
> Use memblock_add() to make sure that the table is mapped in direct
> mapping.
>
> Align the range to the nearest page boarders. Ranges smaller than page
> size are not going to be mapped.
>
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> Fixes: e7761d827e99 ("efi/unaccepted: Use ACPI reclaim memory for unaccepted memory table")
> ---
>  drivers/firmware/efi/efi.c | 32 +++++++++++++++++++++++++++++---
>  1 file changed, 29 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c
> index 1599f1176842..4f409652b3c6 100644
> --- a/drivers/firmware/efi/efi.c
> +++ b/drivers/firmware/efi/efi.c
> @@ -623,6 +623,34 @@ static __init int match_config_table(const efi_guid_t *guid,
>         return 0;
>  }
>
> +/**
> + * reserve_unaccepted - Map and reserve unaccepted configuration table
> + * @unaccepted: Pointer to unaccepted memory table
> + *
> + * memblock_add() makes sure that the table is mapped in direct mapping. During
> + * normal boot it happens automatically because the table is allocated from
> + * usable memory. But during crashkernel boot only memory specifically
> + * reserved for crash scenario is mapped. memblock_add() forces the table to be
> + * mapped in crashkernel case.
> + *
> + * Align the range to the nearest page boarders. Ranges smaller than page size
> + * are not going to be mapped.
> + *
> + * memblock_reserve() makes sure that future allocations will not touch the
> + * table.
> + */
> +
> +static __init void reserve_unaccepted(struct efi_unaccepted_memory *unaccepted)
> +{
> +       phys_addr_t start, size;
> +
> +       start = PAGE_ALIGN_DOWN(efi.unaccepted);
> +       size = PAGE_ALIGN(sizeof(*unaccepted) + unaccepted->size);
> +
> +       memblock_add(start, size);
> +       memblock_reserve(start, size);
> +}
> +
>  int __init efi_config_parse_tables(const efi_config_table_t *config_tables,
>                                    int count,
>                                    const efi_config_table_type_t *arch_tables)
> @@ -751,11 +779,9 @@ int __init efi_config_parse_tables(const efi_config_table_t *config_tables,
>
>                 unaccepted = early_memremap(efi.unaccepted, sizeof(*unaccepted));
>                 if (unaccepted) {
> -                       unsigned long size;
>
>                         if (unaccepted->version == 1) {
> -                               size = sizeof(*unaccepted) + unaccepted->size;
> -                               memblock_reserve(efi.unaccepted, size);
> +                               reserve_unaccepted(unaccepted);
>                         } else {
>                                 efi.unaccepted = EFI_INVALID_TABLE_ADDR;
>                         }
> --
>   Kiryl Shutsemau / Kirill A. Shutemov
Kirill A. Shutemov Sept. 18, 2023, 12:12 p.m. UTC | #7
On Mon, Sep 18, 2023 at 09:01:27AM +0200, Ard Biesheuvel wrote:
> On Sun, 17 Sept 2023 at 19:06, Kirill A. Shutemov
> <kirill.shutemov@linux.intel.com> wrote:
> >
> > On Thu, Sep 07, 2023 at 01:49:14PM +0300, Kirill A. Shutemov wrote:
> > > On Wed, Sep 06, 2023 at 11:17:12AM +0200, Ard Biesheuvel wrote:
> > > > On Fri, 18 Aug 2023 at 17:16, Kirill A. Shutemov
> > > > <kirill.shutemov@linux.intel.com> wrote:
> > > > >
> > > > > On Thu, Aug 17, 2023 at 10:25:56PM +0200, Ard Biesheuvel wrote:
> > > > > > On Wed, 16 Aug 2023 at 23:24, Kirill A. Shutemov
> > > > > > <kirill.shutemov@linux.intel.com> wrote:
> > > > > > >
> > > > > > > e820__end_of_ram_pfn() is used to calculate max_pfn which, among other
> > > > > > > things, guides where direct mapping ends. Any memory above max_pfn is
> > > > > > > not going to be present in the direct mapping.
> > > > > > >
> > > > > > > e820__end_of_ram_pfn() finds the end of the ram based on the highest
> > > > > > > E820_TYPE_RAM range. But it doesn't includes E820_TYPE_ACPI ranges into
> > > > > > > calculation.
> > > > > > >
> > > > > > > Despite the name, E820_TYPE_ACPI covers not only ACPI data, but also EFI
> > > > > > > tables and might be required by kernel to function properly.
> > > > > > >
> > > > > > > Usually the problem is hidden because there is some E820_TYPE_RAM memory
> > > > > > > above E820_TYPE_ACPI. But crashkernel only presents pre-allocated crash
> > > > > > > memory as E820_TYPE_RAM on boot. If the preallocated range is small, it
> > > > > > > can fit under the last E820_TYPE_ACPI range.
> > > > > > >
> > > > > > > Modify e820__end_of_ram_pfn() and e820__end_of_low_ram_pfn() to cover
> > > > > > > E820_TYPE_ACPI memory.
> > > > > > >
> > > > > > > The problem was discovered during debugging kexec for TDX guest. TDX
> > > > > > > guest uses E820_TYPE_ACPI to store the unaccepted memory bitmap and pass
> > > > > > > it between the kernels on kexec.
> > > > > > >
> > > > > > > Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> > > > > >
> > > > > > No objections to this, but we might also simply drop E820_TYPE_ACPI
> > > > > > altogether: it is only used for EFI_ACPI_RECLAIM_MEMORY, which is
> > > > > > memory that can be used by the OS as ordinary RAM if it is not
> > > > > > interested in the contents (or has already consumed them). So this
> > > > > > could arguably be classified as E820_TYPE_RAM too.
> > > > >
> > > > > Hm. I'm not sure about this. E820_TYPE_ACPI also get tracked as
> > > > > IORES_DESC_ACPI_TABLES resource and get passed to the next kernel on
> > > > > kexec, regardless if it is crash kernel or not. I'm not sure we would not
> > > > > break anything.
> > > > >
> > > >
> > > > Yeah, you're right. So this patch is necessary in any case.
> > > >
> > > > Do we also need the EFI side patch then?
> > >
> > > Yes, we need it to get it mapped into the crashkernel direct mapping.
> >
> > Ughh. The patch alone causes crash as EFI_ACPI_RELACLAIM_MEMORY is not
> > mapped into direct mapping during memory init.
> >
> 
> It would be good if this boot path could be covered by lkp@ as well -
> currently, you are the only person testing this manually. The same
> applies to SEV-SNP by the way - there is zero coverage except for the
> manual testing that Boris or Tom Lendacky might do.

It made me remember that I forgot attribute the finding:

Reported-by: "Hongyu Ning" <hongyu.ning@intel.com>

Hongyu tests linux-next periodically, but I agree we need to get LKP
working.

I will try to get LKP going for TDX guest.
Ard Biesheuvel Sept. 18, 2023, 12:32 p.m. UTC | #8
On Mon, 18 Sept 2023 at 14:13, Kirill A. Shutemov
<kirill.shutemov@linux.intel.com> wrote:
>
> On Mon, Sep 18, 2023 at 09:01:27AM +0200, Ard Biesheuvel wrote:
> > On Sun, 17 Sept 2023 at 19:06, Kirill A. Shutemov
> > <kirill.shutemov@linux.intel.com> wrote:
> > >
> > > On Thu, Sep 07, 2023 at 01:49:14PM +0300, Kirill A. Shutemov wrote:
> > > > On Wed, Sep 06, 2023 at 11:17:12AM +0200, Ard Biesheuvel wrote:
> > > > > On Fri, 18 Aug 2023 at 17:16, Kirill A. Shutemov
> > > > > <kirill.shutemov@linux.intel.com> wrote:
> > > > > >
> > > > > > On Thu, Aug 17, 2023 at 10:25:56PM +0200, Ard Biesheuvel wrote:
> > > > > > > On Wed, 16 Aug 2023 at 23:24, Kirill A. Shutemov
> > > > > > > <kirill.shutemov@linux.intel.com> wrote:
> > > > > > > >
> > > > > > > > e820__end_of_ram_pfn() is used to calculate max_pfn which, among other
> > > > > > > > things, guides where direct mapping ends. Any memory above max_pfn is
> > > > > > > > not going to be present in the direct mapping.
> > > > > > > >
> > > > > > > > e820__end_of_ram_pfn() finds the end of the ram based on the highest
> > > > > > > > E820_TYPE_RAM range. But it doesn't includes E820_TYPE_ACPI ranges into
> > > > > > > > calculation.
> > > > > > > >
> > > > > > > > Despite the name, E820_TYPE_ACPI covers not only ACPI data, but also EFI
> > > > > > > > tables and might be required by kernel to function properly.
> > > > > > > >
> > > > > > > > Usually the problem is hidden because there is some E820_TYPE_RAM memory
> > > > > > > > above E820_TYPE_ACPI. But crashkernel only presents pre-allocated crash
> > > > > > > > memory as E820_TYPE_RAM on boot. If the preallocated range is small, it
> > > > > > > > can fit under the last E820_TYPE_ACPI range.
> > > > > > > >
> > > > > > > > Modify e820__end_of_ram_pfn() and e820__end_of_low_ram_pfn() to cover
> > > > > > > > E820_TYPE_ACPI memory.
> > > > > > > >
> > > > > > > > The problem was discovered during debugging kexec for TDX guest. TDX
> > > > > > > > guest uses E820_TYPE_ACPI to store the unaccepted memory bitmap and pass
> > > > > > > > it between the kernels on kexec.
> > > > > > > >
> > > > > > > > Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> > > > > > >
> > > > > > > No objections to this, but we might also simply drop E820_TYPE_ACPI
> > > > > > > altogether: it is only used for EFI_ACPI_RECLAIM_MEMORY, which is
> > > > > > > memory that can be used by the OS as ordinary RAM if it is not
> > > > > > > interested in the contents (or has already consumed them). So this
> > > > > > > could arguably be classified as E820_TYPE_RAM too.
> > > > > >
> > > > > > Hm. I'm not sure about this. E820_TYPE_ACPI also get tracked as
> > > > > > IORES_DESC_ACPI_TABLES resource and get passed to the next kernel on
> > > > > > kexec, regardless if it is crash kernel or not. I'm not sure we would not
> > > > > > break anything.
> > > > > >
> > > > >
> > > > > Yeah, you're right. So this patch is necessary in any case.
> > > > >
> > > > > Do we also need the EFI side patch then?
> > > >
> > > > Yes, we need it to get it mapped into the crashkernel direct mapping.
> > >
> > > Ughh. The patch alone causes crash as EFI_ACPI_RELACLAIM_MEMORY is not
> > > mapped into direct mapping during memory init.
> > >
> >
> > It would be good if this boot path could be covered by lkp@ as well -
> > currently, you are the only person testing this manually. The same
> > applies to SEV-SNP by the way - there is zero coverage except for the
> > manual testing that Boris or Tom Lendacky might do.
>
> It made me remember that I forgot attribute the finding:
>
> Reported-by: "Hongyu Ning" <hongyu.ning@intel.com>
>

OK, I will add that next time I update the branch.

> Hongyu tests linux-next periodically, but I agree we need to get LKP
> working.
>
> I will try to get LKP going for TDX guest.
>

Yes, please. We are likely to run into such issues again if we don't
have test coverage for this functionality.

Thanks,
Ard.
diff mbox series

Patch

diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
index fb8cf953380d..99c80680dc9e 100644
--- a/arch/x86/kernel/e820.c
+++ b/arch/x86/kernel/e820.c
@@ -827,7 +827,7 @@  u64 __init e820__memblock_alloc_reserved(u64 size, u64 align)
 /*
  * Find the highest page frame number we have available
  */
-static unsigned long __init e820_end_pfn(unsigned long limit_pfn, enum e820_type type)
+static unsigned long __init e820_end_ram_pfn(unsigned long limit_pfn)
 {
 	int i;
 	unsigned long last_pfn = 0;
@@ -838,7 +838,8 @@  static unsigned long __init e820_end_pfn(unsigned long limit_pfn, enum e820_type
 		unsigned long start_pfn;
 		unsigned long end_pfn;
 
-		if (entry->type != type)
+		if (entry->type != E820_TYPE_RAM &&
+		    entry->type != E820_TYPE_ACPI)
 			continue;
 
 		start_pfn = entry->addr >> PAGE_SHIFT;
@@ -864,12 +865,12 @@  static unsigned long __init e820_end_pfn(unsigned long limit_pfn, enum e820_type
 
 unsigned long __init e820__end_of_ram_pfn(void)
 {
-	return e820_end_pfn(MAX_ARCH_PFN, E820_TYPE_RAM);
+	return e820_end_ram_pfn(MAX_ARCH_PFN);
 }
 
 unsigned long __init e820__end_of_low_ram_pfn(void)
 {
-	return e820_end_pfn(1UL << (32 - PAGE_SHIFT), E820_TYPE_RAM);
+	return e820_end_ram_pfn(1UL << (32 - PAGE_SHIFT));
 }
 
 static void __init early_panic(char *msg)